[comp.lang.c] another auto-increment problem

rion@wdl1.UUCP (Rion Cassidy) (09/25/87)

Preface:
	This is my first time posting to comp.lang.c; please excuse me
	if this problem has been brought up and beaten to death before.



While attempting to use the auto-increment operation to improve efficiency
of array accesses, I discovered what is an implementation detail (and
therefore not specified in the language) that most people would not
expect.

Consider the following definitions:

	float junk[3], *junk_ptr;
		/* junk contains x,y,z coords */

and use thereof:

	junk_ptr = (float *) junk;
	xfpt(*junk_ptr++, *junk_ptr++, *junk_ptr);
		/* transform x,y,z coords to a coord new system */

At first glance, most people would assume that the following function
call is equivalent, but slightly slower:

	xfpt(junk[0], junk[1], junk[2]);

This assumption is based on the premise that the arguments to xfpt()
are evaluated left to right; if their are *NOT* we can be sure that we
will get the wrong values passed to xfpt.

Perhaps it is because we are so used to expressions getting evaluated
left to right that we just assume function arguments will as well (and
why on Earth *WOULDN'T* they implement it that way?).  As it turns
out, everyone I explained the problem to made the same assumption I
had (args evaluated left to right), but all our UNIX systems have
implementations that evaluate function arguments right to left!!

I presume this has something to do with the order in which the
arguments have to be pushed and popped on the stack, but I can't see
why it would really make a difference.

Are their any C-compiler implementors out there that would care to
comment? 

Rion Cassidy
<rion@wdl1>
{ucbvax|sgi|sun}!wdl1!rion

archer@elysium.SGI.COM (Archer Sully) (09/26/87)

In article <3680001@wdl1.UUCP>, rion@wdl1.UUCP (Rion Cassidy) writes:
> 
> Consider the following definitions:
> 
> 	float junk[3], *junk_ptr;
> 		/* junk contains x,y,z coords */
> 
> and use thereof:
> 
> 	junk_ptr = (float *) junk;
> 	xfpt(*junk_ptr++, *junk_ptr++, *junk_ptr);
> 		/* transform x,y,z coords to a coord new system */
> 
> At first glance, most people would assume that the following function
> call is equivalent, but slightly slower:
> 
> 	xfpt(junk[0], junk[1], junk[2]);
> 
> This assumption is based on the premise that the arguments to xfpt()
> are evaluated left to right; if their are *NOT* we can be sure that we
> will get the wrong values passed to xfpt.
> 
...(stuff deleted about order of evaluation)

> Are their any C-compiler implementors out there that would care to
> comment? 
> 
> Rion Cassidy
> <rion@wdl1>
> {ucbvax|sgi|sun}!wdl1!rion

I'm not a C-compiler implementor, but I think I'll comment anyway.

The use of increment and decrement operators in function calls 
is always considered non-portable, precisely because order of
evaluation isn't defined in C (cf K&R pp 212).

the following program is roughly equivalent to your fragment:

main()
{
	float j[3], *jp;
	void foo();

	jp = (float *) j;
	foo(*jp++, *jp++, *jp);
	return(0);
}

void 
foo(x,y,z)
float x,y,z;
{
}

and here's what lint had to say about it:

t.c:

t.c(7): warning: jp evaluation order undefined
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
t.c(7): warning: jp evaluation order undefined
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
t.c(13): warning: argument x unused in function foo
t.c(13): warning: argument y unused in function foo
t.c(13): warning: argument z unused in function foo
foo, arg. 1 used inconsistently	t.c(14)  ::  t.c(7)
foo, arg. 2 used inconsistently	t.c(14)  ::  t.c(7)
foo, arg. 3 used inconsistently	t.c(14)  ::  t.c(7)

As for reasons why arguments to functions are evaluated right to left,
here's a pretty good one.  It may be more efficient to evaluate each 
argument just before it is pushed, rather than evaluating
the argument list and then pushing.  Remember that most C-compilers
push arguments right to left rather than left to right, for various
and sundry reasons.  If you find one that pushes arguments right
to left, it probably evaluates that way, too.

Archer Sully
archer@sgi.com
{ucbvax,sun,pyramid,ames}!sgi!archer

tim@amdcad.AMD.COM (Tim Olson) (09/26/87)

In article <3680001@wdl1.UUCP> rion@wdl1.UUCP (Rion Cassidy) writes:
+-----
| 	junk_ptr = (float *) junk;
| 	xfpt(*junk_ptr++, *junk_ptr++, *junk_ptr);
| 		/* transform x,y,z coords to a coord new system */
|
| Are their any C-compiler implementors out there that would care to
| comment? 
+-----
There are two problems with this statement.  First of all, as you found
out, the order of evaluation of procedure parameters is not stated; some
compilers will evaluate right-to-left, while others will evaluate
left-to-right (as you presumed, this usually has to do with how stack
frames are built to pass parameters).  However, there is another
problem.

The ANSI standard states that "The order of evaluation of the function
designator, the arguments, and subexpressions within the arguments is
unspecified, but there is a sequence point before the actual call." 
This means that the increment of junk_ptr does not have to take effect
until just before the call, so (legally) the expressions could evaluate to:

	xfpt(junk_ptr[0], junk_ptr[0], junk_ptr[0]);

or other non-obvious combinations.  The only safe way to perform this
function call is

	xfpt(junk_ptr[0], junk_ptr[1], junk_ptr[2]);
	

	-- Tim Olson
	Advanced Micro Devices
	(tim@amdcad.amd.com)
	

johnl@ima.ISC.COM (John R. Levine) (09/27/87)

In article <3680001@wdl1.UUCP> rion@wdl1.UUCP (Rion Cassidy) writes:
> ...
>	float junk[3], *junk_ptr;
> ...
>	junk_ptr = (float *) junk;
>	xfpt(*junk_ptr++, *junk_ptr++, *junk_ptr);
> [isn't the same as]
>	xfpt(junk[0], junk[1], junk[2]);

Two points.  First, as has been pointed out elsewhere, no version of C has
ever made any promises about the order of evaluation of arguments nor of
side effects.  It would be entirely legitimate for the compiler in the
first case to push junk[0] three times and then increment the pointer
twice, and on many machines that'd be the best code.

Second, on computers that don't have auto-increment address modes, the
second version is probably faster and smaller than the first, not to
mention making your intentions clearer.  In any event, one or two extra
address calculation instructions are unlikely to make any perceptible
difference if the called routine is going to do a bunch of floating point
multiplication.  Make it right, then make it fast.
-- 
John R. Levine, IECC, PO Box 349, Cambridge MA 02238-0349, +1 617 492 3869
{ ihnp4 | decvax | cbosgd | harvard | yale }!ima!johnl, Levine@YALE.something
The Iran-Contra affair:  None of this would have happened if Ronald Reagan
were still alive.

rwhite@nusdhub.UUCP (Robert C. White Jr.) (09/28/87)

In article <6484@sgi.SGI.COM>, archer@elysium.SGI.COM (Archer Sully) writes:
> 
> As for reasons why arguments to functions are evaluated right to left,
> here's a pretty good one.  It may be more efficient to evaluate each 
> argument just before it is pushed, rather than evaluating
> the argument list and then pushing.  Remember that most C-compilers
> push arguments right to left rather than left to right, for various
> and sundry reasons.  If you find one that pushes arguments right
> to left, it probably evaluates that way, too.


	There is a very good reason that the evaluation of arguments
to a function are evaluated in a right to left fassion. [on some
machines anyway, I don't know if it is a "universal" truth or not.]

	In any function which takes a variable number of arguments
of possibly variant size, it is best to have the argument which
defines the size and type of future arguments on the TOP of the 
stack.  If it were pushed first, it would be under N words of stack
where N is an undefined number between 0 and MAXSTACK-4(or more or less).
This is not as uncommon as it seems all of the [sf]printf and [sf]scanf
functions use the format string for just such a variant descriptor.

Robert ("So who asked you anyway?") White.

henry@utzoo.UUCP (Henry Spencer) (09/29/87)

> ... all our UNIX systems have
> implementations that evaluate function arguments right to left!!
> 
> I presume this has something to do with the order in which the
> arguments have to be pushed and popped on the stack...

As various people have pointed out, the order in which arguments are
evaluated is officially undefined, and you are warned not to depend on
it.  This got started mainly, I believe, because the earliest C compiler
used right-to-left evaluation.

The reason for right-to-left is in two parts.  First, functions like
printf are a lot easier to implement if arguments, once pushed, are in
ascending order in memory, e.g. argument 3 follows argument 2.  Second,
many modern machines have stacks that grow downward in memory, i.e. the
latest thing pushed is below the previous thing pushed; the PDP11 did it
that way for complex reasons, and many others have copied the 11.  Putting
these two considerations together, if the leftmost argument is to be at
the lowest location in memory, it must be pushed last.  Hence right-to-left.
-- 
"There's a lot more to do in space   |  Henry Spencer @ U of Toronto Zoology
than sending people to Mars." --Bova | {allegra,ihnp4,decvax,utai}!utzoo!henry

karl@haddock.ISC.COM (Karl Heuer) (09/29/87)

In article <6484@sgi.SGI.COM> archer@elysium.SGI.COM (Archer Sully) writes:
>The use of increment and decrement operators in function calls 
>is always considered non-portable, precisely because order of
>evaluation isn't defined in C (cf K&R pp 212).

I don't really want to get involved in this thread, but your statement is
overly conservative.  A statement like "f(*jp++)" is perfectly portable; the
problem only occurs when you have *multiple* side effects in a single
expression.  (And not even then, if there is a sequence point between them.)

>Remember that most C-compilers push arguments right to left ...

At least on architectures where the stack grows downward.

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint

RMRichardson.PA@Xerox.COM (10/03/87)

From: Henry Spencer <henry@utzoo.uucp>
> The reason for right-to-left is in two parts.  First, functions 
> like printf are a lot easier to implement if arguments, once 
> pushed, are in ascending order in memory, e.g. argument 3 follows 
> argument 2.  Second, many modern machines have stacks that grow 
> downward in memory, i.e. the latest thing pushed is below the 
> previous thing pushed; the PDP11 did it that way for complex 
> reasons, and many others have copied the 11.  Putting these two 
> considerations together, if the leftmost argument is to be at the 
> lowest location in memory, it must be pushed last.  Hence 
> right-to-left.

Hmmmm, ...  If one were using a stack as a stack, and not as a "sort 
of managed" array, then I think Henry has the "right" conlusions for 
the "wrong" reasons; which way the items are addressed on the stack 
or whether the stack runs "up" or "down" in memory wouldn't matter.  

For a true stack, the items on the stack should be in order of use, 
first item on top, or to reverse the description of a stack, first 
used, last pushed.  Thus for printf, the first thing one wants is the
formatting description.  printf will pick this up and start forming 
the output stream of characters until it hits the first conversion 
specification, e.g., %d, at which point it needs the first value, 
i.e., the second argument, popped off the stack.  This continues 
until it runs off the end of the formatting string or runs out of 
arguments.  Thus for an architecture using a call stack, it pays to 
push the last (right most) argument first (First In Last Out).  In a 
"high level" language, the direction of the stack in real memory is 
invisible and irrelevant.  

Of course, there is a "work around" if the arguments are pushed on 
the stack first argument first; the function can implement its own 
stack and pop each item off the call stack and push it onto its local
stack, then run off its local stack.  Ugly, but possible.  

Next question: How many machines / compilers really use a stack this 
way for passing and processing arguments?  (As opposed to pushing the
arguments and "dragging" them out of something that looks more like 
an array.)

If the implementation is using the "stack" more like an array, then 
Henry's explanation really may make more sense; one wants the order 
of the arguments set up to ease the access to them, and one cares 
about the order in memory.  

Note that "older" architectures (e.g., CDC 3000 or 6000 series) were 
more likely to stuff the arguments into a temporary array (sometimes 
right after the call instruction) and pass a pointer to it (sometimes 
the return link itself).  Thus, the argument processing was usually 
left to right on these machines.  (Then again, the "high level" 
language was FORTRAN and recursive calls were not supported.)


Rich

ron@topaz.rutgers.edu (Ron Natalie) (10/04/87)

Even if you treat arguments as being pushed by the caller/popped
by the callee, you have an ambiguity.  Is the order of the arguments
that of the callers (I push the first argument first) or the callees
(the first argument is the one I pop off first).  Certainly variable
number of arguments is easier handled by the first, but they are
forbidden in ANSI C anyway.

The order is undefined not just because stack arguments may be placed
in either order, but because the call linkage is not defined by the
C language at all.  An implementor is free to place all the args in
an envelope and mail them to the subroutine as far as the C standard
is concerned.

-Ron

karl@haddock.ISC.COM (Karl Heuer) (10/05/87)

In article <9600@brl-adm.ARPA> RMRichardson.PA@Xerox.COM (Rich) writes:
>Hmmmm, ...  If one were using a stack as a stack, and not as a "sort
>of managed" array, then I think Henry has the "right" conlusions for
>the "wrong" reasons; which way the items are addressed on the stack
>or whether the stack runs "up" or "down" in memory wouldn't matter.
>
>For a true stack, the items on the stack should be in order of use, ...

Yes, but since most routines need random access to the arguments, a strict
stack implementation would be useless.  Every non-varargs routine would have
to begin with a prolog to pop the stacked arguments into randomly accessible
locations -- probably either machine registers or (for larger arglists) an
array.  Thus, most C implementations avoid this step by storing the args in an
array in the first place.  (Of course, this "array" is "the stack" on PDP-like
machines.)

>Of course, there is a "work around" if the arguments are pushed on the stack
>first argument first; the function can implement its own stack [and copy].

On machines where "the stack" grows upward, one normally saves the address of
the stack pointer before pushing the arguments.  This is equivalent to using a
queue rather than a stack as the argument-parsing structure.

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint

henry@utzoo.UUCP (Henry Spencer) (10/05/87)

> Next question: How many machines / compilers really use a stack this 
> way for passing and processing arguments?  (As opposed to pushing the
> arguments and "dragging" them out of something that looks more like 
> an array.)
> 
> If the implementation is using the "stack" more like an array, then 
> Henry's explanation really may make more sense; one wants the order 
> of the arguments set up to ease the access to them, and one cares 
> about the order in memory.  

This is in fact exactly the way most C implementations tackle the problem.
It generally is not practical to pop arguments as they are used, because
there are usually things like saved registers stacked on top of the
arguments.  A further point is that if one is implementing the code in
question (e.g. printf) in C, there is little choice since the hardware
stack primitives (if any) are not available from C.

Actually, Rich has a point in that stacking things left-to-right has
problems even on an upward-stack machine, because the arguments are in
the desired order but it may be difficult to locate the start of the
argument list.  Designing C stack frames is a non-trivial problem; anyone
thinking of seriously trying (or even of debating the matter from a position
of knowledge) should read the Johnson&Ritchie tech report from Bell Labs
first.  Those who merely want to use them need only remember that the
order of argument evaluation is thoroughly undefined.

(For reference:  "The C Language Calling Sequence", SC Johnson + DM Ritchie,
Computing Science Tech Report 102, Bell Labs, Sept 1981.)
-- 
PS/2: Yesterday's hardware today.    |  Henry Spencer @ U of Toronto Zoology
OS/2: Yesterday's software tomorrow. | {allegra,ihnp4,decvax,utai}!utzoo!henry

karl@haddock.ISC.COM (Karl Heuer) (10/07/87)

In article <15298@topaz.rutgers.edu> ron@topaz.rutgers.edu (Ron Natalie) writes:
>... Certainly variable number of arguments is easier handled by the first,
>but they are forbidden in ANSI C anyway.

Wrong.  Variadic functions are supported in ANSI C, but they must be declared
using prototypes (with the new ellipsis punctuator), and the only portable
way such a function can reference its later arguments is through <stdarg.h>.

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint

peter@sugar.UUCP (Peter da Silva) (10/08/87)

In article <15298@topaz.rutgers.edu>, ron@topaz.rutgers.edu (Ron Natalie) writes:
> Certainly variable
> number of arguments is easier handled by the first, but they are
> forbidden in ANSI C anyway.

Not in the draft I have. In fact the draft I have has syntax for handling
variable numbers of arguments: printf(char *, ...);

> The order is undefined not just because stack arguments may be placed
> in either order, but because the call linkage is not defined by the
> C language at all.

The call linkage is effectively defined by "what is efficient to perform
on the target macgine". Since this may vary, it's undefined in the language.
In practice, it's more efficient to evaluate and stack the arguments in
reverse order for conventional hardware architectures.

> An implementor is free to place all the args in
> an envelope and mail them to the subroutine as far as the C standard
> is concerned.

Well put. Personally I prefer "call by long distance" to "call by USPS" :->.
-- 
-- Peter da Silva  `-_-'  ...!hoptoad!academ!uhnix1!sugar!peter
-- Disclaimer: These U aren't mere opinions... these are *values*.