[comp.lang.c] Access efficiency

chris@mimsy.UUCP (Chris Torek) (07/04/87)
(Reading news when I should be sleeping . . . :)

In article <1213@carthage.swatsun.UUCP> rice@swatsun (Dan Rice) writes:
>... Suppose I have defined structures containing other structures, i.e., 
>
>typedef struct {
>	float x, y, z;	/* Coordinates of a vector in 3-space */
>} vector;
>
>typedef struct {
>	vector o;	/* Center */
>	float r;	/* Radius */
>} sphere;
>
>sphere s1, *s2;
>
>Now, say I refer to s1.o.x and s2->o.y.  Does the compiler convert this into
>a simple address reference at compile time, or is work performed at runtime?

Clearly this is compiler dependent.  A good compiler will do this
as efficently as possible on your machine.  That said, however,
there are a few general rules.

Global and static variables have addresses that are fixed at compile
time (more precisely, at link time, but the compiler is dealing
with `constants').  The offsets of members of structures within
structures simply add, and this too is a compile time constant.
Hence `s1.o.x' is `(address of s1) + (offset of o) + (offset of
x)', which is the same value as `address of s1' (since the last
two are [probably] zero).  Since this is a compile (link) time
constant, the compiler should simply access the float called
`s1.o.x'.  On a purely hypothetical machine called a Vax :-) this
is, e.g.,

	addf2	L16,_s1

Automatic (stack) variables do not have fixed addresses.  The rest
of the computation is the same, but `address of s1' is computed as
`stack (frame) pointer plus offset to s1'.  On our hypothetical
Vax this is

	addf2	L16,-24(fp)

which is still one instruction, although possibly slightly slower.
Given enough stack data, this might become

	addf2	L16,-480(fp)

which may be slower still, because the offset -480(fp) is encoded
in two bytes, while -24(fp) is encoded in one.  The actual timings
depend not only on the Vax model but also on the alignment of the
particular instruction and the contents of the cache and other
uncertainties.

Finally, pointer following can often be done only if the pointer's
value is in a register.  `s2->o.y' means `(value of s2) + (offset
of o) + (offset of y)' or (Vax) `(value of s2) + 4'.  This becomes
either

	movl	_s2,r0
	addf2	L16,4(r0)

(if, e.g., s2 is global), or

	movl	-8(fp),r0
	addf2	L16,4(r0)

(for automatic s2) or

	addf2	L16,4(r10)

(for `register struct sphere *s2').

>Should I define
>
>vector center;
>
>center = s2->o;
>
>if I plan to use s2->o several times?

If `s2' is in a register, and `center' is automatic, this will
probably make no difference.  If both are automatic, or if s2 is
global or static, this may help.  But the only way to find out
for certain is to try it, either timing the program or looking at
the generated code.

This sort of micro-tuning is best done after making the program
work and *profiling*.  You may be able to cut a three hour run to
two hours with careful tuning, but only if you work on the code
that takes that extra hour.  Tuning code that does not run often
does not make much difference.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7690)
Domain:	chris@mimsy.umd.edu	Path:	seismo!mimsy!chris