[comp.arch] Sun-4/110 Performance Anomaly

stuart@rennet.cs.wisc.edu (Stuart Friedberg) (12/03/88)

I have a program to compute convex hulls using short, rational
arithmetic, that I have been using as a specialized benchmark to look
at (1) integer arithmetic and comparison performance, (2) sequential
access to large arrays, and (3) paging behavior.

I have encountered an anomaly on the SPARC architecture (Sun-4/110,
running SunOS 4.0) that does not appear on Vaxen or 68K-based machines,
and I was hoping someone could explain it to me.

I statically allocate a large array of either 2 or 4 megabytes in
size.  Elements of the array are struct { short x, y; }.  I run the
benchmark with a parameter specifying how much of the array to use.
Access to the array is basically sequential.
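
A minimal sketch of the setup described above, not the actual benchmark
source.  The names MAX_POINTS, points, fill_points, and sum_x are
hypothetical, and the loop bodies are placeholders for the real hull
computation; only the element type and the static allocation come from
the description.

```c
#include <stddef.h>

/* Hypothetical capacity: 1000000 elements is the 4 MB case,
 * 500000 would be the 2 MB case. */
#define MAX_POINTS 1000000

struct point { short x, y; };

/* Uninitialized file-scope array, so it occupies no space in the
 * executable image and is allocated when the program is loaded. */
struct point points[MAX_POINTS];

/* Fill the first n elements in index order.
 * Returns n, or 0 if n exceeds the static capacity. */
size_t fill_points(size_t n)
{
    size_t i;

    if (n > MAX_POINTS)
        return 0;
    for (i = 0; i < n; i++) {
        points[i].x = (short)(i % 1000);
        points[i].y = (short)(i % 777);
    }
    return n;
}

/* Sequential read pass over the first n elements. */
long sum_x(size_t n)
{
    long s = 0;
    size_t i;

    for (i = 0; i < n && i < MAX_POINTS; i++)
        s += points[i].x;
    return s;
}
```

Here n plays the role of the run-time parameter: elements at index n
and above are never touched, whichever capacity is compiled in.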

When the LARGER array is used, ALL runs of the benchmark for ALL values
of the parameter are roughly ten percent FASTER.  This is both
repeatable and statistically significant.  But the unused portion of
the array is never accessed!
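
One way such a comparison could be timed is sketched below, assuming
CPU time from the standard clock() call is an adequate measure.  The
names N_PTS, pts, scan_points, and seconds_per_run are hypothetical,
and the scan loop is only a stand-in workload.  Building it once per
array size and running each build several times with the same n would
give the samples to compare.

```c
#include <stddef.h>
#include <time.h>

/* Hypothetical capacity; rebuild with 500000 for the smaller case. */
#define N_PTS 1000000

struct pt { short x, y; };
static struct pt pts[N_PTS];

/* Written once per pass so the compiler cannot discard the loop. */
static volatile long sink;

/* Stand-in workload: one sequential write/read pass over n elements. */
static void scan_points(size_t n)
{
    long acc = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        pts[i].x = (short)(i & 0x7fff);
        acc += pts[i].x;
    }
    sink = acc;
}

/* Average CPU seconds per pass over the first n elements.
 * Returns -1.0 if n exceeds the capacity or reps is not positive. */
double seconds_per_run(size_t n, int reps)
{
    clock_t start, stop;
    int r;

    if (n > N_PTS || reps <= 0)
        return -1.0;
    start = clock();
    for (r = 0; r < reps; r++)
        scan_points(n);
    stop = clock();
    return (double)(stop - start) / CLOCKS_PER_SEC / reps;
}
```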

I examined the assembler output produced by the Sun-4 compiler for both
versions, and the two differ only in the values of the constants
assembled.  There are overflow checks against indices of 500,000 and
1,000,000, and address offsets of 2,000,000 and 4,000,000.  That is
the sole difference.
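
The constants are consistent with each other, which a short check makes
explicit.  This assumes a 2-byte short and no structure padding, so
each element occupies 4 bytes; the names elem and array_bytes are
hypothetical.

```c
#include <stddef.h>

struct elem { short x, y; };

/* Size in bytes of an array of n_elems elements. */
size_t array_bytes(size_t n_elems)
{
    return n_elems * sizeof(struct elem);
}
```

With 4-byte elements, array_bytes(500000) is 2,000,000 and
array_bytes(1000000) is 4,000,000, so the two array sizes are decimal
millions of bytes rather than powers of two.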

The same instructions seem to be generated in both cases (at least,
the assembler mnemonics are the same), and I assume (perhaps
incorrectly) that the alignment of the code is the same in both cases.
The statically allocated array is declared with an assembler .common
directive.

Can anyone suggest what might be going on here, or what I should check
in more detail?  I expected the larger array to either have no effect
at all, or to produce slightly slower runs.  Instead, I am seeing
a significant speedup, even though I don't refer to the extra memory.
I am not (yet) familiar enough with the SPARC and the Sun-4/110 memory
system to have any good idea what is going on.

Stu Friedberg  stuart@cs.wisc.edu