[comp.arch] Memory bank conflicts

bron@olympus.SGI.COM (Bron C. Nelson) (03/11/88)

In article <4712@lll-winken.llnl.gov>, brooks@lll-crg.llnl.gov (Eugene D. Brooks III) writes:
> In article <3300021@uiucdcsm> grunwald@uiucdcsm.cs.uiuc.edu writes:
> >
> >re: address hashing to randomize bank conflicts.
> >
> >I would think that this would be too slow in software. However, if I recall
> >correctly, randomized hashing was done in hardware on the RP-3 (it had some
> >sort of variable hashing so you could fiddle with the parameters).
> 
> Those who have been using supercomputers for the past decade have dealt
> with this sort of problem by carefully avoiding the undesired addressing
> modes.  Hashing takes care of the few pathological cases by slowing down
> the majority of cases which would normally stream at one word per clock.
> To slow down the machine to "random gather" speed for all of the normally
> high performance situations is not desired.  This applies to shared memory
> multiprocessors as well as it does to single cpu supercomputers.
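
For anyone who missed the earlier rounds, "hashing" here means scrambling
the address-to-bank mapping so that regular strides get spread across the
banks.  A rough sketch of the idea in C (not the RP-3's or any real
machine's actual circuit; the bank count is made up):

#include <stdio.h>

#define NBANKS 16                  /* assumed power-of-two bank count */

static unsigned bank_plain(unsigned word_addr)
{
    return word_addr % NBANKS;     /* plain low-order interleave */
}

static unsigned bank_hashed(unsigned word_addr)
{
    /* fold a higher slice of the address into the bank bits */
    return (word_addr ^ (word_addr >> 4)) % NBANKS;
}

int main(void)
{
    unsigned i;
    /* stride-16 references: all land in bank 0 on the plain interleave,
       but get spread across banks once hashed */
    for (i = 0; i < 8; i++)
        printf("ref %u: plain bank %2u, hashed bank %2u\n",
               i, bank_plain(i * 16), bank_hashed(i * 16));
    return 0;
}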

The issue regarding memory bank conflicts usually has to do (no
surprise) with array accesses.  I seem to recall that someone did a
study of array indexing (either LLNL or Cray I believe) and concluded
that for their test cases, about 60% of array accesses had a stride
of 1 (i.e. the code stepped sequentially through the array in memory
order), about 20% had stride 2, and about 20% "other".  WARNING: this
is off the top of my head; probably mis-remembered (the stride 2
number is suspiciously high).  This implies that relatively few jobs
benefit from randomizing the banks, and slowing down memory access
for everyone to get it may not be a win.
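
To make the conflict mechanism concrete, here is a rough sketch (in C,
with a made-up bank count) of which bank each reference of a strided loop
hits under a plain low-order interleave:

#include <stdio.h>

#define NBANKS 16                  /* illustrative power-of-two bank count */

static void show_banks(int stride)
{
    int i;
    printf("stride %2d:", stride);
    for (i = 0; i < 8; i++)
        printf(" %2d", (i * stride) % NBANKS);   /* bank hit by i-th reference */
    printf("\n");
}

int main(void)
{
    show_banks(1);     /* cycles through all banks: streams at full speed */
    show_banks(2);     /* touches only the even banks                     */
    show_banks(16);    /* every reference lands in bank 0: worst case     */
    show_banks(17);    /* odd stride: visits a new bank every reference   */
    return 0;
}

Power-of-two strides are the pathological cases; odd strides behave like
stride 1 as far as the banks are concerned.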

Can anyone (Brooks?) provide real stats about this?

Bron Nelson   bron@sgi.com
Don't blame my employers for my opinions.

eugene@pioneer.arpa (Eugene N. Miya) (03/12/88)

Stride usage of arrays:
It was closer to 80% of Fortran loops.  The figure comes from Knuth's 1971
"An Empirical Study of FORTRAN Programs" paper in Software Practice and
Experience, required reading for all computer students in my opinion.
BTW: it is not strictly an architecture paper, but it should still be
studied.  It was also a motivation for RISCs.

I disagree with your opinion about separating algorithms from architectures
from the standpoint of studying them, but I have to agree that posting
pure algorithm stuff in the architecture group is wasting bits.

From the Rock of Ages Home for Retired Hackers:

--eugene miya, NASA Ames Research Center, eugene@ames-aurora.ARPA
  "You trust the `reply' command with all those different mailers out there?"
  "Send mail, avoid follow-ups.  If enough, I'll summarize."
  {uunet,hplabs,hao,ihnp4,decwrl,allegra,tektronix}!ames!aurora!eugene

brooks@lll-crg.llnl.gov (Eugene D. Brooks III) (03/12/88)

In article <12514@sgi.SGI.COM> bron@olympus.SGI.COM (Bron C. Nelson) writes:
>The issue regarding memory bank conflicts usually has to do (no
>surprise) with array accesses.  I seem to recall that someone did a
>study of array indexing (either LLNL or Cray I believe) and concluded
>that for their test cases, about 60% of array accesses had a stride
>of 1 (i.e. the code stepped sequentially through the array in memory
>order), about 20% had stride 2, and about 20% "other".  WARNING: this
>is off the top of my head; probably mis-remembered (the stride 2
>number is suspiciously high).
My best informant indicates 80% stride 1, 20% "other".  For two-dimensional
arrays people actively pad to get stride 1 in one dimension and an "odd"
stride in the other, so array refs in both dimensions go at the full clip.
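
In C terms (the codes in question are of course Fortran, where the array
is column-major, so the trick is the mirror image; the dimensions here are
made up), the padding looks like this:

/* With 16 banks, walking down a column of a 1024-wide array hits the
 * same bank on every reference; padding the row length to 1025 makes
 * the column stride odd and spreads it over all the banks. */
#define N     1024
#define NPAD  (N + 1)              /* padded (odd) leading dimension */

static double a[N][NPAD];          /* last column is never used */

double column_sum(int j)
{
    double s = 0.0;
    int i;

    for (i = 0; i < N; i++)        /* stride of NPAD doubles: odd, conflict-free */
        s += a[i][j];
    return s;
}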

Of the 20% "other", about 3% is estimated to be random gather.  There
are heavily used codes, however, which use random gather at the 25%
level.  It's just that this might be one code out of 10 or 20.  Random
gather "never" runs at the full clip on any machine due to "random
conflicts", and very few machines handle random gather without a "several
clock penalty" per vector element.  Some machines, the names left
unmentioned for my own personal protection, are quite effectively castrated
by random gather in performance terms (even though the random gather
is supported in hardware).
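
For anyone not steeped in the vector jargon, "random gather" is just an
indexed vector load; a minimal sketch in C (names are illustrative):

/* Gather: each element is loaded through an index vector, so the
 * bank hit by each reference is effectively random. */
void gather(double *dst, const double *src, const int *idx, int n)
{
    int i;

    for (i = 0; i < n; i++)
        dst[i] = src[idx[i]];      /* unpredictable bank on every load */
}

On a vector machine the whole loop may be a single hardware gather
instruction; the conflicts come from the idx[] values, not from the code.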

The data mentioned above are "estimates" from knowledgeable sources;
such detailed statistics are very difficult to obtain.

ccplumb@watmath.waterloo.edu (Colin Plumb) (03/14/88)

eugene@pioneer.UUCP (Eugene N. Miya) wrote:
>I disagree with your opinion about separating algorithms from architectures
>from the standpoint of studying them, but I have to agree that posting
>pure algorithm stuff in the architecture group is wasting bits.

Well, as others have observed, optimizing compilers are an integral
part of RISC architectures, VLIW architectures, vector architectures,...

I think you can go quite deep into software algorithms without getting
beyond comp.arch's purview.  Certainly the discussion of reorganization
algorithms is applicable.

So bring on the killer optimizing compiler technology!  There's no such
thing as too fast.
--
	-Colin (watmath!ccplumb)

Zippy says:
- if it GLISTENS, gobble it!!