bron@olympus.SGI.COM (Bron C. Nelson) (03/11/88)
In article <4712@lll-winken.llnl.gov>, brooks@lll-crg.llnl.gov (Eugene D. Brooks III) writes:
> In article <3300021@uiucdcsm> grunwald@uiucdcsm.cs.uiuc.edu writes:
> >
> > re: address hashing to randomize bank conflicts.
> >
> > I would think that this would be too slow in software.  However, if I
> > recall correctly, randomized hashing was done in hardware on the RP-3
> > (it had some sort of variable hashing so you could fiddle with the
> > parameters).
>
> Those who have been using supercomputers for the past decade have dealt
> with this sort of problem by carefully avoiding the undesired addressing
> modes.  Hashing takes care of the few pathological cases by slowing down
> the majority of cases, which would normally stream at one word per clock.
> To slow down the machine to "random gather" speed for all of the normally
> high-performance situations is not desired.  This applies to shared-memory
> multiprocessors as well as it does to single-CPU supercomputers.

The issue regarding memory bank conflicts usually has to do (no
surprise) with array accesses.  I seem to recall that someone did a
study of array indexing (either LLNL or Cray, I believe) and concluded
that for their test cases, about 60% of array accesses had a stride of 1
(i.e. the code stepped sequentially through the array in memory order),
about 20% had stride 2, and about 20% "other".  WARNING: this is off the
top of my head and probably mis-remembered (the stride-2 number is
suspiciously high).  This implies that relatively few jobs benefit from
randomizing the banks, and slowing down memory access for everyone to
get it may not be a win.  Can anyone (Brooks?) provide real stats about
this?

Bron Nelson     bron@sgi.com
Don't blame my employers for my opinions.
eugene@pioneer.arpa (Eugene N. Miya) (03/12/88)
Stride usage of arrays: it was closer to 80% of Fortran loops.  The
source was Knuth's 1971 "Empirical Study of Fortran Programs" paper in
Software Practice and Experience, required reading for all computer
students in my opinion.  BTW: this is not completely an architecture
paper, but it should still be studied.  It was also a motivation for
RISCs.

I disagree with your opinion about separating algorithms from
architectures from the standpoint of studying them, but I have to agree
that posting pure algorithm stuff in the architecture group is wasting
bits.

From the Rock of Ages Home for Retired Hackers:

--eugene miya, NASA Ames Research Center, eugene@ames-aurora.ARPA
"You trust the `reply' command with all those different mailers out there?"
"Send mail, avoid follow-ups.  If enough, I'll summarize."
{uunet,hplabs,hao,ihnp4,decwrl,allegra,tektronix}!ames!aurora!eugene
brooks@lll-crg.llnl.gov (Eugene D. Brooks III) (03/12/88)
In article <12514@sgi.SGI.COM> bron@olympus.SGI.COM (Bron C. Nelson) writes:
>The issue regarding memory bank conflicts usually has to do (no
>surprise) with array accesses.  I seem to recall that someone did a
>study of array indexing (either LLNL or Cray, I believe) and concluded
>that for their test cases, about 60% of array accesses had a stride
>of 1 (i.e. the code stepped sequentially through the array in memory
>order), about 20% had stride 2, and about 20% "other".  WARNING: this
>is off the top of my head; probably mis-remembered (the stride-2

My best informant indicates 80% stride 1, 20% "other".  For two-dimensional
arrays, people actively pad to get stride 1 one way and an "odd" stride
the other, so array references in both dimensions go at the full clip.
Of the 20% "other", about 3% is estimated to be random gather.  There
are heavily used codes, however, which use random gather at the 25%
level; it's just that this might be one code out of 10 or 20.

Random gather "never" runs at the full clip on any machine due to
"random conflicts", and very few machines handle random gather without
a "several clock penalty" per vector element.  Some machines, the names
left unmentioned for my own personal protection, are quite effectively
castrated by random gather in performance terms (even though the random
gather is supported in hardware).

The data mentioned above are "estimates" from knowledgeable sources;
such detailed statistics are very difficult to obtain.
ccplumb@watmath.waterloo.edu (Colin Plumb) (03/14/88)
eugene@pioneer.UUCP (Eugene N. Miya) wrote:
>I disagree with your opinion about separating algorithms from architectures
>from the standpoint of studying them, but I have to agree that posting
>pure algorithm stuff in the architecture group is wasting bits.

Well, as others have observed, optimizing compilers are an integral part
of RISC architectures, VLIW architectures, vector architectures, ...
I think you can go quite deep into software algorithms without getting
beyond comp.arch's purview.  Certainly the discussion of reorganization
algorithms is applicable.

So bring on the killer optimizing compiler technology!  There's no such
thing as too fast.
--
-Colin (watmath!ccplumb)
Zippy says:  - if it GLISTENS, gobble it!!