lackey@Alliant.COM (Stan Lackey) (03/04/88)
In article <17415@think.UUCP> barmar@fafnir.think.com.UUCP (Barry Margolin) writes:
>>...when you start programming on supers you will soon
>>learn that power of two array sizes are the worse choice you can make.
>Could you explain why this is so?

Many supercomputers, and mini-supercomputers, increase memory bandwidth by setting up independent banks of slow memory and interleaving them on the low address bits. Vector references effectively cause several banks to access/cycle at roughly the same time, streaming elements to the CPU(s) at very high rates.

So. Suppose the memory is 8-way interleaved, and the array dimension is a power of 2 greater than or equal to 8. All references down a column will go to the same bank, reducing bandwidth to the CYCLE time of the memory RAM. DRAM cycle times are typically 200 ns, so you can see what that can do to performance.

-Stan
eugene@eos.UUCP (Eugene Miya) (03/05/88)
In article <1334@alliant.Alliant.COM> lackey@alliant.UUCP (Stan Lackey) writes:
>Many supercomputers, and mini-supercomputers, to increase memory bandwidth,
>set up independent banks of slow memory and interleave them on low address

Actually, I would not say "many." Crays are the only machines I have recorded which measurably have this as a problem. These machines are no slouches on speed; they are 'relatively' slow ;-). Other machines have noted this problem and made other architectural compromises (e.g., the Cyber 205 and IBM 3090, which work around it in different ways). I've not run this particular test on the Alliant because it has a relatively low-resolution clock, but maybe I'll get some time and stress (torture) our Alliant ;-).

From the Rock of Ages Home for Retired Hackers:

--eugene miya, NASA Ames Research Center, eugene@ames-aurora.ARPA
"You trust the `reply' command with all those different mailers out there?"
"Send mail, avoid follow-ups. If enough, I'll summarize."
{uunet,hplabs,hao,ihnp4,decwrl,allegra,tektronix}!ames!aurora!eugene
dfk@duke.cs.duke.edu (David Kotz) (03/05/88)
In article <1334@alliant.Alliant.COM>, lackey@Alliant.COM (Stan Lackey) writes:
> So. Suppose the memory is 8-way interleaved, and the array is a power
> of 2 greater than or equal to 8. All references will go to the same bank,
> reducing bandwidth to the CYCLE time of the memory RAM. DRAM cycle times
> are typically 200nS, so you can see what that can do to performance.
> -Stan

This is true if your array is stored in row-major order (i.e., elements of a row spread across modules, elements of a column all in one module) and you are trying to access a given column. If you are accessing successive elements of a row, it doesn't matter what the dimension of your array is.

A good discussion of all this can be found in Stone, "High-Performance Computer Architecture", Addison-Wesley 1987, chapter 5, section 3.

David Kotz
--
Department of Computer Science, Duke University, Durham, NC 27706
ARPA: dfk@cs.duke.edu  CSNET: dfk@duke  UUCP: {ihnp4!decvax}!duke!dfk
lisper@yale.UUCP (Bjorn Lisper) (03/06/88)
In article <11233@duke.cs.duke.edu> dfk@duke.cs.duke.edu (David Kotz) writes:
>In article <1334@alliant.Alliant.COM>, lackey@Alliant.COM (Stan Lackey)
>writes:
>> So. Suppose the memory is 8-way interleaved, and the array is a power
>> of 2 greater than or equal to 8. All references will go to the same bank,
....
>This is true if your array is stored in row-major order (i.e. elements
>of a row spread across modules, elements of a column all in one
>module), and you are trying to access a given column. If you are
>accessing successive elements of a row it doesn't matter what the
>dimension of your array is.

It seems that a smart compiler should be able to detect certain cases where array elements are accessed in an order that causes slowdown due to memory bank conflicts, and change the order of storage to speed up the access. Or perhaps simply use a hash function to randomize the storage pattern? This is of course not possible in a language like FORTRAN, where the order of storage is strictly specified. Does anyone have any comments on this?

Bjorn Lisper
marc@apollo.uucp (Marc Gibian) (03/08/88)
Many of the CDC Cyber machines on the high end of the 170 series used banked memories, along with multiple functional units. Once in a while you would run into an algorithm that would get into phase with itself, try to make all memory accesses from the same bank, and be mysteriously slow. These were -real FUN- to fix.

email: marc@apollo.UUCP
hankd@pur-ee.UUCP (Hank Dietz) (03/08/88)
In article <24605@yale-celray.yale.UUCP>, lisper@yale.UUCP (Bjorn Lisper) writes:
> It seems that a smart compiler should be able to detect certain cases where
> array elements are accessed in an order that causes slowdown due to memory
> bank conflicts, and change the order of storage to speed up the access. Or
> perhaps simply use a hash function to randomize the access pattern? This
> is of course not possible in a language like FORTRAN where the order of
> storage is strictly specified.

Bjorn is quite correct; indeed, this is one of the main problems that we supercomputer compiler folk are looking at. It turns out to be quite a nasty problem to re-arrange C data structures without changing the function of the program (see all the noise about just changing member order within a struct in comp.lang.c), but it is SOMETIMES do-able with a clever compiler. FORTRAN is not really much easier than C, incidentally, because although it doesn't have multi-level indirection, it does have things like COMMON and EQUIVALENCE, which link the layout of multiple data structures in complex ways. I'm currently doing a lot of work on automatic parallelization for non-shared-memory machines... and this is a major concern.

Incidentally, the hash function idea works: for a couple of years I've had a 95% completed paper sitting around on the use of nearly perfect hash functions to implement software address skewing... maybe I ought to complete it and send it for publication somewhere?

-hankd
lisper@yale.UUCP (Bjorn Lisper) (03/09/88)
In article <7690@pur-ee.UUCP> hankd@pur-ee.UUCP (Hank Dietz) writes:
>In article <24605@yale-celray.yale.UUCP>, lisper@yale.UUCP (Bjorn Lisper)
>writes:
>> It seems that a smart compiler should be able to detect certain cases where
>> array elements are accessed in an order that causes slowdown due to memory
>> bank conflicts, and change the order of storage to speed up the access. Or
>> perhaps simply use a hash function to randomize the access pattern? This
>> is of course not possible in a language like FORTRAN where the order of
>> storage is strictly specified.
>
>Bjorn is quite correct; indeed, this is one of the main problems that
>we supercomputer compiler folk are looking at. It turns out to be
>quite a nasty problem to re-arrange C data structures without changing
>the function of the program (see all the noise about just changing
>member order within a struct in comp.lang.c), but it is SOMETIMES
>do-able with a clever compiler. FORTRAN is not really much easier
>than C, incidentally, because although it doesn't have multi-level
>indirection, it does have things like COMMON and EQUIVALENCE which
>link the layout of multiple data structures in complex ways. I'm
>currently doing a lot of work on automatic parallelization for
>non-shared-memory machines... and this is a major concern.

One of the implications of my first posting that I didn't spell out is that languages like C and FORTRAN may not be that suitable for programming supercomputers, for the reasons you mention; there are FORTRAN programs out there whose correctness relies on matrix elements being stored in a certain order, for instance. If we had a language that specifies nothing but the functionality of accessing an array element, that is, the only thing we know about a(i,j) is that it returns the current value of a(i,j) when referred to, then an optimizing compiler for that language would have much more room to play around with different storage orderings in order to minimize access time.

Isn't this a nice argument against using a language like FORTRAN in scientific computing?

>Incidentally, the hash function idea works: for a couple of years I've
>had a 95% completed paper sitting around on the use of nearly perfect
>hash functions to implement software address skewing... maybe I ought
>to complete it and send it for publication somewhere?

Fun to hear that my idea wasn't totally out to lunch; it was just a flash I got when writing my previous posting. I guess the trick is to find a hash function that is cheap enough to evaluate while still not wasting too much memory from the "imperfectness" of the hash function.

Bjorn Lisper
rober@weitek.UUCP (Don Rober) (03/10/88)
In article <24737@yale-celray.yale.UUCP> lisper@yale-celray.UUCP (Bjorn Lisper) writes:
>In article <7690@pur-ee.UUCP> hankd@pur-ee.UUCP (Hank Dietz) writes:
>
>If we had a language that specifies nothing but the
>functionality of accessing an array element, that is: the only thing we know
>about a(i,j) is that it returns the current value of a(i,j) when referred
>to, then an optimizing compiler for that language would have much more room
>to play around with different storage orderings in order to minimize access
>time. Isn't this a nice argument against using a language like FORTRAN in
>scientific computing?

Interesting. For all of its warts, APL displays these properties. For years it was the highest-performing scientific language on the Burroughs B6000/B7000 series of machines.
--
----------------------------------------------------------------------------
Don Rober                          UUCP: {pyramid, cae780}!weitek!rober
Weitek Corporation   1060 East Arques   Sunnyvale, CA 94086
leichter@yale.UUCP (Jerry Leichter) (03/10/88)
In article <24737@yale-celray.yale.UUCP> lisper@yale-celray.UUCP (Bjorn Lisper) writes:
>... If we had a language that specifies nothing but the
>functionality of accessing an array element, that is: the only thing we know
>about a(i,j) is that it returns the current value of a(i,j) when referred
>to, then an optimizing compiler for that language would have much more room
>to play around with different storage orderings in order to minimize access
>time. Isn't this a nice argument against using a language like FORTRAN in
>scientific computing?

Nice theoretical argument - but it ignores the central issue: maybe the REASON the scientific community likes FORTRAN is that it ALLOWS them to "play around". In fact, such "playing around" is an absolutely standard technique in many large numerical codes; it's what makes them practical. A common thing to do is to allocate a large one-dimensional array and carve variable-sized 2-D arrays out of it. The way this is done is ugly, because FORTRAN has no inherent ability to deal with dynamic memory allocation, so it all has to be faked. If the code had been done in C, for example, the space would have been allocated with malloc(). Of course, that doesn't really give a compiler much more to go on than was available in the FORTRAN case!

Before proposing to remove something so central to large codes, you'd better understand WHY it's central, and what you can provide to replace it with.

-- Jerry
lisper%thor@CS.YALE.EDU (Bjorn Lisper) (03/11/88)
In article <24861@yale-celray.yale.UUCP> leichter@yale-celray.UUCP (Jerry Leichter) writes:
>In article <24737@yale-celray.yale.UUCP> lisper@yale-celray.UUCP (Bjorn
>Lisper) writes:
>>... If we had a language that specifies nothing but the
>>functionality of accessing an array element, that is: the only thing we know
>>about a(i,j) is that it returns the current value of a(i,j) when referred
>>to, then an optimizing compiler for that language would have much more room
>>to play around with different storage orderings in order to minimize access
>>time. Isn't this a nice argument against using a language like FORTRAN in
>>scientific computing?
>
>Nice theoretical argument - but it ignores the central issue: Maybe the
>REASON the scientific community like FORTRAN is that it ALLOWS them to "play
>around". In fact, such "playing around" is an absolutely standard technique
>in many large numerical codes; it's what makes them practical.
....
>Before proposing to remove something so central to large codes, you'd better
>understand WHY it's central, and what you can provide to replace it with.

And maybe the reason why people want total control over memory management, as in FORTRAN, is that there have been no compilers for other, less memory-explicit languages that produce good code? A great deal of the development of parallelizing compilers for supercomputers has been development of FORTRAN compilers, for natural marketing reasons. Unfortunately, for the reasons I pointed out, there is a limit to how smart these compilers can be. So as long as no real effort is put into developing good parallelizing compilers for more high-level languages, the art of parallelizing & optimizing compiling will not develop to its full potential.

Another issue is of course portability; FORTRAN code hand-tailored for one machine will perhaps not perform too well on another (since the compiler cannot alter the programmer's detailed decisions), while for a higher-level language more decisions are left to the compiler. But this is an old story...

Anyhow: I fully understand that people who need top performance in supercomputing today use FORTRAN; they haven't got very much else to choose from. But as long as people stick to this language there is a limit to how good the compilers can be.

Maybe this isn't a comp.arch discussion anymore?

Bjorn Lisper
edk@gryphon.CTS.COM (Ed Kaulakis) (03/14/88)
In article <24861@yale-celray.yale.UUCP>, leichter@yale.UUCP (Jerry Leichter) writes:
> In article <24737@yale-celray.yale.UUCP> lisper@yale-celray.UUCP (Bjorn Lisper) writes:
[]
> Nice theoretical argument - but it ignores the central issue: Maybe the
> REASON the scientific community like FORTRAN is that it ALLOWS them to "play
> around". In fact, such "playing around" is an absolutely standard technique
> in many large numerical codes; it's what makes them practical. A common thing
> to do is to allocate a large one-dimensional array and carve variable-sized
> 2-D arrays out of it. The way this is done is ugly, because FORTRAN has no
> inherent ability to deal with dynamic memory allocation so it all has to be
> faked. If the code had been done in C, for example, the space would have been
> allocated with malloc(). Of course, that doesn't really give a compiler
> much more to go on than was available in the FORTRAN case!
>
> Before proposing to remove something so central to large codes, you'd better
> understand WHY it's central, and what you can provide to replace it with.
>
> -- Jerry

Therefore, it's not worth discussing other approaches? Look, I know how you feel. Read "Time, Progress, and the Older Programmer". The only substitute for 30 years' experience is 5 years' experience, 6 times. It will be time to retire soon anyway. Take your pills.