[comp.arch] Powers of 2, was Re: RISC is a nasty no-no!

muller@Alliant.COM (Jim Muller) (03/04/88)

In <17415@think.UUCP> barmar@fafnir.think.com.UUCP (Barry Margolin) writes:

>...What scientific applications naturally map onto
>power-of-two arrays?  I suspect that making arrays a power of two in
>size is a habit mostly of systems programmers, who know to make things
>fit neatly into memory pages in order to reduce paging.

Practically anything involving Fourier Transforms will tend to use
power-of-2-length arrays, since this is required to do FFT's (Fast Fourier
Transforms) - the requirement is built into the algorithm.  Well, okay, it
isn't strictly required, but is generally desirable.  The fields which do
FFT's routinely are anything involving signal or image processing, e.g.
radar, seismic, acoustic, or visual data processing.  (Okay, technically,
it isn't the scientific application itself which maps into a 2**N array,
but the gain in processing speed dictates it, so the effect is the same.)

Also included is practically any subject which deals with wave propagation,
since many problems are easier to solve in wave-number space rather than
in position in x,y,z coordinates.  (Wave-number is the Fourier mapping of
position just as frequency is the Fourier mapping of time.)

lisper@yale.UUCP (Bjorn Lisper) (03/05/88)

In article <1339@alliant.Alliant.COM> muller@alliant.UUCP (Jim Muller) writes:
>In <17415@think.UUCP> barmar@fafnir.think.com.UUCP (Barry Margolin) writes:
>
>>...What scientific applications naturally map onto
>>power-of-two arrays?  I suspect that making arrays a power of two in
>>size is a habit mostly of systems programmers, who know to make things
>>fit neatly into memory pages in order to reduce paging.
>
>Practically anything involving Fourier Transforms will tend to use
>power-of-2-length arrays, since this is required to do FFT's (Fast Fourier
>Transforms) - the requirement is built into the algorithm.

As I and Jim have pointed out, FFT. As a matter of fact the divide-and-
conquer paradigm leads to algorithms which typically works well on power-of
two arrays. A scientific computing example that comes to mind is odd-even
cyclic reduction of tridiagonal linear equation systems (a way to solve such
a nxn system in parallel in O(log n) time).

To design a machine so that arrays of length 2^n gives a slowdown must be a
grave error, considered the importance of the divide-and-conquer paradigm in
parallel processing.

Bjorn Lisper

urjlew@ecsvax.UUCP (Rostyk Lewyckyj) (03/05/88)

There have been several comments about how problems dealing with
power of two points are a no no on many computers because they
lead to memory bank access conflicts. There has also already been
published a fairly simple work around. 
Don't declare your arrays with power of two shape factors.
i.e don't DIMENSION any component to be a power of two.
There is usually nothing which forces your dimensions to be 
exactly the size of your problem. If you are doing an FFT of
256 or 1024 points, just dimension the array to be 257 or 1025
elements long.
I believe that on the CRAY the magic number for bank conflicts
is something like 48 which is not a power of two.
On the IBM 3090 16 is a magic number because of the sructure
of the cache rather than memory bank interleave.
-----------------------------------------------
  Reply-To:  Rostyslaw Jarema Lewyckyj
             urjlew@ecsvax.UUCP ,  urjlew@tucc.bitnet
       or    urjlew@tucc.tucc.edu    (ARPA,SURA,NSF etc. internet)
       tel.  (919)-962-9107

eugene@pioneer.arpa (Eugene N. Miya) (03/09/88)

In article <4734@ecsvax.UUCP> urjlew@ecsvax.UUCP (Rostyk Lewyckyj) writes:
>I believe that on the CRAY the magic number for bank conflicts
>is something like 48 which is not a power of two.
>On the IBM 3090 16 is a magic number because of the sructure
>of the cache rather than memory bank interleave.

48? Naw, many of us have measured it as powers of 2 (now it depends
on your memory size as to the worst stride).  But, more to the point
I've not measured the factor of 16 on the 3090 (A 200 in my case, and I have
measured conflicits at 128 elements), can you show me and the net some data?
(Not a literature reference, [I have Dorn's TR] but some real data and
a code chunk if you have it).

From the Rock of Ages Home for Retired Hackers:

--eugene miya, NASA Ames Research Center, eugene@ames-aurora.ARPA
  "You trust the `reply' command with all those different mailers out there?"
  "Send mail, avoid follow-ups.  If enough, I'll summarize."
  {uunet,hplabs,hao,ihnp4,decwrl,allegra,tektronix}!ames!aurora!eugene