[comp.lang.fortran] Cray memory stride

bernhold@red (David E. Bernholdt) (03/31/91)

In article <1991Mar30.142903.5225@ariel.unm.edu> prentice@triton.unm.edu (John Prentice) writes:
>One other question.  What is the size of the memory banks on the 
>YMP and hence what strides should you avoid?

I asked this question a while ago.  I _think_ that the (two) YMPs to
which I have access have 256 banks.  In general, there is a wide
range of possible bank configurations for a given type of Cray.
Unicos has a command and a system call called "target" which report
the type of Cray, memory size, number of banks, clock period, etc. for the
host in question. The system call has to be made via a C routine, but
in Unicos it is fairly easy to call C from Fortran.

Once you've got the number of banks, the most general thing to do is
to work with vectors whose length is prime relative to the number of
banks.
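
A minimal sketch of that advice, in Python for brevity. It assumes the
"vector length" in question is the leading dimension of a Fortran array,
since stepping along a row of A(LDA,N) touches memory with stride LDA.
The function name pad_leading_dimension is hypothetical, and the bank
count should be whatever "target" reports for your machine.

```python
from math import gcd

def pad_leading_dimension(n, banks):
    """Smallest m >= n that shares no common factor with the bank count.

    Hypothetical helper: n is the leading dimension you need, banks is
    the number of memory banks reported for the host.
    """
    m = n
    while gcd(m, banks) != 1:
        m += 1          # pad by one word and try again
    return m

# With a power-of-two bank count this simply picks the next odd number.
print(pad_leading_dimension(256, 256))   # 257
print(pad_leading_dimension(100, 256))   # 101
```

Declaring the array as A(257,N) instead of A(256,N) wastes one word per
column but keeps row accesses from landing in the same bank every time.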

One caveat on the "target" command/call:  the response returned can be
configured by the system manager, though I'm not sure I see the value
in "lying" about the machine's configuration.
-- 
David Bernholdt			bernhold@qtp.ufl.edu
Quantum Theory Project		bernhold@ufpine.bitnet
University of Florida
Gainesville, FL  32611		904/392 6365

djh@xipe.osc.edu (David Heisterberg) (03/31/91)

In article <27758@uflorida.cis.ufl.EDU> bernhold@red (David E. Bernholdt) writes:
>I asked this question a while ago.  I _think_ that the (two) YMPs to
>which I have access have 256 banks.
>...
>Once you've got the number of banks, the most general thing to do is
>to work with vectors whose length is prime relative to the number of
>banks.

For single stream performance it is the number of sub-sections rather
than banks that is important.  A memory reference makes a particular
bank unavailable to all CPUs, and the containing sub-section unavailable
to further references from the same CPU, for 5 CP.  I lack a definitive
statement, but I think all YMPs have 32 sub-sections.  The 2 series has
64 banks; the 4, 128; and the 8, 256.
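
A toy model of the sub-section rule, in Python for brevity. It is only
a sketch under several assumptions that are not stated above: the
sub-section is taken to be the word address modulo 32, one CPU issues at
most one reference per clock, and bank and section conflicts are
ignored.  The name cycles_for_stream is hypothetical.

```python
def cycles_for_stream(stride, n_refs, subsections=32, busy=5):
    """Clocks needed to issue n_refs references with the given stride.

    Assumes sub-section = address mod subsections, and that a reference
    keeps its sub-section busy for `busy` clocks for this CPU.
    """
    free_at = [0] * subsections   # clock at which each sub-section is free
    clock = 0
    addr = 0
    for _ in range(n_refs):
        s = addr % subsections
        clock = max(clock, free_at[s])   # stall until the sub-section is free
        free_at[s] = clock + busy
        clock += 1                       # at best one reference per clock
        addr += stride
    return clock

for stride in (1, 8, 16, 32):
    print(stride, cycles_for_stream(stride, 320))
```

Under these assumptions stride 1 takes 320 clocks for 320 references,
while stride 32 takes 1596, roughly five times as long.
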
--
David J. Heisterberg		djh@osc.edu		And you all know
The Ohio Supercomputer Center	djh@ohstpy.bitnet	security Is mortals'
Columbus, Ohio  43212		ohstpy::djh		chiefest enemy.

ftower@ncar.ucar.EDU (Francis Tower) (04/02/91)

John,

I was out to lunch on my memory stride comments.  David Heisterberg
(a nice German name) was correct.  It is subsection conflicts that
take precedence over bank conflicts for a single CPU.  The relative
performance for each memory stride, as set by the subsections, is:

  Mod(STRIDE, 32)          rel. performance

        0                      1/5
      1 -  7                    1
        8                      4/5
      9 - 15                    1
       16                      2/5
     17 - 23                    1
       24                      4/5
     25 - 31                    1
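
The table above is consistent with a simple rule, sketched here in
Python for brevity: a stride visits 32/gcd(stride,32) distinct
subsections, and the stream runs at full speed once that count reaches
the 5-clock busy time.  This rule is an inference from the table and
the 5 CP figure quoted earlier in the thread, not something taken from
Cray documentation; relative_performance is a hypothetical name.

```python
from fractions import Fraction
from math import gcd

def relative_performance(stride, subsections=32, busy=5):
    """Relative single-CPU throughput for a strided stream.

    Assumed model: performance = min(1, distinct subsections / busy time).
    """
    distinct = subsections // gcd(stride, subsections)
    return min(Fraction(1), Fraction(distinct, busy))

for stride in (32, 8, 16, 24, 3):
    print(stride, relative_performance(stride))
```

For example, stride 16 alternates between two subsections, so only two
references can complete in every five clocks.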


   There are 3 types of memory conflicts:
     Section -- two or more ports in the same CPU want to access the
                same memory section.

     Subsection -- a port in the same CPU wants to access the same
                subsection referenced by another port.  Wait time is
                1 to 4 clocks.

     Bank -- any port from any CPU wants to access a bank (at the
             same time or subsequently).  Wait time is 1 to 5 clocks.
             Cray breaks this into two cases 'Simultaneous Bank' and
             'Bank Busy'.

Today, It's Racquet Ball!

Francis G Tower
Software QA
NCAR/CGD/ICS

<< Middle-aged Mutant Ninja Modelers >>
"Don't be misled by truth. Science is fact!"