xxremak@csduts1.lerc.nasa.gov (David A. Remaklus) (06/29/90)
In a recent conversation with some colleagues of mine at the Ames NAS
facility concerning parallel processing, they mentioned their experiences
porting a code to the Intel i860 hypercube located there (128 nodes,
7.5 gigaFLOPS peak). On this particular code they were able to achieve
about 300 MFLOPS, for an efficiency factor of about 2.5%. This low
efficiency factor didn't seem to bother them, but it sure bothered me.
Other colleagues of ours at the United Technologies Research Center in
East Hartford, CT ported similar codes to their 1/4 CM-2 and achieved
anywhere from 600 to 800 MFLOPS, for an efficiency factor of more than 50%.

It is our contention that it is necessary to achieve an efficiency factor
of at least 50% before a particular implementation of a code can be
considered appropriate for execution on that parallel processor system.
What are your opinions on this matter? Are there any published papers
that deal with this subject?

Dave R.
--
David A. Remaklus
NASA Lewis Research Center
Cleveland, Ohio 44135
xxremak@csduts1.lerc.nasa.gov
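(For concreteness: the "efficiency factor" used throughout this thread is
just sustained performance divided by peak performance. A minimal sketch
in Python; the numbers are purely illustrative placeholders, not figures
from any of the posts:)

    def efficiency_factor(sustained_mflops, peak_mflops):
        # Fraction of the machine's peak rate actually delivered.
        return sustained_mflops / peak_mflops

    # Illustrative only: a code sustaining 50 MFLOPS on a machine with a
    # 1000-MFLOPS peak is running at 5% of peak.
    print("%.1f%%" % (100.0 * efficiency_factor(50.0, 1000.0)))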
lush@EE.MsState.Edu (Edward Luke) (06/29/90)
In article <9508@hubcap.clemson.edu> xxremak@csduts1.lerc.nasa.gov
(David A. Remaklus) writes:

> In a recent conversation with some colleagues of mine at the Ames NAS
> facility concerning parallel processing, they mentioned their experiences
> porting a code to the Intel i860 hypercube located there (128 nodes,
> 7.5 gigaFLOPS peak).

ANY peak performance number is bogus. A peak number can be considered a
number "guaranteed not to be exceeded", and no more. Intel claims that
the i860 can put out something like 80 MFLOPS peak, but you would be
lucky to get 10 MFLOPS out of one on real applications.

> It is our contention that it is necessary to achieve an efficiency factor
> of at least 50% before the particular implementation of the code can be
> considered appropriate for execution on that parallel processor system.
> What are your opinions on this matter? Are there any published papers
> that deal with this subject?

I would say that the most important factors are:

1) Cost per *real* MFLOP (see the sketch below).
2) The MFLOPS rate required by the application (e.g., application A
   requires 300 MFLOPS to run in one hour).
3) Ease of programming, or program development time.
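(Luke's first criterion reduces to simple arithmetic: system price divided
by the MFLOPS rate the system actually sustains on your application. A
minimal sketch in Python; the machine names, prices, and rates are
hypothetical placeholders, not vendor figures:)

    # Cost per *real* (sustained) MFLOP, per criterion 1 above.
    machines = {
        # name: (system cost in dollars, sustained MFLOPS on the application)
        "machine A": (3000000.0, 300.0),
        "machine B": (5000000.0, 700.0),
    }
    for name, (cost, sustained) in sorted(machines.items()):
        print("%s: $%.0f per sustained MFLOP" % (name, cost / sustained))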
fyodor@decwrl.dec.com (Chris Kuszmaul) (06/30/90)
In article <9508@hubcap.clemson.edu>, xxremak@csduts1.lerc.nasa.gov
(David A. Remaklus) writes:

> In a recent conversation with some colleagues of mine at the Ames NAS
> facility concerning parallel processing, they mentioned their experiences
> porting a code to the Intel i860 hypercube located there (128 nodes,
> 7.5 gigaFLOPS peak). On this particular code they were able to
> achieve about 300 MFLOPS for an efficiency factor of about 2.5%. This
> low efficiency factor didn't seem to bother them but it sure bothered
> me. Other colleagues of ours at the United Technologies Research Center
> in East Hartford, CT ported similar codes to their 1/4 CM-2 and achieved
> anywhere from 600 to 800 MFLOPS for an efficiency factor of more than 50%.
>
> It is our contention that it is necessary to achieve an efficiency factor
> of at least 50% before the particular implementation of the code can be
> considered appropriate for execution on that parallel processor system.
> What are your opinions on this matter? Are there any published papers
> that deal with this subject?
>
> Dave R.
> --
> David A. Remaklus
> NASA Lewis Research Center
> Cleveland, Ohio 44135
> xxremak@csduts1.lerc.nasa.gov

The following table is derived from information published in the CSRD
Perfect Report of March 1990. One of my colleagues, Dr. Ken Jacobsen,
Director of the Applications Group at MasPar Computer Corporation,
compiled it. It contains performance figures, and percentages of peak
performance, for each of several computer systems on each of several
applications. There are two categories: 'Base' performance is from
directly compiled code, and 'Optimized' performance is from hand-tuned
code.

You will note that, except for workstations, for which the percentages
are a little suspect (consider the DEC 6000-410S, which gets 107.7 (!!)
percent of peak on hand-tuned DYFESM), the percentage of peak achieved
by ANY high-end computer rarely approaches 50 percent, let alone a
parallel computer.

Based on the numbers below, I suggest that any computer (parallel or
otherwise) that is getting more than ten percent of peak performance on
a given application is doing at least a reasonably acceptable job. If a
system gets fifty percent, then you have an unusually good
application/machine matchup. 2.5 percent is a little low, but really is
not that bad.

Not that I would want anyone to buy anything other than a MasPar MP-1,
which, by the way, gets roughly 30% of peak performance on ARC3D.
CLK

                         PERFECT CLUB SUMMARY

Machine          Peak           ADM   ARC2D   ARC3D   FLO52   OCEAN  SPEC77    BDNA
____________________________________________________________________________________

Cray XMP/14SE    200.0
  Base Sec                     47.2    26.8     -.-     8.8    74.7     -.-    18.0
  Base MFLOPS                  10.7    68.7     -.-    62.4    22.2     -.-    50.3
  Base % Peak                   5.4    34.4     -.-    31.2    11.1     -.-    25.2
  Opt  Sec                     47.2    26.8     -.-     8.8    74.7     -.-    18.0
  Opt  MFLOPS                  10.7    68.7     -.-    62.4    22.2     -.-    50.3
  Opt  % Peak                   5.4    34.4     -.-    31.2    11.1     -.-    25.2

Cray XMP/416     941.0
  Base Sec                     34.1    10.0    31.8     2.8    66.1    61.6    10.9
  Base MFLOPS                  14.8   183.8   130.7   194.0    25.1    30.1    83.0
  Base % Peak                   1.6    19.5    13.9    20.6     2.7     3.2     8.8
  Opt  Sec                      8.2     7.0    13.4     2.5    12.1     8.4     5.8
  Opt  MFLOPS                  61.1   261.7   310.9   218.7   136.7   220.0   156.4
  Opt  % Peak                   6.5    27.8    33.0    23.2    14.5    23.4    16.6

Cray YMP/832     2666.0
  Base Sec                     27.0     4.1    17.8     1.7    46.7    36.0     7.5
  Base MFLOPS                  18.7   448.2   233.1   328.7    35.5    51.5   121.5
  Base % Peak                   0.7    16.8     8.7    12.3     1.3     1.9     4.6
  Opt  Sec                      5.6     2.7     5.2     1.6     6.0     3.4     3.1
  Opt  MFLOPS                  90.6   682.3   792.6   347.4   275.4   543.3   288.4
  Opt  % Peak                   3.4    25.6    29.7    13.0    10.3    20.4    10.8

Cray 2S/4128     1952.0
  Base Sec                     38.4    18.4    71.0     8.9    73.2   105.9    11.1
  Base MFLOPS                  13.1   100.3    58.5    61.7    22.7    17.5    81.3
  Base % Peak                   0.7     5.1     3.0     3.2     1.2     0.9     4.2
  Opt  Sec                     26.1    15.6    71.0     7.3    21.0    66.6    10.8
  Opt  MFLOPS                  19.3   118.5    58.5    75.6    78.8    27.8    83.5
  Opt  % Peak                   1.0     6.1     3.0     3.9     4.0     1.4     4.2

Cyber 205        400.0
  Base Sec                    111.3     -.-   355.1    39.7   287.9   297.3    57.0
  Base MFLOPS                   4.5     -.-    11.7    13.8     5.8     6.2    15.9
  Base % Peak                   1.1     -.-     2.9     3.5     1.5     1.6     4.0
  Opt  Sec                    111.3     -.-   355.1    15.0    84.3   297.3    57.0
  Opt  MFLOPS                   4.5     -.-    11.7    36.7    19.7     6.2    15.9
  Opt  % Peak                   1.1     -.-     2.9     9.2     4.9     1.6     4.0

ETA 10E          380.0
  Base Sec                      -.-     -.-   187.3    13.2     -.-   284.5     -.-
  Base MFLOPS                   -.-     -.-    22.2    41.6     -.-     6.5     -.-
  Base % Peak                   -.-     -.-     5.8    10.9     -.-     1.7     -.-
  Opt  Sec                      -.-     -.-   124.8    13.0     -.-   258.2     -.-
  Opt  MFLOPS                   -.-     -.-    33.3    42.3     -.-     7.2     -.-
  Opt  % Peak                   -.-     -.-     8.8    11.1     -.-     1.9     -.-

ETA 10G          570.0
  Base Sec                     68.6     -.-   124.8     8.8   172.0   189.6    15.5
  Base MFLOPS                   7.3     -.-    33.3    62.2     9.6     9.8    58.4
  Base % Peak                   1.3     -.-     5.8    10.9     1.7     1.7    10.2
  Opt  Sec                     68.6     -.-    83.1     8.7   172.0   171.7    15.5
  Opt  MFLOPS                   7.3     -.-    50.0    63.2     9.6    10.8    58.4
  Opt  % Peak                   1.3     -.-     8.8    11.1     1.7     1.9    10.2

ETA 10Q          210.0
  Base Sec                      -.-     -.-   320.7    23.9     -.-   514.6     -.-
  Base MFLOPS                   -.-     -.-    13.0    23.0     -.-     3.6     -.-
  Base % Peak                   -.-     -.-     6.2    11.0     -.-     1.7     -.-
  Opt  Sec                      -.-     -.-   214.3    23.5     -.-   493.0     -.-
  Opt  MFLOPS                   -.-     -.-    19.4    23.4     -.-     3.8     -.-
  Opt  % Peak                   -.-     -.-     9.2    11.1     -.-     1.8     -.-

Fujitsu VP100    285.7 (32-bit)
  Base Sec                     62.8    20.9     -.-     8.8   226.2    98.2    19.5
  Base MFLOPS                   8.0    88.1     -.-    62.4     7.3    18.9    46.4
  Base % Peak                   2.8    30.8     -.-    21.8     2.6     6.6    16.2
  Opt  Sec                     62.8    20.9     -.-     8.8   226.2    98.2    19.5
  Opt  MFLOPS                   8.0    88.1     -.-    62.4     7.3    18.9    46.4
  Opt  % Peak                   2.8    30.8     -.-    21.8     2.6     6.6    16.2

Hitachi S820/80  3000.0 (32-bit)
  Base Sec                     22.6     3.7     -.-     2.4     -.-    48.9     7.3
  Base MFLOPS                  22.3   499.2     -.-   229.7     -.-    37.9   123.5
  Base % Peak                   0.7    16.6     -.-     7.7     -.-     1.3     4.1
  Opt  Sec                     22.6     3.7     -.-     2.4     -.-    48.9     7.3
  Opt  MFLOPS                  22.3   499.2     -.-   229.7     -.-    37.9   123.5
  Opt  % Peak                   0.7    16.6     -.-     7.7     -.-     1.3     4.1

NEC SX/2         1300.0 (32-bit)
  Base Sec                     31.4     -.-    58.2     3.1    97.8    45.0     5.3
  Base MFLOPS                  16.1     -.-    71.4   177.1    16.9    41.2   170.2
  Base % Peak                   1.2     -.-     5.5    13.6     1.3     3.2    13.1
  Opt  Sec                     31.4     -.-    32.1     3.1    97.8    35.2     5.3
  Opt  MFLOPS                  16.1     -.-   129.5   177.1    16.9    52.7   170.2
  Opt  % Peak                   1.2     -.-    10.0    13.6     1.3     4.1    13.1

INTEL iPSC/1     4.0 (32-bit)
  Base Sec                      -.-     -.-     -.-     -.-     -.-     -.-     -.-
  Base MFLOPS                   -.-     -.-     -.-     -.-     -.-     -.-     -.-
  Base % Peak                   -.-     -.-     -.-     -.-     -.-     -.-     -.-
  Opt  Sec                      -.-     -.-     -.-     -.-     -.-     -.-     -.-
  Opt  MFLOPS                   -.-     -.-     -.-     -.-     -.-     -.-     -.-
  Opt  % Peak                   -.-     -.-     -.-     -.-     -.-     -.-     -.-

Mark III         12.8 (32-bit)
  Base Sec                      -.-     -.-     -.-     -.-     -.-     -.-     -.-
  Base MFLOPS                   -.-     -.-     -.-     -.-     -.-     -.-     -.-
  Base % Peak                   -.-     -.-     -.-     -.-     -.-     -.-     -.-
  Opt  Sec                      -.-     -.-     -.-     -.-     -.-     -.-     -.-
  Opt  MFLOPS                   -.-     -.-     -.-     -.-     -.-     -.-     -.-
  Opt  % Peak                   -.-     -.-     -.-     -.-     -.-     -.-     -.-

NCUBE NCUBE/10   205.0 (32-bit)
  Base Sec                      -.-     -.-     -.-     -.-     -.-     -.-     -.-
  Base MFLOPS                   -.-     -.-     -.-     -.-     -.-     -.-     -.-
  Base % Peak                   -.-     -.-     -.-     -.-     -.-     -.-     -.-
  Opt  Sec                      -.-     -.-     -.-     -.-     -.-     -.-     -.-
  Opt  MFLOPS                   -.-     -.-     -.-     -.-     -.-     -.-     -.-
  Opt  % Peak                   -.-     -.-     -.-     -.-     -.-     -.-     -.-

ALLIANT FX/8     94.4 (32-bit)
  Base Sec                    493.4     -.-   743.5   113.6  2068.0  1310.3   195.2
  Base MFLOPS                   1.0     -.-     5.6     4.8     0.8     1.4     4.6
  Base % Peak                   1.1     -.-     5.9     5.1     0.8     1.5     4.9
  Opt  Sec                    493.4     -.-   642.0    89.5  2068.0   194.8   195.2
  Opt  MFLOPS                   1.0     -.-     6.5     6.1     0.8     9.5     4.6
  Opt  % Peak                   1.1     -.-     6.9     6.5     0.8    10.1     4.9

ALLIANT FX/80    188.8 (32-bit)
  Base Sec                    353.1   228.5   538.3    84.4  1582.0   975.4   116.5
  Base MFLOPS                   1.4     8.1     7.7     6.5     1.0     1.9     7.8
  Base % Peak                   0.7     4.3     4.1     3.4     0.5     1.0     4.1
  Opt  Sec                    353.1   228.5   538.3    64.5  1582.0   189.6   116.5
  Opt  MFLOPS                   1.4     8.1     7.7     8.5     1.0     9.8     7.8
  Opt  % Peak                   0.7     4.3     4.1     4.5     0.5     5.2     4.1

ARDENT Titan 2   16.0 (32-bit)
  Base Sec                    333.9     -.-     -.-   108.7     -.-   915.6   639.4
  Base MFLOPS                   1.5     -.-     -.-     5.1     -.-     2.0     1.4
  Base % Peak                  10.0     -.-     -.-    31.9     -.-    12.5     8.8
  Opt  Sec                    333.9     -.-     -.-   108.7     -.-   915.6   639.4
  Opt  MFLOPS                   1.5     -.-     -.-     5.1     -.-     2.0     1.4
  Opt  % Peak                  10.0     -.-     -.-    31.9     -.-    12.5     8.8

CONVEX C220      100.0 (32-bit)
  Base Sec                    126.8     -.-   430.8    44.6   501.3   953.2    72.2
  Base MFLOPS                   4.0     -.-     9.6    12.3     3.3     1.9    12.5
  Base % Peak                   4.0     -.-     9.6    12.3     3.3     1.9    12.5
  Opt  Sec                    126.8     -.-   430.8    44.6   501.3   953.2    72.2
  Opt  MFLOPS                   4.0     -.-     9.6    12.3     3.3     1.9    12.5
  Opt  % Peak                   4.0     -.-     9.6    12.3     3.3     1.9    12.5

STARDENT 3010    48.0 (32-bit)
  Base Sec                     96.0   184.0     -.-    78.0   510.0   482.0   124.0
  Base MFLOPS                   5.2    10.0     -.-     7.0     3.3     3.8     7.3
  Base % Peak                  10.8    20.8     -.-    14.6     6.9     7.9    15.2
  Opt  Sec                     96.0   184.0     -.-    78.0   510.0   482.0   124.0
  Opt  MFLOPS                   5.2    10.0     -.-     7.0     3.3     3.8     7.3
  Opt  % Peak                  10.8    20.8     -.-    14.6     6.9     7.9    15.2

DEC 6000-410S    3.25 (32-bit)
  Base Sec                    324.0  2495.0  2794.0   339.0  1268.0  1283.0   718.0
  Base MFLOPS                   1.6     0.7     1.5     1.6     1.3     1.4     1.3
  Base % Peak                  49.2    21.5    46.2    49.2    40.0    43.1    40.0
  Opt  Sec                    324.0  2495.0  2794.0   339.0  1268.0  1283.0   718.0
  Opt  MFLOPS                   1.6     0.7     1.5     1.6     1.3     1.4     1.3
  Opt  % Peak                  49.2    21.5    46.2    49.2    40.0    43.1    40.0

ENCORE Multimax  0.4 (32-bit)
  Base Sec                   4231.2     -.- 35395.2  4540.5 23172.7 13584.0  8151.4
  Base MFLOPS                   0.1     -.-     0.1     0.1     0.1     0.1     0.1
  Base % Peak                  25.0     -.-    25.0    25.0    25.0    25.0    25.0
  Opt  Sec                   3345.4     -.- 35395.2  3874.0 14650.1 13584.0  8151.4
  Opt  MFLOPS                   0.2     -.-     0.1     0.1     0.1     0.1     0.1
  Opt  % Peak                  50.0     -.-    25.0    25.0    25.0    25.0    25.0

VAX 11/780       1.25 (32-bit)
  Base Sec                   2121.0 56530.0 14097.0  2063.0  8337.0  8242.0  5339.0
  Base MFLOPS                   0.2     0.0     0.3     0.3     0.2     0.2     0.2
  Base % Peak                  16.0     0.0    24.0    24.0    16.0    16.0    16.0
  Opt  Sec                   2121.0 56530.0 14097.0  2063.0  8337.0  8242.0  5339.0
  Opt  MFLOPS                   0.2     0.0     0.3     0.3     0.2     0.2     0.2
  Opt  % Peak                  16.0     0.0    24.0    24.0    16.0    16.0    16.0

APOLLO DSP10040  40.0 (32-bit)
  Base Sec                    143.7   877.3     -.-   216.2     -.-     -.-   247.1
  Base MFLOPS                   3.5     2.1     -.-     2.5     -.-     -.-     3.7
  Base % Peak                   8.8     5.3     -.-     6.3     -.-     -.-     9.3
  Opt  Sec                    143.7   877.3     -.-   216.2     -.-     -.-   247.1
  Opt  MFLOPS                   3.5     2.1     -.-     2.5     -.-     -.-     3.7
  Opt  % Peak                   8.8     5.3     -.-     6.3     -.-     -.-     9.3

DEC 3100         2.0 (32-bit)
  Base Sec                    378.7     -.-  1773.9   457.9     -.-     -.-   577.8
  Base MFLOPS                   1.3     -.-     2.3     1.2     -.-     -.-     1.6
  Base % Peak                  65.0     -.-   115.0    60.0     -.-     -.-    80.0
  Opt  Sec                    257.1     -.-  1773.9   260.3     -.-     -.-   577.8
  Opt  MFLOPS                   2.0     -.-     2.3     2.1     -.-     -.-     1.6
  Opt  % Peak                 100.0     -.-   115.0   105.0     -.-     -.-    80.0

MIPS M/120       8.3 (32-bit)
  Base Sec                    439.4     -.-     -.-   209.7     -.-     -.-   515.5
  Base MFLOPS                   1.1     -.-     -.-     2.6     -.-     -.-     1.8
  Base % Peak                  13.3     -.-     -.-    31.3     -.-     -.-    21.7
  Opt  Sec                    439.4     -.-     -.-   209.7     -.-     -.-   515.5
  Opt  MFLOPS                   1.1     -.-     -.-     2.6     -.-     -.-     1.8
  Opt  % Peak                  13.3     -.-     -.-    31.3     -.-     -.-    21.7

MIPS RS2030      8.35 (32-bit)
  Base Sec                    475.5     -.-     -.-   244.7   992.7  1877.3   575.6
  Base MFLOPS                   1.1     -.-     -.-     2.2     1.7     1.0     1.6
  Base % Peak                  13.2     -.-     -.-    26.3    20.4    12.0    19.2
  Opt  Sec                    475.5     -.-     -.-   244.7   992.7  1877.3   575.6
  Opt  MFLOPS                   1.1     -.-     -.-     2.2     1.7     1.0     1.6
  Opt  % Peak                  13.2     -.-     -.-    26.3    20.4    12.0    19.2

SUN SPARC 1      5.5 (32-bit)
  Base Sec                    601.4     -.-  3474.6   540.7     -.-     -.-   822.9
  Base MFLOPS                   0.8     -.-     1.2     1.0     -.-     -.-     1.1
  Base % Peak                  14.5     -.-    21.8    18.2     -.-     -.-    20.0
  Opt  Sec                    341.9     -.-  2633.0   354.5     -.-     -.-   822.9
  Opt  MFLOPS                   1.5     -.-     1.6     1.5     -.-     -.-     1.1
  Opt  % Peak                  27.3     -.-    29.1    27.3     -.-     -.-    20.0

SUN SPARC 330    7.0 (32-bit)
  Base Sec                    442.8     -.-  2598.9   419.5     -.-     -.-   584.7
  Base MFLOPS                   1.1     -.-     1.6     1.3     -.-     -.-     1.5
  Base % Peak                  15.7     -.-    22.9    18.6     -.-     -.-    21.4
  Opt  Sec                    260.8     -.-  1865.5   183.3     -.-     -.-   584.7
  Opt  MFLOPS                   1.9     -.-     2.2     1.9     -.-     -.-     1.5
  Opt  % Peak                  27.1     -.-    31.4    27.1     -.-     -.-    21.4

SUN 3/280        0.25 (32-bit)
  Base Sec                   4435.0     -.- 30119.9  4537.5     -.-     -.-  6269.8
  Base MFLOPS                   0.1     -.-     0.1     0.1     -.-     -.-     0.1
  Base % Peak                  40.0     -.-    40.0    40.0     -.-     -.-    40.0
  Opt  Sec                   3152.8     -.- 30119.9  4268.5     -.-     -.-  6269.8
  Opt  MFLOPS                   0.2     -.-     0.1     0.1     -.-     -.-     0.1
  Opt  % Peak                  80.0     -.-    40.0    40.0     -.-     -.-    40.0

Machine                         MDG      QCD     TRFD   DYFESM    SPICE     MG3D    TRACK     Total
___________________________________________________________________________________________________

Cray XMP/14SE
  Base Sec                    452.2     40.9     17.4     19.5      -.-      -.-     17.7     547.7
  Base MFLOPS                   7.6      6.3     24.8     28.3      -.-      -.-      4.6      18.7
  Base % Peak                   3.8      3.2     12.4     14.2      -.-      -.-      2.3       9.3
  Opt  Sec                    452.2     40.9     17.4     19.5      -.-      -.-     17.7     547.5
  Opt  MFLOPS                   7.6      6.3     24.8     28.3      -.-      -.-      4.6      18.7
  Opt  % Peak                   3.8      3.2     12.4     14.2      -.-      -.-      2.3       9.3

Cray XMP/416
  Base Sec                    256.7     25.7      9.6     13.4     11.9    522.5     12.7    1069.8
  Base MFLOPS                  13.4     10.1     44.8     41.1      3.9     21.2      6.5      25.6
  Base % Peak                   1.4      1.1      4.8      4.4      0.4      2.3      0.7       2.7
  Opt  Sec                     17.5      3.2      2.1      2.9      3.2     24.4      3.3     114.0
  Opt  MFLOPS                 195.9     81.4    206.2    191.7     14.7    453.3     24.7     240.0
  Opt  % Peak                  20.8      8.7     21.9     20.4      1.6     48.2      2.6      25.5

Cray YMP/832
  Base Sec                    207.2     20.7      7.6      9.3      8.2    407.9     10.3     812.0
  Base MFLOPS                  16.6     12.6     56.4     59.4      5.7     27.1      7.9      33.7
  Base % Peak                   0.6      0.5      2.1      2.2      0.2      1.0      0.3       1.3
  Opt  Sec                      5.8      1.0      1.0      1.9      2.5      9.7      2.1      51.6
  Opt  MFLOPS                 594.9    249.6    444.2    295.2     18.9   1146.2     38.7     530.2
  Opt  % Peak                  22.3      9.4     16.7     11.1      0.7     43.0      1.5      19.9

Cray 2S/4128
  Base Sec                    244.6     32.8     16.7     17.3     12.1    569.2     16.0    1235.6
  Base MFLOPS                  14.0      7.9     25.8     32.0      3.9     19.4      5.1      22.1
  Base % Peak                   0.7      0.4      1.3      1.6      0.2      1.0      0.3       1.1
  Opt  Sec                    120.6     16.5      8.3      7.4      7.0     74.0     15.0     467.2
  Opt  MFLOPS                  28.5     15.8     52.2     74.3      6.7    149.5      5.4      58.6
  Opt  % Peak                   1.5      0.8      2.7      3.8      0.3      7.7      0.3       3.0

Cyber 205
  Base Sec                    950.6     69.3     49.1     44.2     36.6   1312.2     39.9    3650.2
  Base MFLOPS                   3.6      3.7      8.8     12.5      1.3      8.4      2.0       7.0
  Base % Peak                   0.9      0.9      2.2      3.1      0.3      2.1      0.5       1.7
  Opt  Sec                    950.6     69.3     49.1     44.2     36.6   1312.2     39.9    3421.9
  Opt  MFLOPS                   3.6      3.7      8.8     12.5      1.3      8.4      2.0       7.5
  Opt  % Peak                   0.9      0.9      2.2      3.1      0.3      2.1      0.5       1.9

ETA 10E
  Base Sec                      -.-      -.-      -.-     23.0      -.-      -.-      -.-     508.0
  Base MFLOPS                   -.-      -.-      -.-     24.0      -.-      -.-      -.-      14.0
  Base % Peak                   -.-      -.-      -.-      6.3      -.-      -.-      -.-       3.7
  Opt  Sec                      -.-      -.-      -.-     12.9      -.-      -.-      -.-     408.9
  Opt  MFLOPS                   -.-      -.-      -.-     42.9      -.-      -.-      -.-      17.4
  Opt  % Peak                   -.-      -.-      -.-     11.3      -.-      -.-      -.-       4.6

ETA 10G
  Base Sec                    572.6     48.4     17.5     15.4     22.4   1700.8     22.1    2978.5
  Base MFLOPS                   6.0      5.4     24.7     36.0      2.1      6.5      3.7       8.6
  Base % Peak                   1.1      0.9      4.3      6.3      0.4      1.1      0.6       1.5
  Opt  Sec                    572.6     48.4     17.5      8.5     22.4   1700.8     22.1    2911.9
  Opt  MFLOPS                   6.0      5.4     24.7     64.6      2.1      6.5      3.7       8.8
  Opt  % Peak                   1.1      0.9      4.3      6.3      0.4      1.1      0.6       1.5

ETA 10Q
  Base Sec                      -.-      -.-      -.-     41.7      -.-      -.-      -.-     900.0
  Base MFLOPS                   -.-      -.-      -.-     13.3      -.-      -.-      -.-       7.9
  Base % Peak                   -.-      -.-      -.-      6.3      -.-      -.-      -.-       3.8
  Opt  Sec                      -.-      -.-      -.-     23.3      -.-      -.-      -.-     754.1
  Opt  MFLOPS                   -.-      -.-      -.-     23.7      -.-      -.-      -.-       9.4
  Opt  % Peak                   -.-      -.-      -.-     11.3      -.-      -.-      -.-       4.5

Fujitsu VP100
  Base Sec                    473.3     56.5     14.4     17.2     15.9    641.3     20.1    1675.1
  Base MFLOPS                   7.3      4.6     29.9     32.1      2.9     17.2      4.1      13.8
  Base % Peak                   2.6      1.6     10.5     11.2      1.0      6.0      1.4       4.8
  Opt  Sec                    473.3     56.5     14.4     17.2     15.9    641.3     20.1    1675.1
  Opt  MFLOPS                   7.3      4.6     29.9     32.1      2.9     17.2      4.1      13.8
  Opt  % Peak                   2.6      1.6     10.5     11.2      1.0      6.0      1.4       4.8

Hitachi S820/80
  Base Sec                    225.9     28.0      -.-      7.9      8.2      -.-     11.0     365.9
  Base MFLOPS                  15.2      9.3      -.-     69.8      5.7      -.-      7.4      27.4
  Base % Peak                   0.5      0.3      -.-      2.3      0.2      -.-      0.2       0.9
  Opt  Sec                    225.9     28.0      -.-      7.9      8.2      -.-     11.0     365.9
  Opt  MFLOPS                  15.2      9.3      -.-     69.8      5.7      -.-      7.4      27.4
  Opt  % Peak                   0.5      0.3      -.-      2.3      0.2      -.-      0.2       0.9

NEC SX/2
  Base Sec                    243.3     27.0      7.4      8.3     10.0    315.7     13.2     865.7
  Base MFLOPS                  14.1      9.6     57.9     66.6      4.7     35.0      6.2      29.5
  Base % Peak                   1.1      0.7      4.5      5.1      0.4      2.7      0.5       2.3
  Opt  Sec                     24.8     27.0      7.4      8.0     10.0    315.7     13.2     611.0
  Opt  MFLOPS                 138.6      9.6     57.9     69.3      4.7     35.0      6.2      41.8
  Opt  % Peak                  10.7      0.7      4.5      5.3      0.4      2.7      0.5       3.2

INTEL iPSC/1
  Base Sec                  23049.9    353.8      -.-      -.-      -.-      -.-      -.-   23403.7
  Base MFLOPS                   0.1      0.7      -.-      -.-      -.-      -.-      -.-       0.2
  Base % Peak                   2.5     17.5      -.-      -.-      -.-      -.-      -.-       4.0
  Opt  Sec                   2386.0    102.5      -.-      -.-      -.-      -.-      -.-    2488.5
  Opt  MFLOPS                   1.4      2.5      -.-      -.-      -.-      -.-      -.-       1.5
  Opt  % Peak                  35.0     62.5      -.-      -.-      -.-      -.-      -.-      37.2

Mark III
  Base Sec                  25520.0     66.8      -.-      -.-      -.-      -.-      -.-   25586.8
  Base MFLOPS                   0.1      3.9      -.-      -.-      -.-      -.-      -.-       0.1
  Base % Peak                   0.8     30.5      -.-      -.-      -.-      -.-      -.-       1.1
  Opt  Sec                   1094.2     40.9      -.-      -.-      -.-      -.-      -.-    1135.1
  Opt  MFLOPS                   3.1      6.3      -.-      -.-      -.-      -.-      -.-       3.3
  Opt  % Peak                  24.2     49.2      -.-      -.-      -.-      -.-      -.-      25.5

NCUBE NCUBE/10
  Base Sec                  40125.9     90.1      -.-      -.-      -.-      -.-      -.-   40216.0
  Base MFLOPS                   0.1      2.9      -.-      -.-      -.-      -.-      -.-       0.1
  Base % Peak                   0.1      1.4      -.-      -.-      -.-      -.-      -.-       0.0
  Opt  Sec                    369.6      8.3      -.-      -.-      -.-      -.-      -.-     377.9
  Opt  MFLOPS                   9.3     31.3      -.-      -.-      -.-      -.-      -.-       9.8
  Opt  % Peak                   4.5     15.3      -.-      -.-      -.-      -.-      -.-       4.8

ALLIANT FX/8
  Base Sec                   2972.1    356.6    298.2    237.7     97.3  11651.6    141.5   20679.0
  Base MFLOPS                   1.2      0.7      1.4      2.3      0.5      0.9      0.6       1.2
  Base % Peak                   1.3      0.7      1.5      2.4      0.5      1.0      0.6       1.3
  Opt  Sec                    618.2    119.9    298.2     95.7     23.9  11651.6    118.1   16608.5
  Opt  MFLOPS                   5.5      2.2      1.4      5.8      2.0      0.9      0.7       1.5
  Opt  % Peak                   5.8      2.3      1.5      6.1      2.1      1.0      0.7       1.6

ALLIANT FX/80
  Base Sec                   2118.6    238.1    264.1    199.1     67.7   8586.1     89.5   15441.4
  Base MFLOPS                   1.6      1.1      1.6      2.8      0.7      1.3      0.9       1.8
  Base % Peak                   0.8      0.6      0.8      1.5      0.4      0.7      0.5       0.9
  Opt  Sec                    500.7     86.4    264.1     89.9     17.8   8586.1     84.2   12701.7
  Opt  MFLOPS                   6.9      3.0      1.6      6.1      2.6      1.3      1.0       2.2
  Opt  % Peak                   3.7      1.6      0.8      3.2      1.4      0.7      0.5       1.1

ARDENT Titan 2
  Base Sec                   4505.0    261.5    137.2    364.3      -.-      -.-      -.-    7265.6
  Base MFLOPS                   0.8      1.0      3.1      1.5      -.-      -.-      -.-       1.2
  Base % Peak                   5.0      6.3     19.4      9.4      -.-      -.-      -.-       7.3
  Opt  Sec                   4505.0    261.5    137.2    364.3      -.-      -.-      -.-    7265.6
  Opt  MFLOPS                   0.8      1.0      3.1      1.5      -.-      -.-      -.-       1.2
  Opt  % Peak                   5.0      6.3     19.4      9.4      -.-      -.-      -.-       7.3

CONVEX C220
  Base Sec                   1357.7    136.5     77.6    185.5     31.5   4519.8     47.3    8484.8
  Base MFLOPS                   2.5      1.9      5.6      3.0      1.5      2.4      1.7       3.0
  Base % Peak                   2.5      1.9      5.6      3.0      1.5      2.4      1.7       3.0
  Opt  Sec                   1357.7    136.5     77.6    185.5     31.5   4519.8     47.3    8484.8
  Opt  MFLOPS                   2.5      1.9      5.6      3.0      1.5      2.4      1.7       3.0
  Opt  % Peak                   2.5      1.9      5.6      3.0      1.5      2.4      1.7       3.0

STARDENT 3010
  Base Sec                    768.0     70.0     94.0     64.0     24.0   1235.7     32.0    3761.7
  Base MFLOPS                   4.5      3.7      4.6      8.6      1.9      9.0      2.6       6.2
  Base % Peak                   9.4      7.7      9.6     17.9      4.0     18.8      5.4      12.8
  Opt  Sec                    768.0     70.0     94.0     64.0     24.0   1235.7     32.0    3761.7
  Opt  MFLOPS                   4.5      3.7      4.6      8.6      1.9      9.0      2.6       6.2
  Opt  % Peak                   9.4      7.7      9.6     17.9      4.0     18.8      5.4      12.8

DEC 6000-410S
  Base Sec                   2554.0    184.0    309.0    160.0     61.0  10646.0     85.0   23220.0
  Base MFLOPS                   1.3      1.4      1.4      3.5      0.8      1.0      1.0       1.2
  Base % Peak                  40.0     43.1     43.1    107.7     24.6     30.8     30.8      36.2
  Opt  Sec                   2554.0    184.0    309.0    160.0     61.0  10646.0     85.0   23220.0
  Opt  MFLOPS                   1.3      1.4      1.4      3.5      0.8      1.0      1.0       1.2
  Opt  % Peak                  40.0     43.1     43.1    107.7     24.6     30.8     30.8      36.2

ENCORE Multimax
  Base Sec                  25615.9   2134.0   3426.5      -.-    464.5      -.-    681.4  121397.3
  Base MFLOPS                   0.1      0.1      0.1      -.-      0.1      -.-      0.1       0.1
  Base % Peak                  25.0     25.0     25.0      -.-     25.0      -.-     25.0      28.6
  Opt  Sec                  25615.9   1761.2   3384.9      -.-    464.5      -.-    681.4  110908.0
  Opt  MFLOPS                   0.1      0.1      0.1      -.-      0.1      -.-      0.1       0.1
  Opt  % Peak                  25.0     25.0     25.0      -.-     25.0      -.-     25.0      31.3

VAX 11/780
  Base Sec                  20349.0   1124.0   2906.0   1100.0    423.0  65228.0    605.0  188464.0
  Base MFLOPS                   0.2      0.2      0.1      0.5      0.1      0.2      0.1       0.1
  Base % Peak                  16.0     16.0      8.0     40.0      8.0     16.0      8.0      11.6
  Opt  Sec                  20349.0   1124.0   2906.0   1100.0    423.0  65228.0    605.0  188464.0
  Opt  MFLOPS                   0.2      0.2      0.1      0.5      0.1      0.2      0.1       0.1
  Opt  % Peak                  16.0     16.0      8.0     40.0      8.0     16.0      8.0      11.6

APOLLO DSP10040
  Base Sec                    994.8     88.6    233.9    136.8      -.-   3555.9      -.-    6494.3
  Base MFLOPS                   3.5      2.9      1.8      4.0      -.-      3.1      -.-       3.0
  Base % Peak                   8.8      7.3      4.5     10.0      -.-      7.8      -.-       7.5
  Opt  Sec                    994.8     88.6    233.9    136.8      -.-   3555.9      -.-    6494.3
  Opt  MFLOPS                   3.5      2.9      1.8      4.0      -.-      3.1      -.-       3.0
  Opt  % Peak                   8.8      7.3      4.5     10.0      -.-      7.8      -.-       7.5

DEC 3100
  Base Sec                   2059.4      -.-      -.-    195.6      -.-      -.-      -.-    5443.3
  Base MFLOPS                   1.7      -.-      -.-      2.8      -.-      -.-      -.-       1.9
  Base % Peak                  85.0      -.-      -.-    140.0      -.-      -.-      -.-      92.8
  Opt  Sec                   2059.4      -.-      -.-    125.8      -.-      -.-      -.-    5054.3
  Opt  MFLOPS                   1.7      -.-      -.-      4.4      -.-      -.-      -.-       2.0
  Opt  % Peak                  85.0      -.-      -.-    220.0      -.-      -.-      -.-     100.0

MIPS M/120
  Base Sec                   1667.1    106.7    338.1    106.4     54.9      -.-      -.-    3437.8
  Base MFLOPS                   2.1      2.4      1.3      5.2      0.9      -.-      -.-       1.9
  Base % Peak                  25.3     28.9     15.7     62.7     10.8      -.-      -.-      23.4
  Opt  Sec                   1667.1    106.7    338.1    106.4     54.9      -.-      -.-    3437.8
  Opt  MFLOPS                   2.1      2.4      1.3      5.2      0.9      -.-      -.-       1.9
  Opt  % Peak                  25.3     28.9     15.7     62.7     10.8      -.-      -.-      23.4

MIPS RS2030
  Base Sec                   1738.5    109.2    351.3    112.7     67.3   5167.9     21.8   11734.5
  Base MFLOPS                   2.0      2.4      1.2      4.9      0.7      2.1      3.7       1.8
  Base % Peak                  24.0     28.7     14.4     58.7      8.4     25.1     44.2      21.8
  Opt  Sec                   1738.5    109.2    351.3    112.7     67.3   5167.9     21.8   11734.5
  Opt  MFLOPS                   2.0      2.4      1.2      4.9      0.7      2.1      3.7       1.8
  Opt  % Peak                  24.0     28.7     14.4     58.7      8.4     25.1     44.2      21.8

SUN SPARC 1
  Base Sec                   3243.0    138.3    353.7    224.3     66.5      -.-      -.-    9465.4
  Base MFLOPS                   1.1      1.9      1.2      2.5      0.7      -.-      -.-       1.1
  Base % Peak                  20.0     34.5     21.8     45.5     12.7      -.-      -.-      20.8
  Opt  Sec                   3243.0    138.3    353.7    224.3     66.5      -.-      -.-    8178.1
  Opt  MFLOPS                   1.1      1.9      1.2      2.5      0.7      -.-      -.-       1.3
  Opt  % Peak                  20.0     34.5     21.8     45.5     12.7      -.-      -.-      24.1

SUN SPARC 330
  Base Sec                   2377.3    110.0    240.0    151.6     48.1      -.-      -.-    6756.9
  Base MFLOPS                   1.4      2.4      1.8      3.6      1.0      -.-      -.-       1.6
  Base % Peak                  20.0     34.3     25.7     51.4     14.3      -.-      -.-      22.9
  Opt  Sec                   2377.3    110.0    240.0    141.9     48.1      -.-      -.-    5811.6
  Opt  MFLOPS                   1.4      2.4      1.8      3.9      1.0      -.-      -.-       1.9
  Opt  % Peak                  20.0     34.3     25.7     55.7     14.3      -.-      -.-      26.7

SUN 3/280
  Base Sec                  19155.0   1515.9   3907.8   2026.5    378.3      -.-      -.-   72344.8
  Base MFLOPS                   0.2      0.2      0.1      0.3      0.1      -.-      -.-       0.1
  Base % Peak                  80.0     80.0     40.0    120.0     40.0      -.-      -.-      60.0
  Opt  Sec                  19155.0   1515.9   3907.8   1864.0    378.3      -.-      -.-   70632.0
  Opt  MFLOPS                   0.2      0.2      0.1      0.3      0.1      -.-      -.-       0.2
  Opt  % Peak                  80.0     80.0     40.0    120.0     40.0      -.-      -.-      61.4
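(The "% Peak" rows above are simply the MFLOPS rows scaled by each
machine's peak rating, which makes the table easy to spot-check. A
one-line check in Python, using the Cray XMP/416 hand-tuned ARC3D entry:)

    peak_mflops = 941.0    # Cray XMP/416 peak, from the table
    opt_mflops = 310.9     # its "Opt MFLOPS" entry for ARC3D
    print(round(100.0 * opt_mflops / peak_mflops, 1))   # 33.0, matching "Opt % Peak"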
ian@decwrl.dec.com (Ian L. Kaplan) (06/30/90)
>In a recent conversation with some colleagues of mine at the Ames NAS
>facility concerning parallel processing, they mentioned their experiences
>porting a code to the Intel i860 hypercube located there (128 nodes,
>7.5 gigaFLOPS peak). On this particular code they were able to
>achieve about 300 MFLOPS for an efficiency factor of about 2.5%. This
>low efficiency factor didn't seem to bother them but it sure bothered
>me. Other colleagues of ours at the United Technologies Research Center
>in East Hartford, CT ported similar codes to their 1/4 CM-2 and achieved
>anywhere from 600 to 800 MFLOPS for an efficiency factor of more than 50%.
>
>David A. Remaklus
>NASA Lewis Research Center
>Cleveland, Ohio 44135
>xxremak@csduts1.lerc.nasa.gov

This is somewhat tangential to the issue, but I could not resist
mentioning it. Perhaps the difference in execution efficiency between the
Intel cube (an MIMD machine) and the CM-2 (a SIMD machine) is due to the
fact (no doubt hotly contested) that SIMD systems are easier to program.
Easier to program also means easier to fit one's problem to. MIMD
architecture and programming continues to be a hot topic in the computer
science research community. Some people theorize that this is because
MIMD programming is so difficult that it provides a challenging research
problem and a fertile field for PhD theses.

A term like "ease of programming" is often used without much definition,
so I will try to flesh out my claims. One definition of ease of
programming is that much of the machine architecture is abstracted away,
and the programmer can think about writing a program that describes the
problem rather than about shoehorning the problem onto the machine. SIMD
systems can be programmed in _standard_ Fortran 90. MIMD systems can only
be programmed in a language that contains extensions for synchronization.
The SIMD programmer need only consider the machine architecture when it
comes to making the program run more efficiently. The MIMD programmer
must consider the machine architecture or the program will not run
deterministically. Full symbolic debugging can also be supported on a
SIMD machine. Has anyone done a symbolic debugger for a large-scale MIMD
system?

Of course I am biased.

  Ian Kaplan
  MasPar Computer Corp.
  ian@maspar.com
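(A rough modern analogy to this contrast, sketched in Python, with NumPy
whole-array expressions standing in for a data-parallel SIMD language and
multiprocessing standing in for message-passing MIMD. The point is where
the explicit partitioning and synchronization live, not performance; all
names here are illustrative:)

    import numpy as np
    from multiprocessing import Pool

    def worker(chunk):
        # Each "node" runs the same computation on its own partition.
        return np.sqrt(chunk) + 1.0

    if __name__ == "__main__":
        a = np.arange(1000000, dtype=np.float64)

        # "SIMD style": one whole-array expression, no visible synchronization.
        simd_result = np.sqrt(a) + 1.0

        # "MIMD style": explicitly partition the data, fork workers, join.
        # The join (pool.map returning) is the synchronization point the
        # programmer must manage.
        with Pool(4) as pool:
            mimd_result = np.concatenate(pool.map(worker, np.array_split(a, 4)))

        assert np.allclose(simd_result, mimd_result)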
mccalpin@vax1.udel.edu (John D Mccalpin) (07/03/90)
In article <9508@hubcap.clemson.edu> xxremak@csduts1.lerc.nasa.gov
(David A. Remaklus) writes:

>In a recent conversation with some colleagues of mine at the Ames NAS
>facility concerning parallel processing, they mentioned their experiences
>porting a code to the Intel i860 hypercube located there (128 nodes,
>7.5 gigaFLOPS peak). On this particular code they were able to
>achieve about 300 MFLOPS for an efficiency factor of about 2.5%. This
>low efficiency factor didn't seem to bother them but it sure bothered
>me.

The question of efficiency is complicated in this case by the choice of
the i860 as the CPU. The peak performance quoted corresponds to about
60 MFLOPS/CPU, which may not be attainable even for optimally coded
assembly language routines. Preston Briggs at Rice University has spent
some time working on this processor, and on a real, live piece of
hardware he was unable to obtain more than about 33 MFLOPS for a
hand-coded 64-bit matrix-multiply kernel. Code compiled from FORTRAN
using existing compiler technology typically produced performance in the
2-5 MFLOPS range.

The 300 MFLOPS observed performance is about 2.3 MFLOPS/CPU, which may
indicate very good performance, all things considered. So a more
reasonable estimate of efficiency for this case is to look at the
parallel speedup. I would be surprised if one CPU gave better than
5 MFLOPS, so the "efficiency" in this case would be close to
50% = (300 MFLOPS)/(128 CPUs * 5 MFLOPS/CPU).
--
John D. McCalpin                       mccalpin@vax1.udel.edu
Assistant Professor                    mccalpin@delocn.udel.edu
College of Marine Studies, U. Del.     mccalpin@scri1.scri.fsu.edu
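(McCalpin's alternative measure, as arithmetic: divide the aggregate rate
by the CPU count times a realistic single-CPU rate, rather than by the
nominal peak. A sketch in Python using the numbers from his post; note
the 5 MFLOPS/CPU figure is his estimate, not a measurement:)

    def parallel_efficiency(aggregate_mflops, n_cpus, per_cpu_mflops):
        # Speedup over one CPU, divided by the number of CPUs.
        return aggregate_mflops / (n_cpus * per_cpu_mflops)

    print("%.0f%%" % (100.0 * parallel_efficiency(300.0, 128, 5.0)))  # ~47%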
dmcmilla@cfctech.cfc.com (Don McMillan CS 50) (07/03/90)
In article <9508@hubcap.clemson.edu>, xxremak@csduts1.lerc.nasa.gov
(David A. Remaklus) writes:

|> It is our contention that it is necessary to achieve an efficiency factor
|> of at least 50% before the particular implementation of the code can be
|> considered appropriate for execution on that parallel processor system.
|> What are your opinions on this matter? Are there any published papers
|> that deal with this subject?

You're in good company. See "Speedup Versus Efficiency in Parallel
Systems", IEEE Trans. on Computers, vol. 38, no. 3, March 1989.
Basically, the authors define a method for determining the "average
parallelism" of a given algorithm, and from that a way of selecting the
most appropriate number of processors such that at least 50% of the
maximum possible speedup is attained, with at least 50% efficiency.

Don McMillan          __  .  .   Phone:    (313) 986-1436
CS Department        /  ` |\ /|  UUCP:     {umich,cfctech}!rphroy!rcsuna!dmcmilla
GM Research Labs     |  ,_ | | | CSNet:    mcmillan@gmr.com
Warren, MI 48090 USA \__/  | |   Internet: dmcmilla%rcsuna.uucp@umich.edu
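(For reference, the bounds in that paper are simple enough to state in a
few lines. Assuming A is the algorithm's "average parallelism", the
authors show that speedup on n processors is at least nA/(n+A-1) and
efficiency is at least A/(n+A-1); running on n = A processors then
guarantees both at least half the maximum speedup and at least 50%
efficiency. A sketch in Python, with A chosen arbitrarily:)

    def speedup_lower_bound(n, A):
        # Eager/Zahorjan/Lazowska lower bound on speedup with n processors,
        # given average parallelism A.
        return (n * A) / (n + A - 1.0)

    def efficiency_lower_bound(n, A):
        return speedup_lower_bound(n, A) / n

    A = 20.0       # hypothetical average parallelism of some algorithm
    n = int(A)     # the recommended operating point: n = A
    print(speedup_lower_bound(n, A) / A)   # >= 0.5 of the maximum speedup
    print(efficiency_lower_bound(n, A))    # >= 0.5 efficiency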
carroll@beaver.cs.washington.edu (Jeff Carroll) (07/05/90)
In article <9521@hubcap.clemson.edu> argosy!ian@decwrl.dec.com
(Ian L. Kaplan) writes:

> (David Remaklus writes:)
>>In a recent conversation with some colleagues of mine at the Ames NAS
>>facility concerning parallel processing, they mentioned their experiences
>>porting a code to the Intel i860 hypercube located there (128 nodes,
>>7.5 gigaFLOPS peak). On this particular code they were able to
>>achieve about 300 MFLOPS for an efficiency factor of about 2.5%. This
>>low efficiency factor didn't seem to bother them but it sure bothered
>>me. Other colleagues of ours at the United Technologies Research Center
>>in East Hartford, CT ported similar codes to their 1/4 CM-2 and achieved
>>anywhere from 600 to 800 MFLOPS for an efficiency factor of more than 50%.

We have an application that runs at roughly 70% efficiency on our
iPSC/860. Email me for details.

> Perhaps the difference in the execution efficiency between the Intel
>cube (an MIMD machine) and the CM-2 (a SIMD machine) is due to the
>fact (no doubt hotly contested) that SIMD systems are easier to
>program. Easier to program also means easier to fit one's problem to...

I thought marketing was taboo on the net. :^)

In this case I think it's far more likely that the low efficiencies are
due to the fact that there are no good market-ready compilers for the
i860 as yet.

> A term like "ease of programming" is often used without giving much
>definition, so I will try to flesh out my claims. One definition of
>ease of programming is that much of the machine architecture is
>abstracted and the programmer can think about writing a program that
>describes the problem rather than thinking about shoehorning the
>problem onto the machine.

Maybe. But for problems that defy application of a data-parallel
algorithm (and thus run at very ordinary speeds on a data-parallel
machine), one is quickly persuaded to learn to use a shoehorn.

>... SIMD systems can be programmed in
>_standard_ Fortran 90. MIMD systems can only be programmed in a
>language that contains extensions for synchronization.

Well, no. An iPSC can be programmed in standard FORTRAN 77 (who uses
FORTRAN 90 anyway?). Think of a network of Unix-like systems supporting
an RPC mechanism; that's the way you program an iPSC. Once you've
finished decomposing your problem, it's no more painful than writing
FORTRAN under VMS (groan...). What you say is true of some (if not all)
other MIMD systems.

> ...The SIMD
>programmer need only consider machine architecture when it comes to
>making their program run more efficiently. The MIMD programmer must
>consider the machine architecture or the program will not run
>deterministically.

Granted: but, in my opinion, the rewards are great.

> Of course I am biased.
>
>   Ian Kaplan
>   MasPar Computer Corp.
>   ian@maspar.com

Let's hear it for truth in advertising.

Jeff Carroll
carroll@atc.boeing.com

disclaimer #1: I am not associated with Intel Corporation, except as a
satisfied customer.
disclaimer #2: These are personal opinions, not official positions of
the Boeing Company (though the two may coincide, especially regarding
such things as FORTRAN 90).
xxremak@csduts1.lerc.nasa.gov (David A. Remaklus) (07/14/90)
In article <9661@hubcap.clemson.edu> dfk@grad13.cs.duke.edu
(David F. Kotz) writes:

>What is difficult about programming many MIMD (or SIMD) systems is to
>make an *efficient* program (the original point of this thread of
>discussion). Just getting your program to run is not always that
>hard, although of course some systems/languages are easier than others
>in this respect. To make an efficient program, unfortunately, one must
>usually pay attention to architectural details.

After the original posting, I received a number of comments by email. It
seems that we all have different definitions of efficiency as it applies
to parallel processing. I think the core issue that needs to be
considered (which is related to efficiency) is determining the
"appropriateness" of executing a given application/algorithm on a given
architecture and parallel system. For example, few (I would think) would
dispute that it is inappropriate to run an entirely or predominantly
scalar code on a CRAY supercomputer. It all comes down to whether or not
the application/algorithm can take advantage of the resources presented
to it by the architecture/machine. If it can't, then it doesn't belong
there.
--
David A. Remaklus
NASA Lewis Research Center
Cleveland, Ohio 44135
xxremak@csduts1.lerc.nasa.gov