[comp.arch] Lawerence Livermore Loops in C

schw@gt-eedsp.UUCP (Dave Schwartz) (12/09/88)

If you have a copy of the Lawerence Livermore Loops set that has been
ported to C. I would appreciate if you would send me a copy.

Thanks,
David Schwartz

-------------------------------------------------------------
uucp:                         schw@gt-eedsp.uucp
domainizing internet mailers: schw@gteedsp.gatech.edu

yuval@taux02.UUCP (Gideon Yuval) (12/09/88)

In article <584@gt-eedsp.UUCP> schw@gt-eedsp.UUCP (Dave Schwartz) writes:
>
>If you have a copy of the Lawerence Livermore Loops set that has been
>ported to C. I would appreciate if you would send me a copy.

Me too, please.

-- 
Gideon Yuval, yuval@taux01.nsc.com, +972-2-690992 (home) ,-52-522255(work)
 Paper-mail: National Semiconductor, 6 Maskit St., Herzliyah, Israel
                                                TWX: 33691, fax: +972-52-558322

tdg@hall.cray.com (Terry Greyzck) (12/13/88)

In article <335@taux02.UUCP> yuval@taux02.UUCP (Gideon Yuval) writes:
>In article <584@gt-eedsp.UUCP> schw@gt-eedsp.UUCP (Dave Schwartz) writes:
>>
>>If you have a copy of the Lawerence Livermore Loops set that has been
>>ported to C. I would appreciate if you would send me a copy.
>
>Me too, please.
>
There seems to be some interest, so I'll post the 14-loop version here.
I am not aware of a C version of the 24-loop kernels, although someone
at the National Magnetic Fusion Energy Computing Center (NMFECC, or just
MFE) was working on one some time back.  If you're into serious
benchmarking, get the 24-loop version.  If you're just curious about
machine speed, the 14-loop version is probably okay, and it is much,
much, shorter.

The C version includes a checksum self-check, so you will know if you
obtained the correct answers.  Someone should retrofit this into the
Fortran version.

The only system dependency is the use of the clock() function to
measure CPU time used.  Set the variable 'scale' to indicate what
multiple of microseconds clock() returns on your system.

The opinions, et cetera, in this posting are my own and do not reflect
those of anyone else.  At least, not intentionally.

#! /bin/sh
# This is a shell archive.  Remove anything before this line, then unpack
# it by saving it into a file and typing "sh file".  To overwrite existing
# files, type "sh file -c".  You can also feed this as standard input via
# unshar, or by typing "sh <file", e.g..  If this archive is complete, you
# will see the following message at the end:
#		"End of shell archive."
# Contents:  lloops.c
# Wrapped by tdg@zia on Mon Dec 12 10:22:25 1988
PATH=/bin:/usr/bin:/usr/ucb ; export PATH
if test -f 'lloops.c' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'lloops.c'\"
else
echo shar: Extracting \"'lloops.c'\" \(13098 characters\)
sed "s/^X//" >'lloops.c' <<'END_OF_FILE'
X/*
X *     program analysis evaluates execution rates of c/fortran do-loops.
X *     through-put is measured in units of millions of floating-point
X *     operations per second, called mflops.
X */
X#define MAXERR  1.0e-6
X 
X#include <stdio.h>
X#include <math.h>
X 
Xdouble  x[1002], y[1002], z[1002], u[501], px[16][101], cx[16][101];
Xdouble  u1[6][23][3], u2[6][23][3], u3[6][23][3];
Xdouble  b[65][9], bnk1[6], c[65][9], bnk2[6], p[5][513], bnk3[6];
Xdouble  h[65][9], bnk4[6], bnk5[6], ex[68];
Xdouble  rh[68], dex[68], vx[151], xx[151], grd[151];
Xint     e[193], f[193];
X 
Xlong    nrops[] = {  0,  5, 10,  2,  2,
X                         2,  2, 16, 36,
X                        17,  9,  1,  1,
X                         7, 11 };
X 
Xlong    loops[] = {   0, 400, 200,1000, 510,
X                        1000,1000, 120,  40,
X                         100, 100,1000,1000,
X                         128, 150 };
X 
Xdouble  checks[] = {  0,  0.811986948148e+07,  0.356310000000e+03,
X                          0.356310000000e+03, -0.402412007078e+05,
X                          0.136579037764e+06,  0.419716278716e+06,
X                          0.429449847526e+07,  0.314064400000e+06,
X                          0.182709000000e+07, -0.140415250000e+09,
X                          0.374895020500e+09,  0.000000000000e+00,
X                          0.171449024000e+06, -0.510829560800e+07 };
X 
Xmain()
X{
X        extern void     init();
X        extern long     clock();
X 
X        register int    nt, lw, nl1, nl2;
X        register int    i, i1, i2, ip, ir, ix, j, j1, j2, k, kx, ky, l, m;
X        double  ts[21], rt[21], rpm[21], cksum[21];
X        double  r, t, a11, a12, a13, sig, a21, a22, a23, a31, a32, a33;
X        double  bm28, bm27, bm26, bm25, bm24, bm23, bm22, c0, flx, rx1;
X        register double q, s, scale, uu, du1, du2, du3, ar, br, cr, xi, ri;
X        long    mops[20];
X
X        for (i=1; i<=20; i++)
X                cksum[i] = 0.0;
X        r = 4.86;        t = 276.0;        a11 = 0.5;
X        a12 = 0.33;      a13 = 0.25;       sig = 0.8;
X        a21 = 0.20;      a22 = 0.167;      a23 = 0.141;
X        a31 = 0.125;     a32 = 0.111;      a33 = 0.10;
X        bm28 = 0.1;      bm27 = 0.2;       bm26 = 0.3;
X        bm25 = 0.4;      bm24 = 0.5;       bm23 = 0.6;
X        bm22 = 0.7;      c0 = 0.8;         flx = 4.689;
X        rx1 = 64.0;
X/*
X *     end of initialization -- begin timing
X */
X 
X/* loop 1      hydro excerpt */
X 
X        init();
X        ts[1] = (double) clock();
X        q = 0.0;
X        for (k=1; k<=400; k++)
X                x[k] = q+y[k]*(r*z[k+10]+t*z[k+11]);
X        ts[1] = (double) clock() - ts[1];
X        for (k=1; k<=400; k++)
X                cksum[1] += (double)k * x[k];
X 
X/* loop 2      mlr, inner product */
X 
X        init();
X        ts[2] = (double) clock();
X        q = 0.0;
X        for (k=1; k<=996; k+=5)
X                q += z[k]*x[k]+z[k+1]*x[k+1]+z[k+2]*x[k+2]+
X                        z[k+3]*x[k+3]+z[k+4]*x[k+4];
X        ts[2] = (double) clock() - ts[2];
X        cksum[2] = q;
X 
X/* loop 3      inner prod */
X 
X        init();
X        ts[3] = (double) clock();
X        q = 0.0;
X        for (k=1; k<=1000; k++)
X                q += z[k]*x[k];
X        ts[3] = (double) clock() - ts[3];
X        cksum[3] = q;
X 
X/* loop 4      banded linear equarions */
X 
X        init();
X        ts[4] = (double) clock();
X        for (l=7; l<=107; l+=50) {
X                lw=l;
X                for (j=30; j<=870; j+=5)
X                        x[l-1] -= x[lw++]*y[j];
X                x[l-1] = y[5]*x[l-1];
X                }
X        ts[4] = (double) clock() - ts[4];
X        for (l=7; l<=107; l+=50)
X                cksum[4] += (double) l * x[l-1];
X 
X/* loop 5      tri-diagonal elimination, below diagonal */
X 
X        init();
X        ts[5] = (double) clock();
X        for (i=2; i<=998; i+=3)  {
X                x[i] = z[i]*(y[i]-x[i-1]);
X                x[i+1] = z[i+1]*(y[i+1]-x[i]);
X                x[i+2] = z[i+2]*(y[i+2]-x[i+1]);
X                }
X        ts[5] = (double) clock() - ts[5];
X        for (i=2; i<=1000; i++)
X                cksum[5] += (double) i * x[i];
X 
X/* loop 6      tri-diagonal elimination, above diagonal */
X 
X        init();
X        ts[6] = (double) clock();
X        for (j=3; j<=999; j+=3) {
X                i = 1003-j;
X                x[i] = x[i]-z[i]*x[i+1];
X                x[i-1] = x[i-1]-z[i-1]*x[i];
X                x[i-2] = x[i-2]-z[i-2]*x[i-1];
X                }
X        ts[6] = (double) clock() - ts[6];
X        for (j=1; j<=999; j++)  {
X                l = 1001-j;
X                cksum[6] += (double) j * x[l];
X                }
X 
X/* loop 7      equation of state excerpt */
X 
X        init();
X        ts[7] = (double) clock();
X        for (m=1; m<=120; m++)
X                x[m] = u[m]+r*(z[m]+r*y[m])+
X                       t*(u[m+3]+r*(u[m+2]+r*u[m+1])+
X                       t*(u[m+6]+r*(u[m+5]+r*u[m+4])));
X        ts[7] = (double) clock() - ts[7];
X        for (m=1; m<=120; m++)
X                cksum[7] += (double) m * x[m];
X 
X/* loop 8      p.d.e. integration */
X 
X        init();
X        ts[8] = (double) clock();
X        nl1 = 1;        nl2 = 2;
X        for (kx=2; kx<=3; kx++)  {
X                for (ky=2; ky<=21; ky++) {
X                        du1 = u1[kx][ky+1][nl1]-u1[kx][ky-1][nl1];
X                        du2 = u2[kx][ky+1][nl1]-u2[kx][ky-1][nl1];
X                        du3 = u3[kx][ky+1][nl1]-u3[kx][ky-1][nl1];
X                        u1[kx][ky][nl2] = u1[kx][ky][nl1]+a11*du1+
X                                a12*du2+a13*du3+sig*(u1[kx+1][ky][nl1]
X                                -2.0*u1[kx][ky][nl1]+u1[kx-1][ky][nl1]);
X                        u2[kx][ky][nl2] = u2[kx][ky][nl1]+a21*du1+
X                                a22*du2+a23*du3+sig*(u2[kx+1][ky][nl1]
X                                -2.0*u2[kx][ky][nl1]+u2[kx-1][ky][nl1]);
X                        u3[kx][ky][nl2] = u3[kx][ky][nl1]+a31*du1+
X                                a32*du2+a33*du3+sig*(u3[kx+1][ky][nl1]
X                                -2.0*u3[kx][ky][nl1]+u3[kx-1][ky][nl1]);
X                        }
X                }
X        ts[8] = (double) clock() - ts[8];
X        for (i=1; i<=2; i++)
X                for (kx=2; kx<=3; kx++)
X                        for (ky=2; ky<=21; ky++)
X                                cksum[8] += (double) kx * (double) ky *
X                                        (double) i * (u1[kx][ky][i]+
X                                        u2[kx][ky][i]+u3[kx][ky][i]);
X 
X/* loop 9      integrate predictors */
X 
X        init();
X        ts[9] = (double) clock();
X        for (i=1; i<=100; i++)
X                px[1][i] = bm28*px[13][i] + bm27*px[12][i] +
X                        bm26*px[11][i] + bm25*px[10][i] + bm24*px[9][i] +
X                        bm23*px[8][i] + bm22*px[7][i] +
X                        c0*(px[5][i] + px[6][i]) + px[3][i];
X        ts[9] = (double) clock() - ts[9];
X        for (i=1; i<=100; i++)
X                cksum[9] += (double) i * px[1][i];
X 
X/* loop 10     difference predictors */
X 
X        init();
X        ts[10] = (double) clock();
X        for (i=1; i<=100; i++)   {
X                ar = cx[5][i];
X                br = ar-px[5][i];
X                px[5][i] = ar;
X                cr = br-px[6][i];
X                px[6][i] = br;
X                ar = cr-px[7][i];
X                px[7][i] = cr;
X                br = ar-px[8][i];
X                px[8][i] = ar;
X                cr = br-px[9][i];
X                px[9][i] = br;
X                ar = cr-px[10][i];
X                px[10][i] = cr;
X                br = ar-px[11][i];
X                px[11][i] = ar;
X                cr = br-px[12][i];
X                px[12][i] = br;
X                px[14][i] = cr-px[13][i];
X                px[13][i] = cr;
X                }
X        ts[10] = (double) clock() - ts[10];
X        for (i=1; i<=100; i++)
X                for (k=5; k<=14; k++)
X                        cksum[10] += (double) k * (double) i * px[k][i];
X 
X/* loop 11     first sum. */
X 
X        init();
X        ts[11] = (double) clock();
X        x[1] = y[1];
X        for (k=2; k<=1000; k++)
X                x[k] = x[k-1]+y[k];
X        ts[11] = (double) clock() - ts[11];
X        for (k=1; k<=1000; k++)
X                cksum[11] += (double) k * x[k];
X 
X/* loop 12     first diff. */
X 
X        init();
X        ts[12] = (double) clock();
X        for (k=1; k<=999; k++)
X                x[k] = y[k+1]-y[k];
X        ts[12] = (double) clock() - ts[12];
X        for (k=1; k<=999; k++)
X                cksum[12] += (double) k * x[k];
X 
X 
X/* loop 13      2-d particle pusher */
X 
X        init();
X        ts[13] = (double) clock();
X        for (ip=1; ip<=128; ip++)        {
X                i1 = p[1][ip];  j1 = p[2][ip];
X                p[3][ip] += b[i1][j1];
X                p[4][ip] += c[i1][j1];
X                p[1][ip] += p[3][ip];
X                p[2][ip] += p[4][ip];
X                i2 = (int) p[1][ip];
X                j2 = (int) p[2][ip];
X                p[1][ip] += y[i2+32];
X                p[2][ip] += z[j2+32];
X                i2 += e[i2+32];         j2 += f[j2+32];
X                h[i2][j2] += 1.0;
X                }
X        ts[13] = (double) clock() - ts[13];
X        for (ip=1; ip<=128; ip++)
X                cksum[13] += (double) ip * (p[3][ip]+p[4][ip]+p[1][ip]+
X                        p[2][ip]);
X        for (k=1; k<=64; k++)
X                for (ix=1; ix<=8; ix++)
X                        cksum[13] += (double) k * (double) ix * h[k][ix];
X 
X/* loop 14      1-d particle pusher */
X 
X        init();
X        ts[14] = (double) clock();
X        for (k=1; k<=150; k++)   {
X                ix = (int) grd[k];
X                xi = (double) ix;
X                vx[k] += ex[ix]+(xx[k]-xi)*dex[ix];
X                xx[k] += vx[k]+flx;
X                ir = (int) xx[k];
X                ri = (double) ir;
X                rx1 = xx[k]-ri;
X                ir = abs(ir % 64);
X                xx[k] = ri+rx1;
X                rh[ir] += 1.0-rx1;
X                rh[ir+1] += rx1;
X                }
X        ts[14] = (double) clock() - ts[14];
X        for (k=1; k<=150; k++)
X                cksum[14] += (double) k * (vx[k]+xx[k]);
X        for (k=1; k<=67; k++)
X                cksum[14] += (double) k * rh[k];
X 
X/* time the clock call */
X 
X        ts[15] = (double) clock();
X        ts[15] = (double) clock() - ts[15];
X 
X/* scale= set to convert time to micro-seconds */
X 
X        scale=1.0;
X        rt[15] = ts[15]*scale;
X        printf("clock overhead = %9.2f usec\n",rt[15]);
X 
X        nt = 14.0;
X        t = s = uu = 0.0;
X        for (k=1; k<=nt; k++)    {
X                rt[k] = (ts[k]-ts[15])*scale;
X                t += rt[k];
X                mops[k] = nrops[k]*loops[k];
X                s += (double) mops[k];
X                rpm[k] = 0.0;
X                if (rt[k] != 0.0)
X                        rpm[k] = (double) mops[k]/rt[k];
X                uu += rpm[k];
X                }
X        uu /= (double) nt;
X        s /= t;
X        printf("\nloop         checksum       flops    time    mflops \n");
X        for (k=1; k<=nt; k++)   {
X                printf("%4d  %20.12e%7d%9.1f%9.3f\n",k,cksum[k],mops[k],
X                        rt[k],rpm[k]);
X                if (fabs(cksum[k]-checks[k]) > fabs(checks[k]*MAXERR))
X                        printf("      %20.12e *** expected checksum\n",
X                                checks[k]);
X                }
X        printf("\n   average mflops=%9.3f\n",uu);
X}
Xvoid
Xinit()
X{
X        register int    j, k, l;
X
X        for (k=1; k<=1000; k++)  {
X                x[k] = 1.11;
X                y[k] = 1.123;
X                z[k] = 0.321;
X                }
X
X	for (k=1; k<=500; k++)
X		u[k] = 0.00025;
X 
X        for (k=1; k<=15; k++)    {
X                for (l=1; l<=100; l++)   {
X                        px[k][l] = l;
X                        cx[k][l] = l;
X                        }
X                }
X 
X        for (j=1; j<6; j++)     {
X                for (k=1; k<23; k++)    {
X                        for (l=1; l<3; l++)     {
X                                u1[j][k][l]=k;
X                                u2[j][k][l]=k+k;
X                                u3[j][k][l]=k+k+k;
X                                }
X                        }
X                }
X 
X        for (j=1; j<65; j++)    {
X                for (k=1; k<9; k++)     {
X                        b[j][k] = 1.00025;
X                        c[j][k] = 1.00025;
X                        h[j][k] = 1.00025;
X                        }
X                }
X 
X        for (j=1; j<6; j++)     {
X                bnk1[j] = j*100;
X                bnk2[j] = j*110;
X                bnk3[j] = j*120;
X                bnk4[j] = j*130;
X                bnk5[j] = j*140;
X                }
X 
X        for (j=1; j<5; j++)     {
X                for (k=1; k<513; k++)   {
X                        p[j][k] = 1.00025;
X                        }
X                }
X 
X        for (j=1; j<193; j++)
X                e[j] = f[j] = 1;
X 
X        for (j=1; j<68; j++)    {
X                ex[j] = rh[j] = dex[j] = (double) j;
X                }
X 
X        for (j=1; j<151; j++)   {
X                vx[j] = 0.001;
X                xx[j] = 0.001;
X                grd[j] = (double) (j/8+3);
X                }
X}
END_OF_FILE
if test 13098 -ne `wc -c <'lloops.c'`; then
    echo shar: \"'lloops.c'\" unpacked with wrong size!
fi
# end of 'lloops.c'
fi
echo shar: End of shell archive.
exit 0

ckim@esunix.UUCP (Cheol Kim) (12/13/88)

In article <584@gt-eedsp.UUCP>, schw@gt-eedsp.UUCP (Dave Schwartz) writes:
> 
> If you have a copy of the Lawerence Livermore Loops set that has been
> ported to C. I would appreciate if you would send me a copy.
> 
> Thanks,
> David Schwartz
> 
> -------------------------------------------------------------
> uucp:                         schw@gt-eedsp.uucp
> domainizing internet mailers: schw@gteedsp.gatech.edu


I would like to get a copy myself. Thanks in advance.

cheol kim
evans & sutherland
580 Arapeen Dr
SLC UT 84108	(801) 582-5847 (ext 3628)

ath@helios.prosys.se (Anders Thulin) (12/15/88)

In article <12074@hall.cray.com> tdg@hall.UUCP (Terry Greyzck) writes:
>
>There seems to be some interest, so I'll post the 14-loop [Laurence
>Livermore Loops] version here.
>

Just a comment ... At the end of the main program there's a variable
's' which is computed, but does not appear to be used. As it isn't inside
any of the loops, removing it doesn't change the outcome of the benchmark.
But it would be interesting to know what it is.
-- 
Anders Thulin			INET : ath@prosys.se
ProgramSystem AB		UUCP : ...!{uunet,mcvax}!enea!prosys!ath
Teknikringen 2A			PHONE: +46 (0)13 21 40 40
S-583 30 Linkoping, Sweden	FAX  :

tdg@hall.cray.com (Terry Greyzck) (12/20/88)

In article <321@helios.prosys.se> ath@helios.prosys.se (Anders Thulin) writes:
>In article <12074@hall.cray.com> tdg@hall.UUCP (Terry Greyzck) writes:
>>There seems to be some interest, so I'll post the 14-loop [Laurence
>>Livermore Loops] version here.
>
>Just a comment ... At the end of the main program there's a variable
>'s' which is computed, but does not appear to be used. As it isn't inside
>any of the loops, removing it doesn't change the outcome of the benchmark.
>But it would be interesting to know what it is.

Good observation, although I don't have a clue what it is.  Examining the
Fortran version of the code shows the same variable ('s') calculated in
the same manner, but it isn't used there, either.  Hmm.  I expect the C
version has it simply because the Fortran version has it... as it is not
part of any timed section of code, it can be deleted.

Terry Greyzck