[comp.sys.next] ARRRGG!!! My DSP *still* seems slow...

zazula@uazhe0.physics.arizona.edu (RALPH ZAZULA) (04/03/91)

I posted a short while back about the "slowness" of the DSPAPmtm function.
I received a reply (I forget from whom, sorry) pointing out that when
you call the DSPAPxxx routines, the DSP code gets loaded *every* time, even
if it is the same call over and over again.  I looked around and found
the program /NextDeveloper/Examples/DSP/ArrayProcessing/myAP, and it
seemed to be exactly what I wanted: load the DSP code once and call it
repeatedly.  So I modified .../DSP/ArrayProcessing/matrix/matrix.c
to use the myAP code (i.e., replaced myAPvasl with myAPmtm and got rid of
myAPvnot) and ran it again.  The results for 5000 iterations of DSP vs. 030
are:

  [27](bonehead)/users/zazula/Apps/myAP> time mytest
  DSP time: 59
  030 time: 1
  expected:
   0.040000 -0.020000 
   0.060000  0.160000 
  received:
   0.040000 -0.020000 
   0.060000  0.160000 

  mtm succeeded
  10.1u 15.9s 1:01 42% 0+0k 4+7io 0pf+0w


I used time(3) to get the times, so the resolution is only one second.
Monitor tells me that the DSP portion of the code runs mostly in system
mode (if that means anything...).
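
For finer resolution than time(3) gives, one option is gettimeofday(2).
What follows is a minimal sketch and not part of mytest.c: the helper names
elapsed_seconds, time_loop and ms_per_iteration are hypothetical, and `work`
stands in for whatever loop body is being measured.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/time.h>

/* Wall-clock seconds between two gettimeofday() samples. */
double elapsed_seconds(struct timeval start, struct timeval end)
{
    return (double)(end.tv_sec - start.tv_sec)
         + (double)(end.tv_usec - start.tv_usec) / 1e6;
}

/* Call `work` (a hypothetical stand-in for the loop body) `iterations`
 * times and return the wall-clock seconds the whole loop took. */
double time_loop(void (*work)(void), long iterations)
{
    struct timeval start, end;
    long i;

    assert(iterations > 0);
    gettimeofday(&start, NULL);
    for (i = 0; i < iterations; i++)
        work();
    gettimeofday(&end, NULL);
    return elapsed_seconds(start, end);
}

/* Mean cost of one iteration, in milliseconds. */
double ms_per_iteration(double total_seconds, long iterations)
{
    assert(iterations > 0);
    return total_seconds * 1000.0 / (double)iterations;
}
```

With the figures above, ms_per_iteration(59.0, 5000) works out to about
11.8 ms per DSP run.  The 030 figure of 1 second is too coarse to divide
meaningfully at one-second resolution.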

Does anyone have an idea what is going on here?  Is it just that the
myAP code is inefficient too?  Is there a right way to load DSP code
and then call it repeatedly?

I'll include the modified matrix.c (really mytest.c) code at the end to 
answer any questions about exactly how the above numbers were arrived at.  
Also, I didn't use the -O option when compiling mytest.c (I had at first
thought that was why the 030 code ran faster - that maybe the test loop 
was optimized down to one iteration).
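
For anyone who does compile with -O, one way to keep the host-side test
loop from being collapsed is to route the reads and writes through
volatile pointers, so that each repetition has to be performed.  This is
a minimal sketch under that assumption, not the code used for the numbers
above; repeat_multiply and the ROWS/INNER/COLS names are hypothetical.

```c
#include <assert.h>

#define ROWS  2   /* rows of a and of c */
#define INNER 3   /* columns of a, rows of b */
#define COLS  2   /* columns of b and of c */

/* Multiply a (ROWS x INNER) by b (INNER x COLS) into c, `reps` times.
 * All three matrices are flat, row-major arrays.  Because every load
 * and store goes through a volatile pointer, the compiler must perform
 * them on each repetition instead of keeping only the last one. */
void repeat_multiply(const volatile float *a, const volatile float *b,
                     volatile float *c, long reps)
{
    long r;
    int i, j, k;

    assert(reps > 0);
    for (r = 0; r < reps; r++)
        for (i = 0; i < ROWS; i++)
            for (j = 0; j < COLS; j++) {
                float sum = 0.0f;
                for (k = 0; k < INNER; k++)
                    sum += a[i * INNER + k] * b[k * COLS + j];
                c[i * COLS + j] = sum;
            }
}
```

Called with the a and b values from mytest.c and reps set to 5000, this
produces the same 2x2 result as the "expected" matrix shown above.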

Thanks

Ralph Zazula

   |----------------------------------------------------------------------|
   | Ralph Zazula                               "Computer Addict!"        |
   | University of Arizona                 ---  Department of Physics     |
   |   UAZHEP::ZAZULA                            (DecNet/HEPNet)          |
   |   zazula@uazhe0.physics.arizona.edu         (Internet)               |
   |----------------------------------------------------------------------|
   |   "You can twist perceptions, reality won't budge."  - Neil Peart    |
   |----------------------------------------------------------------------|


   ------------------ mytest.c (matrix.c using myAP) ----------------- 
	
/*
 * mytest.c (matrix.c modified to use myAP)
 *	Test the matrix multiplication array processing DSP macro.
 *	This code downloads two matrices to the DSP, then calls the
 *	C function myAPmtm(), which wraps the DSP array processing
 *	macro mtm, and reruns it 5000 times with myAPGo().  The
 *	resultant matrix is read back and verified against the
 *	correct result.
 */

//#include <dsp/arrayproc.h>	/* include the array processing header */
#include "myAP.h"
#include "myAPmtm.h"

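#include <math.h>		/* needed only for fabs() */
#include <stdio.h>		/* printf(), fprintf(), stderr */
#include <stdlib.h>		/* exit() */
#include <time.h>		/* time() */
#include <sys/time.h>
#include <sys/types.h>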

#ifndef TRUE
#define	TRUE	1
#endif
#ifndef FALSE
#define	FALSE	0
#endif

#define	B_ROWS	3		/* number of rows in b */
#define	B_COLS	2		/* number of columns in b */
#define	A_ROWS	2		/* number of rows in a */
#define	A_COLS	B_ROWS		/* number of columns in a, must == B_ROWS */
#define	C_ROWS	A_ROWS		/* number of rows in c */
#define	C_COLS	B_COLS		/* number of columns in c */

#define	A_SIZE	(A_ROWS * A_COLS)	/* number of elements in a */
#define	B_SIZE	(B_ROWS * B_COLS)	/* number of elements in b */
#define	C_SIZE	(C_ROWS * C_COLS)	/* number of elements in c */

#define	A_IN	myAPGetLowestAddress()	/* a address */
#define	B_IN	(A_IN + A_SIZE)		/* b address */
#define	C_OUT	(B_IN + B_SIZE)		/* c address */

/* Compare two floats to within an absolute tolerance of 1e-6 */
#define feq(a,b)		(fabs((a)-(b))<.000001)

int main(void)
{
    float	a[A_ROWS][A_COLS];	/* input matrix a */
    float	b[B_ROWS][B_COLS];	/* input matrix b */
    float	c[C_ROWS][C_COLS];	/* output matrix c */
    float	d[C_ROWS][C_COLS];	/* correct result matrix d */
    int		failed = FALSE;
    int		i, j, k, l;
    long start,end;

    /* load some values into input arrays */
    a[0][0] =  0.1;
    a[0][1] =  0.2;
    a[0][2] = -0.1;
    a[1][0] =  0.3;
    a[1][1] =  0.1;
    a[1][2] =  0.4;
    b[0][0] = -0.2;
    b[0][1] =  0.5;
    b[1][0] =  0.4;
    b[1][1] = -0.3;
    b[2][0] =  0.2;
    b[2][1] =  0.1;

    DSPSetErrorFP(stderr);
    DSPEnableErrorLog();
    myAPInit();		/* initialize the DSP for array processing */

    /* put input arrays to the DSP */
    DSPWriteFloatArray((float *)a, DSP_MS_X, A_IN, 1, A_SIZE);
    DSPWriteFloatArray((float *)b, DSP_MS_X, B_IN, 1, B_SIZE);

    /* call the C interface function to load the DSP program */
    myAPmtm(A_IN, B_IN, C_OUT, B_ROWS, B_COLS, A_ROWS);

    /* time 5000 runs of the loaded program */
    start = time(0);
    for (l = 0; l < 5000; l++) {
        /* printf("%u\n", l); */
        myAPGo();
        if (myAPAwaitNotBusy(100)) {
            fprintf(stderr, "AP program is hung!\n");
            exit(1);
        }
    }
    /* time() returns time_t; cast so the value matches the %ld format */
    printf("DSP time: %ld\n", (long)(time(0) - start));
	 
    /* get output array from the DSP */
    DSPReadFloatArray((float *)c, DSP_MS_X, C_OUT, 1, C_SIZE);
	
    myAPFree();	/* free the DSP */

    /* compute correct result into d, repeated 5000 times for timing */
    start = time(0);
    for (l = 0; l < 5000; l++) {
        /* printf("%u\n", l); */
        for (i = 0; i < A_ROWS; i++)
            for (j = 0; j < B_COLS; j++) {
                d[i][j] = 0;
                for (k = 0; k < A_COLS; k++)
                    d[i][j] += a[i][k] * b[k][j];
            }
    }
    /* time() returns time_t; cast so the value matches the %ld format */
    printf("030 time: %ld\n", (long)(time(0) - start));
	
    /* display and compare computed result with DSP result */
    printf("expected:\n");
    for (i = 0; i < C_ROWS; i++) {
    	for (j = 0; j < C_COLS; j++) {
	    printf("%9f ", d[i][j]);
	    if (!feq(c[i][j], d[i][j]))
	    	failed = TRUE;
	    }
	printf("\n");
    }
    printf("received:\n");
    for (i = 0; i < C_ROWS; i++) {
    	for (j = 0; j < C_COLS; j++)
	    printf("%9f ", c[i][j]);
	printf("\n");
    }
    if (failed)
    	printf("\n*** mtm FAILED! ***\n");
    else
    	printf("\nmtm succeeded\n");
    return 0;
}