[comp.sys.m68k] Nextstation

holmer@ucbarpa.Berkeley.EDU (Bruce K. Holmer) (01/18/91)


I copied a number-crunching application of mine from an '030 Next cube
to my new '040 Nextstation, and was shocked by the relatively poor
performance.  After some experimentation, I've found the cause: the
'040 floating-point unit does not support the entire 68882 instruction
set in hardware, so the unimplemented instructions must be emulated in
software.  That would be fine with me, since Motorola assured us (IEEE
Micro, February 1990, p. 77) that:

	A software emulator of all unimplemented instructions is available
	from Motorola....  Execution time of the software emulation for
	elementary functions including all trap overhead (running on a
	25 MHz 68040) is 13 to 130 percent faster than the equivalent
	instructions on the 68882 running at 25 MHz.

However, whatever turned up in the Nextstation is certainly not what
Motorola was promising (whether the blame is Motorola's or Next's I
don't know).  For your amusement, here is a small assembly language
program, where QQQ stands for the instruction being timed:
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%% tmp.s %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
#NO_APP
gcc_compiled.:
.text
LC0:
	.even
.globl _main
_main:
	link a6,#0
	clrl d1
L61:
	fmovex #0r0.5,fp0
	fQQQx fp0,fp0

	addql #1,d1
	cmpl #999999,d1
	jle L61
	unlk a6
	rts
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

Here are the timings (using /bin/time) for different QQQ's:

Next cube (25 MHz '030)
-----------------------

QQQ		user (sec.)	sys (sec.)

<none>		 1.8		 0.0		(instruction removed)
sqrt		 4.5		 0.1		(square root)
cos		13.9		 0.4		(cosine)
etox		19.8		 0.7		(e to the x power)
int		 2.8		 0.0		(integer part)
intrz		 2.8		 0.1		(integer part/round to zero)


Nextstation (25 MHz '040)
-------------------------

QQQ		user (sec.)	sys (sec.)

<none>		 0.4		 0.0		(instruction removed)
sqrt		 4.4		 0.0		(square root)
cos		 0.8		27.6		(cosine)
etox		 0.9		27.0		(e to the x power)
int		 0.9		82.9		(integer part)
intrz		 1.0		81.9		(integer part/round to zero)




Note that sqrt is implemented in hardware on the '040 (I threw it in
as a reality check).  Also, I ran each case only once, so run-to-run
variation in the timings is not averaged out.
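
For anyone who wants to repeat the measurement without the single-run
caveat, here is a minimal sketch of a harness that runs a command
several times and reports the mean and spread of user and system time.
It is not what was used for the tables above (those came from
/bin/time); the function names time_runs and summarize are made up for
illustration, and it assumes a Unix system where os.times() reports
child CPU time.

```python
import os
import statistics
import subprocess
import sys


def time_runs(argv, runs=5):
    """Run argv `runs` times; return per-run user and sys CPU seconds."""
    user, system = [], []
    for _ in range(runs):
        before = os.times()
        subprocess.run(argv, check=True)   # waits for the child to exit
        after = os.times()
        # Child CPU time is accumulated once the child has been waited for.
        user.append(after.children_user - before.children_user)
        system.append(after.children_system - before.children_system)
    return user, system


def summarize(samples):
    """Return (mean, sample standard deviation) of a list of timings."""
    spread = statistics.stdev(samples) if len(samples) > 1 else 0.0
    return statistics.mean(samples), spread


if __name__ == "__main__":
    # Pass the program to time on the command line, e.g. the assembled tmp.s.
    # With no arguments, a trivial Python child is timed as a placeholder.
    command = sys.argv[1:] or [sys.executable, "-c", "pass"]
    user, system = time_runs(command)
    print("user: mean %.2f s, stdev %.2f s" % summarize(user))
    print("sys:  mean %.2f s, stdev %.2f s" % summarize(system))
```

Reporting the spread alongside the mean shows whether a difference
between two instructions is larger than the noise between runs.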

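However, the numbers do make the point that the emulation is done at
great expense on the Nextstation (note that nearly all of the time is
charged as system time).  The real shock is the float-to-integer
conversion (fintrz): roughly 82 seconds for a million iterations at
25 MHz works out to about 2000 cycles each!  That's the one that hurt
my application.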
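
The cycle estimate can be reproduced from the tables with a few lines
of arithmetic.  The sketch below assumes that user plus system time is
the right total, that the empty-loop row can be subtracted as a
baseline, and that the loop runs exactly 1,000,000 times (d1 counts up
until it exceeds 999999).  The dictionary and function names are made
up for illustration; only the numbers come from the tables.

```python
# (user, sys) seconds per run, copied from the timing tables above.
CLOCK_HZ = 25_000_000     # both machines are clocked at 25 MHz
ITERATIONS = 1_000_000    # loop exits once d1 exceeds 999999

cube_030 = {
    "<none>": (1.8, 0.0), "sqrt": (4.5, 0.1), "cos": (13.9, 0.4),
    "etox": (19.8, 0.7), "int": (2.8, 0.0), "intrz": (2.8, 0.1),
}
station_040 = {
    "<none>": (0.4, 0.0), "sqrt": (4.4, 0.0), "cos": (0.8, 27.6),
    "etox": (0.9, 27.0), "int": (0.9, 82.9), "intrz": (1.0, 81.9),
}


def cycles_per_instruction(table, name):
    """Approximate cycles for one instruction, net of the empty loop."""
    baseline = sum(table["<none>"])   # loop overhead plus the fmovex
    total = sum(table[name])          # user + sys for the full run
    return (total - baseline) * CLOCK_HZ / ITERATIONS


for name in ("sqrt", "cos", "etox", "int", "intrz"):
    c030 = cycles_per_instruction(cube_030, name)
    c040 = cycles_per_instruction(station_040, name)
    print(f"{name:6s} '030: {c030:7.1f} cycles   '040: {c040:7.1f} cycles")
```

Under these assumptions fint comes out at roughly 2085 cycles on the
'040 against roughly 25 on the '030, which is where the "2000 cycles"
figure comes from.  Since each case was timed only once, the results
are good to perhaps one or two significant digits.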

I do not know whether the Nextstep 2.0 C compiler still emits fetoxx,
fintrzx, etc. (it may instead call faster emulation subroutines), but
my concern still applies to programs that are copied over as binaries
and to Sun executables that are converted using atom.

Can someone clarify this situation?  Will it be fixed soon?

--Bruce Holmer
holmer@ucbarpa.berkeley.edu