[comp.sys.next] 68040 Math: Emulation or library call or inline code ?

toon@news.sara.nl (02/18/91)

>In article <1991Feb17.113656.5876@kithrup.COM>,
>	sef@kithrup.COM (Sean Eric Fagan) writes:
>> In article <1991Feb15.135340.2808@news.sara.nl> toon@news.sara.nl writes:
toon>Sort of. Is trapping these instructions really faster than implementing
toon>sin(), cos(), tan() and friends as run time library subroutines to the
toon>(Objective-) C(++) compiler ? I would doubt it.
sean> 
sean> Very good!  Perhaps you noticed the comment in this thread, a while ago,
sean> about 68040 fp-intensive programs being faster *when recompiled*?
sean>
You're right, I had seen that one then, but it was not clear how this
was accomplished. I suppose a lot of DPS code makes heavy use of these
transcendentals, so optimizing them IS important.
 
sean> Or aren't you aware that there is no easy way to replace a single
sean> instruction, e.g., fsin?
Well, that depends on how this is implemented. I can imagine that in the
old ('030) case, sin(), cos(), tan() and friends translated to either a
single instruction library call, or a single instruction expanded inline
(although I don't know enough of the (gcc) compiler to know whether it
is _possible_ to translate library calls to inline code). If this _is_
possible you have the choices (for the '040) to either translate to a
library call (with the library routine containing the 'emulation' code)
or to expand the 'emulation' code inline. Note that the reason for this
- although it amounts to more code - is that it is faster.
Either of the two possibilities is faster than having the instructions
emulated by traps, because in that case you also have to execute the
trap code itself.
-- 

Toon Moene, SARA - Amsterdam (NL)
Internet: TOON@SARA.NL

/usr/lib/sendmail.cf: Do.:%@!=/

scott@erick.gac.edu (Scott Hess) (02/18/91)

In article <1991Feb18.105109.2810@news.sara.nl> toon@news.sara.nl writes:
   Well, that depends on how this is implemented. I can imagine that in the
   old ('030) case, sin(), cos(), tan() and friends translated to either a
   single instruction library call, or a single instruction expanded inline
   (although I don't know enough of the (gcc) compiler to know whether it
   is _possible_ to translate library calls to inline code). If this _is_
   possible you have the choices (for the '040) to either translate to a
   library call (with the library routine containing the 'emulation' code)
   or to expand the 'emulation' code inline. Note that the reason for this
   - although it amounts to more code - is that it is faster.

   Either of the two possibilities is faster than having the instructions
   emulated by traps, because in that case you also have to execute the
   trap code itself.

I'm fairly sure that real fsin/fcos instructions were generated by the
compiler - no need for the library routines when using the 68882, as
the coprocessor was smart enough for it.

The main problem with traps isn't so much the trap code as the
requirement to save everything in sight so you can continue where
you left off when you get back.  Well, the trap code isn't wanted,
either, of course :-).  I think that the idea of using library calls
is good.  That's one of the ideas of stdio - without doing system
calls, you cut down on the number of traps, and the code executes
faster when it's entirely in user-mode, rather than having to switch
to supervisor so often.
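
To put a rough number on the stdio comparison, here is a sketch. It is
an illustration only, not real stdio: expensive_trap() is a made-up
stand-in for a kernel entry such as write(2), and outbuf is a
hypothetical 1024-byte user-mode buffer. It simply counts how often
the "kernel" gets entered with and without buffering.

```c
#include <stddef.h>

/* Hypothetical stand-in for an expensive kernel entry such as write(2).
   It only counts how often it is called. */
static unsigned long trap_count = 0;

static void expensive_trap(const char *data, size_t len)
{
    (void)data;
    (void)len;
    trap_count++;
}

#define BUF_SIZE 1024

/* Hypothetical user-mode output buffer. */
struct outbuf {
    char   data[BUF_SIZE];
    size_t used;
};

/* Hand whatever is pending to the "kernel" in one go. */
static void outbuf_flush(struct outbuf *b)
{
    if (b->used > 0) {
        expensive_trap(b->data, b->used);
        b->used = 0;
    }
}

/* Queue one byte in user mode; trap only when the buffer is full. */
static void outbuf_putc(struct outbuf *b, char c)
{
    if (b->used == BUF_SIZE)
        outbuf_flush(b);
    b->data[b->used++] = c;
}

/* Write n bytes one at a time, trapping on every byte. */
static unsigned long unbuffered_traps(size_t n)
{
    size_t i;
    char c = 'x';

    trap_count = 0;
    for (i = 0; i < n; i++)
        expensive_trap(&c, 1);
    return trap_count;
}

/* Write the same n bytes through the buffer. */
static unsigned long buffered_traps(size_t n)
{
    struct outbuf b;
    size_t i;

    b.used = 0;
    trap_count = 0;
    for (i = 0; i < n; i++)
        outbuf_putc(&b, 'x');
    outbuf_flush(&b);
    return trap_count;
}
```

With these definitions, unbuffered_traps(4096) enters the "kernel"
4096 times and buffered_traps(4096) enters it 4 times. The analogy to
fsin is loose: the saving there comes from avoiding the trap entirely,
not from batching.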

If worst comes to worst, the '030 version will be slowed down.  But
I submit that within a couple of months there will more than likely
be more '040 machines out there than '030 machines.  Also, sad but
true, the '030 is the architecture of the past for NeXTs.  I mean, I
still use '030 machines for almost all of my work, because there
aren't enough '040s here yet to make a difference, but I'm more
concerned that my code be faster on the machines that I _will_ be
running on than on the machines I _was_ running on.

Besides, I suspect this is all currently academic.  From what I've heard,
the current trap stuff is sort of slow, anyhow, at least compared to
what it could be.  Sigh.

Later,
--
scott hess                      scott@gac.edu
Independent NeXT Developer	GAC Undergrad
<I still speak for nobody>
"Tried anarchy, once.  Found it had too many constraints . . ."
"Buy `Sweat 'n wit '2 Live Crew'`, a new weight loss program by
Richard Simmons . . ."

sef@kithrup.COM (Sean Eric Fagan) (02/19/91)

In article <1991Feb18.105109.2810@news.sara.nl> toon@news.sara.nl writes:
>Well, that depends on how this is implemented. I can imagine that in the
>old ('030) case, sin(), cos(), tan() and friends translated to either a
>single instruction library call, or a single instruction expanded inline
>(although I don't know enough of the (gcc) compiler to know whether it
>is _possible_ to translate library calls to inline code).

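/* file <math.h> */

#ifdef __GNUC__
#	ifdef	__OPTIMIZE__
/* Expand sin() inline as a single 68881/68882 fsin instruction. */
static __inline double
sin(double x) {
	double temp;
	/* Operands are source, destination; "f" asks for an FP register. */
	__asm ("fsin%.x %1,%0" : "=f" (temp) : "f" (x));
	return temp;
}
/* ... */
#	endif	/* __OPTIMIZE__ */
#endif	/* __GNUC__ */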

That's one way of doing it; I don't know the port of the 68k well enough to
be terribly exact.  When you do, in your code,

	foo = sin(bar);

the code generated looks roughly like

	fsin foo, bar

Or somesuch (probably moves both bar to an fp register, and uses one as the
destination).

THERE IS NO EASY WAY TO REPLACE THAT fsin WITHOUT RECOMPILING.  IF YOU
RECOMPILE AND USE AN OPTIMIZED SUBROUTINE THAT DOES *NOT* USE THE fsin
INSTRUCTION, YOUR CODE WILL BE FASTER ON THE 68040.
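
For illustration only, a subroutine of that kind might look roughly
like the sketch below. It is not the routine any vendor ships: sw_sin
and the SW_ constants are names made up here, and the method (crude
range reduction plus a Taylor polynomial) is just the simplest thing
that works. The point is that it needs nothing but add, subtract,
multiply, divide and an integer conversion, which as far as I know the
'040 still does in hardware.

```c
/* Hypothetical software sine: no fsin, only basic arithmetic. */
#define SW_PI      3.14159265358979323846
#define SW_TWO_PI  6.28318530717958647692

static double sw_sin(double x)
{
    long turns;
    double x2, p;

    /* Range reduction: remove whole turns so x lands in [-pi, pi].
       Only adequate for moderately sized arguments. */
    turns = (long)(x / SW_TWO_PI + (x >= 0.0 ? 0.5 : -0.5));
    x -= (double)turns * SW_TWO_PI;

    /* Fold into [-pi/2, pi/2] using sin(pi - x) == sin(x). */
    if (x > SW_PI / 2.0)
        x = SW_PI - x;
    else if (x < -SW_PI / 2.0)
        x = -SW_PI - x;

    /* Taylor series through x^15, evaluated by Horner's rule in x*x. */
    x2 = x * x;
    p = -1.0 / 1307674368000.0;         /* -1/15! */
    p = p * x2 + 1.0 / 6227020800.0;    /* +1/13! */
    p = p * x2 - 1.0 / 39916800.0;      /* -1/11! */
    p = p * x2 + 1.0 / 362880.0;        /* +1/9!  */
    p = p * x2 - 1.0 / 5040.0;          /* -1/7!  */
    p = p * x2 + 1.0 / 120.0;           /* +1/5!  */
    p = p * x2 - 1.0 / 6.0;             /* -1/3!  */
    p = p * x2 + 1.0;
    return x * p;
}
```

A production routine would use a minimax polynomial and a much more
careful range reduction; this one only shows the shape of the thing.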

-- 
Sean Eric Fagan  | "I made the universe, but please don't blame me for it;
sef@kithrup.COM  |  I had a bellyache at the time."
-----------------+           -- The Turtle (Stephen King, _It_)
Any opinions expressed are my own, and generally unpopular with others.

sef@kithrup.COM (Sean Eric Fagan) (02/19/91)

In article <SCOTT.91Feb18100702@erick.gac.edu> scott@erick.gac.edu (Scott Hess) writes:
>If worst comes to worst, the '030 version will be slowed down.

Actually, what you do, if you have shared libraries (which the NeXT does,
right?), is call the library routine sin().  On the '030, you use fsin; on
the '040, you use the normal library method.
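
A minimal sketch of that arrangement, assuming the library can find
out which CPU it is running on when it is initialized. Every name here
(cpu_kind, lib_init, lib_sin and so on) is hypothetical, and both
implementations are portable placeholders: on a real '030 the first
would be the single fsin instruction, and on a real '040 the second
would be a proper software routine.

```c
/* Hypothetical CPU identification passed in by the loader or startup code. */
enum cpu_kind { CPU_68030, CPU_68040 };

/* Records which implementation ran last; for demonstration only. */
enum sin_path { PATH_NONE, PATH_FSIN, PATH_SOFTWARE };
static enum sin_path last_path = PATH_NONE;

typedef double (*sin_impl)(double);

/* Short series, good enough for small |x| in this demo only. */
static double demo_series(double x)
{
    double x2 = x * x;
    return x * (1.0 - x2 / 6.0 * (1.0 - x2 / 20.0));
}

/* Placeholder for the '030 version, which would be one fsin. */
static double sin_via_fsin(double x)
{
    last_path = PATH_FSIN;
    return demo_series(x);
}

/* Placeholder for the '040 version, which avoids fsin entirely. */
static double sin_via_software(double x)
{
    last_path = PATH_SOFTWARE;
    return demo_series(x);
}

/* Default to the version that is safe on either CPU. */
static sin_impl current_sin = sin_via_software;

/* Called once when the library is set up. */
static void lib_init(enum cpu_kind cpu)
{
    current_sin = (cpu == CPU_68030) ? sin_via_fsin : sin_via_software;
}

/* What the application actually calls. */
static double lib_sin(double x)
{
    return current_sin(x);
}
```

The application binary contains only the call to lib_sin(), so the
same executable gets the right version on either machine without
being recompiled. The cost is one indirect call per sin().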
