[comp.sys.ibm.pc] How is a 68000 faster than an 80386?

alex@bilver.UUCP (Alex Matulich) (03/19/90)

The following is a hopefully somewhat concise summary of the voluminous
replies I got concerning an observed slow execution speed of a C program
on a 25 MHz 80386 machine compared to a 14 MHz 68000.  My original
posting is quoted with the ">" symbol.  Many replies contained much
speculation, which I have tried to eliminate.

>Can someone help me with a puzzling problem?

I got a LOT of help.  Thanks, everyone!

>In my current C programming project, I have written some functions that
>perform statistical things on 400 separate data sets (linear regressions,
>standard errors, etc).  This number-crunching part takes about a minute to
>complete when I run it on my Amiga.  My Amiga uses a 68000 running at 14 MHz
>(twice the normal cpu speed) and no math chip.  The compiler is Lattice C
>4.0 in 32-bit addressing mode (similar to the IBM "large" memory model).

>Naturally, I wanted more speed, so I ported the program to an AT&T 386WGS
>at work, which is a 25 MHz 80386 IBM compatible.  I compiled it using
>Turbo C 2.0, large memory model.  Then I watched in chagrined disbelief as
>that number-crunching section still took about a minute to execute --
>actually a few seconds longer than my Amiga.  All source code was the same!

Many respondents weren't surprised at my results.  Nearly everyone
suggested that the large memory model is part of what slows things
down, and asked why I don't use some other memory model, since the
large model wastes so much time loading and re-loading segment
registers.  Well, I don't have much choice.  The executable is about
150K and the data space it needs is over 200K.  I want to avoid
mixed-model programming, so the large model it has to be.
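
For reference, the memory model is chosen on the compile line.  These
invocations are from memory, so the exact switch spellings may vary
between compiler versions (stats.c just stands in for my source file):

    tcc -ml stats.c       Turbo C, large model
    cl  /AL stats.c       Microsoft C, large model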

Some were of the opinion that Turbo C's code generation wasn't too
hot, but I got the same results with Microsoft C.

Many people pointed out that the 68000 has always had 32-bit internal
registers (and many of them), which isn't the case for the Intel
family, and that both Turbo C and Microsoft C default to generating
only 16-bit code compatible with the 8086, with a compiler switch
available to generate code for an 80286.  Sure enough, when I
recompiled my program using this switch, the speed increased about
50%.  It seems I am going to have to provide different versions of my
software to customers, depending on what CPU they have.  I wish it
were possible for a program to "auto-configure" the way it handles
memory and registers and such, so that it runs optimally on all 80x86
CPUs.
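
(If memory serves, that switch is -1 for Turbo C and /G2 for
Microsoft C.)  One way to approximate the "auto-configure" idea, at
least for whole routines, is to compile the time-critical functions
twice (once plain, once with the 286 switch) under different names,
link both into the executable, and pick one through a function
pointer at startup.  This is only a sketch of the idea; regress_86(),
regress_286(), and detect_cpu() are hypothetical names, and real CPU
detection usually involves poking at the FLAGS register:

    /* cpudisp.c -- sketch: select an 8086-safe or a 286-tuned      */
    /* build of the same routine at run time.  regress_86() would   */
    /* be compiled without the 286 switch, regress_286() from the   */
    /* same source with it; detect_cpu() is left as a stub here.    */

    extern void regress_86(double *x, double *y, int n, double *out);
    extern void regress_286(double *x, double *y, int n, double *out);

    static int detect_cpu(void)      /* hypothetical: 0 = 8086/88,  */
    {                                /* 2 = 80286 or better         */
        return 0;                    /* real code tests FLAGS bits  */
    }

    void (*regress)(double *, double *, int, double *);

    void init_dispatch(void)
    {
        regress = (detect_cpu() >= 2) ? regress_286 : regress_86;
    }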

Some people wondered whether the 25 MHz 386WGS itself is simply slow.
Well, it's still faster than the 20 MHz IBM PS/2 I have sitting next
to it, which also uses a 386.  Neither one has a math coprocessor.  A
couple of people asked whether a 68881 math coprocessor works with a
68000.  The answer is: well, yes, in a way.  The 68020 and later
recognize a 68881 as part of their architecture; the 68000, however,
can use a 68881 only as a peripheral device.  I have one in my Amiga,
but it was disabled for this speed test for fairness' sake. :-)

>This is plainly ridiculous, I thought.  I was always under the impression
>that there is NO WAY a 14 MHz Amiga can match the performance of a 25 MHz
>80386 machine.  I thought of a few possible reasons.  I am sure they are
>way off base, because I have little familiarity with IBM-style architecture,
>but here they are:

>1) Perhaps MS-DOS takes up a lot more overhead than AmigaDOS, but I doubt
>   it.  I always considered MS-DOS to be an operating system that gets in
>   the way of the task at hand only minimally.  I had no other programs
>   resident.  If anything, the Amiga Exec had more overhead, since there
>   were two other "active" background tasks and 16 "waiting" background
>   tasks for the operating system to worry about.

A few people said that a large number of TSRs will tend to slow
things down (there were none during this test).  Beyond that, MS-DOS
will get in the way when accessing large random-access files;
otherwise it stays out of the application's way until the application
terminates, except for a system clock interrupt that fires about 18
times a second and dynamic RAM refresh, which consumes about 5% of
the CPU.

>2) Possibly the IBM display is CPU-bound, as in the Macintosh, where
>   program execution is only performed during vertical screen blanks.
>   This isn't the case, is it?  Isn't the video circuitry independent
>   of the CPU?

Completely independent, unless you have a strange video card.  Sorry,
this question misled a few people into thinking that I am doing
display output during the calculations -- I'm not.  Many people did
say that screen output is slow on PCs, with roughly 40 wait states
required for each video memory access.

>3) Maybe the Turbo C compiler for IBM compatibles is not as efficient as
>   the old Lattice compiler I use for the Amiga.  I find it hard to believe.
>   Perhaps each compiler's implementation of math functions like sqrt()
>   are different enough to account for this incident.  The math library I
>   used on each machine was the default.  On the Amiga, this is the slowest
>   library.  There are others (IEEE, FFP, etc) which are faster but they
>   sacrifice precision.

Some people said that my 68000 Lattice compiler uses a faster and less
precise math library than Turbo C, which uses the full IEEE math spec.  I
checked, and the Lattice library I link with is actually the IEEE library,
so that's not the problem.  The Amiga Lattice library is, however, SLIGHTLY
less precise.  I ran the Savage benchmark on both machines, and my Amiga
reported an accumulated error of 3.18e-7 while the 80386 reported 1.2e-9.
I can't believe two decimal places of extra precision would slow down an
80386 that much.
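
For anyone who wants to repeat the comparison, the Savage benchmark
is tiny.  This is the commonly circulated form (2499 passes through a
chain of transcendental calls; the exact answer is 2500, so the final
figure is the accumulated error); treat it as a sketch rather than
the canonical listing:

    /* savage.c -- commonly circulated form of the Savage benchmark */
    #include <stdio.h>
    #include <math.h>

    int main(void)
    {
        double a = 1.0;
        int i;

        /* ideally each pass just adds 1, so a should end up at 2500 */
        for (i = 1; i <= 2499; i++)
            a = tan(atan(exp(log(sqrt(a * a))))) + 1.0;

        printf("accumulated error = %e\n", a - 2500.0);
        return 0;
    }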

>4) Might the 68000's math instructions be more streamlined than those on the
>   80386?  It takes 70 clock cycles to do a multiply and 158 to do a divide
>   on a 68000, plus at most 16 cycles to calculate addresses.  I don't know
>   what the specs are for an 80386.

An 80386 needs only 38 cycles for a multiply and 41 for a divide.
Silicon technology sure progresses fast!  However, if the 80386 is
operating in 8086 mode, it is likely to use up to seven times as many
instructions to do the same things as a 68000.  One person was of the
opinion that, although the multiply and divide instructions are more
efficient on the 386, those are integer operations, and
floating-point operations are better on a 68000.
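
This is easy enough to check empirically on either machine: time a
loop of integer multiplies against a loop of double multiplies and
compare.  Here is a rough sketch using the standard clock() routine
(the loop count is arbitrary, and loop overhead and the compiler's
optimizer will color the numbers somewhat):

    /* multime.c -- crude comparison of integer vs. floating multiply */
    #include <stdio.h>
    #include <time.h>

    #define N 100000L

    int main(void)
    {
        long i;
        unsigned long li = 3;
        double d = 1.0001;
        clock_t t0, t1, t2;

        t0 = clock();
        for (i = 0; i < N; i++)
            li = li * 7UL + 1UL;        /* 32-bit integer multiply   */
        t1 = clock();
        for (i = 0; i < N; i++)
            d = d * 1.0001;             /* double-precision multiply */
        t2 = clock();

        /* print the loop values too, so the loops can't be
           optimized away entirely */
        printf("integer: %ld ticks  double: %ld ticks  (%lu %g)\n",
               (long)(t1 - t0), (long)(t2 - t1), li, d);
        return 0;
    }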

>5) I know the 80386 has special modes of operation, incompatible with
>   previous chips, that allow it to run at its full potential.  Is this
>   the reason my program isn't running at its rightful speed?  Are these
>   special modes accessible when using DOS?  If so, how?

Neither Turbo C nor Microsoft C produces 32-bit code.  Lattice
apparently doesn't either, but a few people had good things to say
about its code generation, and since I also use it on my Amiga, I
ordered it for my IBM programming too ($250, or $125 with an
educational discount).  I was VERY impressed with what I got:  4 big
manuals, the most extensive function libraries I've ever seen (comm
stuff, graphics, and curses!), an editor that can be configured any
way at all, and 100% ANSI compliance.  I haven't tested its
performance relative to Turbo or Microsoft C yet, but Lattice claims
a minimum 10% improvement in execution speed over its competitors.

There are apparently a few compilers on the market that DO produce
32-bit code.  They may require 386 Unix or a DOS extender to run
(read: very expensive).  DOS by itself can't use the special new
features found on an 80386.  MetaWare and Watcom sell 32-bit
compilers; however, using them results in executables that will not
run on anything less than a 386.

>I have absolutely no intention of starting a computer war here.  This is
>new to me, and seems bizarre.  I would like an explanation, and if possible
>some suggestions on speeding up the execution of my software on the 80386.
>IBM compatibles are the target machines for my software anyway (I just like
>doing the development on the Amiga).  Please e-mail me any help (or flames?)
>and I'll summarize.

I was very pleased that nobody else was interested in a computer war
either.  Every reply I got was quite informative.  I think Motorola
users and Intel users can learn a lot from each other if the
competitive attitude is dropped.  Too many times I have seen postings
like "Ha!  I just ran a benchmark where a 68010 blew away a 386!!",
which, naturally, provoke a lot of negative responses.  If these
people would ask for explanatory information instead of trying to
satisfy their egos or competitive spirit, we wouldn't have seen so
many religious computer wars in the past.

Thanks to all the people who responded (listed in the order I
received their replies).  I apologize if anyone was left out:
uunet!jarthur!rspangle (Randy Spangler)
uunet!sequent!norsk (Doug Thompson)
uunet!ra.cs.Virginia.EDU!gnf3e (Greg Fife)
uunet!beaver.cs.washington.edu!sumax!ole!ray (Ray Berry)
uunet!watmath!looking.on.ca!brad (Brad Templeton)
uunet!cod.nosc.mil!bmarsh (William C. Marsh)
harlow@plains.UUCP (Jay B. Harlow)
uunet!copper.wr.tek.com!michaelk (Michael D. Kersenbrock)
uunet!ames!elroy!nrc.com!ihm (Ian H. Merritt)
uunet!caen.engin.umich.edu!zarnuk (Paul Steven Mccarthy)
uunet!swbatl!texbell!adaptex!neese  (Roy Neese)
uunet!n3dmc!johnl (John Limpert)
uunet!cs.cornell.edu!batcomputer!braner (Moshe Braner)
uunet!dell.dell.com!jdc (Jeremy Chatfield)
uunet!wetblu!cmcl2!lanl!dk (David Knapp)
uunet!tekig5.pen.tek.com!wayneck (Wayne C Knapp)
uunet!mirror.tmc.com!rob (Rob Limbert)
uunet!leadsv!zech (Bill Zech)
uunet!cbema.att.com!las (Larry A Shurr)
uunet!b.gp.cs.cmu.edu!Ralf.Brown
uunet!Morgan.COM!amull (Andrew P. Mullhaupt)
mark@acsdev.uucp (Mark Grand)
uunet!bnr-vpa!bnr-rsc!mlord (Mark Lord)
vmrad@pollux (Bernard Littau)
rdo031@tijc02.UUCP (Rick Odle)
kdq@demott.COM (Kevin D. Quitt)
paula@bcsaic.UUCP (Paul Allen)
schaut@cat9.cs.wisc.edu (Rick Schaut)
-- 
     ///  Alex Matulich
    ///  Unicorn Research Corp, 4621 N Landmark Dr, Orlando, FL 32817
\\\///  alex@bilver.UUCP    ...uunet!tarpit!bilver!alex
 \XX/  From BitNet use: bilver!alex@uunet.uu.net