[comp.sys.ibm.pc] How is a 68000 as fast as an 80386??

alex@bilver.UUCP (Alex Matulich) (02/28/90)

Can someone help me with a puzzling problem?

In my current C programming project, I have written some functions that
perform statistical things on 400 separate data sets (linear regressions,
standard errors, etc).  This number-crunching part takes about a minute to
complete when I run it on my Amiga.  My Amiga uses a 68000 running at 14 MHz
(twice the normal cpu speed) and no math chip.  The compiler is Lattice C
4.0 in 32-bit addressing mode (similar to the IBM "large" memory model).
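
The source for those functions isn't posted in this thread, so the following is only a guess at the kind of kernel involved: an ordinary least-squares fit for one data set, done entirely in double-precision arithmetic. The names `struct fit` and `linreg` are made up for illustration.

```c
#include <stddef.h>

/* Result of fitting y = intercept + slope * x (hypothetical type). */
struct fit {
    double slope;
    double intercept;
};

/* Ordinary least-squares fit over n points.
   Returns 0 on success, -1 if n < 2 or all x values are identical. */
int linreg(const double *x, const double *y, size_t n, struct fit *out)
{
    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0, denom;
    size_t i;

    if (n < 2)
        return -1;
    for (i = 0; i < n; i++) {
        sx  += x[i];
        sy  += y[i];
        sxx += x[i] * x[i];
        sxy += x[i] * y[i];
    }
    denom = (double)n * sxx - sx * sx;
    if (denom == 0.0)
        return -1;
    out->slope = ((double)n * sxy - sx * sy) / denom;
    out->intercept = (sy - out->slope * sx) / (double)n;
    return 0;
}
```

On a machine with no math chip, such as the Amiga described above, every multiply and divide in that loop is carried out in software, so the speed of the compiler's floating-point library can matter as much as the CPU clock.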

Naturally, I wanted more speed, so I ported the program to an AT&T 386WGS
at work, which is a 25 MHz 80386 IBM compatible.  I compiled it using
Turbo C 2.0, large memory model.  Then I watched in chagrined disbelief as
that number-crunching section still took about a minute to execute --
actually a few seconds longer than my Amiga.  All source code was the same!

This is plainly ridiculous, I thought.  I was always under the impression
that there is NO WAY a 14 MHz Amiga can match the performance of a 25 MHz
80386 machine.  I thought of a few possible reasons.  I am sure they are
way off base, because I have little familiarity with IBM-style architecture,
but here they are:

1)  Perhaps MS-DOS takes up a lot more overhead than AmigaDOS, but I doubt
    it.  I always considered MS-DOS to be an operating system that gets in
    the way of the task at hand only minimally.  I had no other programs
    resident.  If anything, the Amiga Exec had more overhead, since there
    were two other "active" background tasks and 16 "waiting" background
    tasks for the operating system to worry about.

2)  Possibly the IBM display is CPU-bound, as in the Macintosh, where
    program execution is only performed during vertical screen blanks.  This
    isn't the case, is it?  Isn't the video circuitry independent of the CPU?

3)  Maybe the Turbo C compiler for IBM compatibles is not as efficient as
    the old Lattice compiler I use for the Amiga.  I find it hard to believe.
    Perhaps each compiler's implementation of math functions like sqrt()
    is different enough to account for this incident.  The math library I
    used on each machine was the default.  On the Amiga, this is the slowest
    library.  There are others (IEEE, FFP, etc) which are faster but they
    sacrifice precision.

4)  Might the 68000's math instructions be more streamlined than those on the
    80386?  It takes 70 clock cycles to do a multiply and 158 to do a divide
    on a 68000, plus at most 16 cycles to calculate addresses.  I don't know
    what the specs are for an 80386.

5)  I know the 80386 has special modes of operation, incompatible with
    previous chips, that allow it to run at its full potential.  Is this
    the reason my program isn't running at its rightful speed?  Are these
    special modes accessible when using DOS?  If so, how?
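
For what it's worth, the cycle counts in point 4 can be turned into times with a one-line conversion. This is a rough sketch; `cycles_to_us` is a made-up helper, and the 70 and 158 figures are simply the ones quoted above (they describe integer multiply and divide instructions, not the floating-point routines a statistics program would spend its time in).

```c
/* Convert a cycle count to microseconds at a given clock rate in MHz.
   (Hypothetical helper: cycles divided by MHz gives microseconds.) */
double cycles_to_us(unsigned cycles, double clock_mhz)
{
    return (double)cycles / clock_mhz;
}
```

`cycles_to_us(70, 14.0)` gives 5.0 microseconds per multiply, and `cycles_to_us(158, 14.0)` gives about 11.3 microseconds per divide at 14 MHz.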

I have absolutely no intention of starting a computer war here.  This is
new to me, and seems bizarre.  I would like an explanation, and if possible
some suggestions on speeding up the execution of my software on the 80386.
IBM compatibles are the target machines for my software anyway (I just like
doing the development on the Amiga).  Please e-mail me any help (or flames?)
and I'll summarize.

-- 
     ///  Alex Matulich
    ///  Unicorn Research Corp, 4621 N Landmark Dr, Orlando, FL 32817
\\\///  alex@bilver.UUCP    ...uunet!tarpit!bilver!alex
 \XX/  From BitNet use: bilver!alex@uunet.uu.net

harlow@plains.UUCP (Jay B. Harlow) (03/01/90)

In article <505@bilver.UUCP> alex@bilver.UUCP (Alex Matulich) writes:
>Can someone help me with a puzzling problem?
>
>In my current C programming project, I have written some functions that
>perform statistical things on 400 separate data sets (linear regressions,
>standard errors, etc).  This number-crunching part takes about a minute to
>complete when I run it on my Amiga.  My Amiga uses a 68000 running at 14 MHz
>(twice the normal cpu speed) and no math chip.  The compiler is Lattice C
>4.0 in 32-bit addressing mode (similar to the IBM "large" memory model).
>
>Naturally, I wanted more speed, so I ported the program to an AT&T 386WGS
>at work, which is a 25 MHz 80386 IBM compatible.  I compiled it using
   ^^^^^^^  only creates 16-bit code....
>Turbo C 2.0, large memory model.  Then I watched in chagrined disbelief as
>that number-crunching section still took about a minute to execute --
	16-bit code on 386 can you say lame duck (read below....)
>actually a few seconds longer than my Amiga.  All source code was the same!
>
>5)  I know the 80386 has special modes of operation, incompatible with
>    previous chips, that allow it to run at its full potential.  Is this
>    the reason my program isn't running at its rightful speed?  Are these
	Yes partly, 1) 32-bit inst in real mode 2) 32-bit Protected mode
>    special modes accessible when using DOS?  If so, how?

Alex,
   the reason your program ran 'slow' on the 386 is that you compiled it
for a 286 (at best) in LARGE model.  Large model on an x86 means every address
is a segment:offset pair, and having the x86 load segment registers is very
'expensive' time-wise (I normally use small model & put large data out in far
memory...).  On a 286 or below, memory accesses are restricted to 64K segments,
so anything larger than 64K (128K if you count code & data...) needs a new
segment.
I don't know of any compilers (in the Turbo C price range) that will generate
386-specific instructions.  Yes, you can get 386 compilers, but all the
ones I know of need a 386 *nix or a DOS extender (can you say a grand $$).
There is hope: MASM & TASM (& others...) support 386 instructions.
If you can find the 'busy' work, you could recode the busy stuff in 386
assembler (real mode), which should give a nice speedup (40% is what pkzip
claims); then the 386 'should' blow the pants off of YOUR 14 MHz 68000.
    **** I AM NOT REFERRING TO 680x0's IN GENERAL, JUST HIS CASE!!!!! ****

The other possibility under DOS is a DOS extender, but like I mentioned
above they cost $$$$ (I know my budget says NO ;-).  They allow
one to write FULL 32-bit programs (nice, about the same as YOUR 32-bit
addressing mode).  A DOS extender puts the processor in protected mode for
your program to run, handles the interface to DOS, and returns to DOS when
your program exits.
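
To make the large-model cost concrete, here is a small sketch of how a real-mode segment:offset pair maps to a physical address (segment times 16, plus offset). `real_mode_linear` is a made-up name, and the sketch ignores any address wraparound at 1 MB.

```c
#include <stdint.h>

/* Physical address named by a real-mode segment:offset pair.
   The segment is shifted left 4 bits (multiplied by 16) and the
   16-bit offset is added. */
uint32_t real_mode_linear(uint16_t segment, uint16_t offset)
{
    return ((uint32_t)segment << 4) + offset;
}
```

Many different segment:offset pairs name the same byte (1234:0010 and 1235:0000 both give 0x12350), and a far pointer dereference generally has to load a segment register before the access, which is where the extra time goes.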

		I hope this helps.....
			Jay
-- 
		Jay B. Harlow	<harlow@plains.nodak.edu>
	uunet!plains!harlow (UUCP)	harlow@plains (Bitnet)

Of course the above is personal opinion, And has no bearing on reality...

mark@acsdev.uucp (Mark Grand) (03/02/90)

In article <505@bilver.UUCP> alex@bilver.UUCP (Alex Matulich) writes:

   In my current C programming project, I have written some functions that
   perform statistical things on 400 separate data sets (linear regressions,
   standard errors, etc).  This number-crunching part takes about a minute to
   complete when I run it on my Amiga.  My Amiga uses a 68000 running at 14 MHz
   (twice the normal cpu speed) and no math chip.  The compiler is Lattice C
   4.0 in 32-bit addressing mode (similar to the IBM "large" memory model).

   Naturally, I wanted more speed, so I ported the program to an AT&T 386WGS
   at work, which is a 25 MHz 80386 IBM compatible.  I compiled it using
   Turbo C 2.0, large memory model.  Then I watched in chagrined disbelief as
   that number-crunching section still took about a minute to execute --
   actually a few seconds longer than my Amiga.  All source code was the same!

Sounds like you've discovered why Lattice charges more for their
compiler.  The Lattice compiler does some real optimizations.  Turbo C
does not do so much optimization.  Another factor is the fact that you
were using large model pointers.  32 bit pointers (unless you're in
native 386 mode) have a higher speed penalty associated with them on
a 386 than on a 68000.  If there's any way for your data to be
referenced using near pointers, you will be able to get more speed.
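
A rough sketch of the near-pointer idea follows. It assumes Turbo C's `near` keyword and its predefined `__TURBOC__` macro; the `NEARPTR` macro and `sum_squares` are made up for illustration, and on other compilers the macro expands to nothing so the code still builds. Whether your data can actually live in the default data segment is something only you can check.

```c
#include <stddef.h>

#if defined(__TURBOC__)
#define NEARPTR near   /* offset-only pointer: no segment register reload */
#else
#define NEARPTR        /* flat-memory compilers: ordinary pointer */
#endif

/* Sum of squares over an array reached through a near pointer.
   Under Turbo C the array must reside in the default data segment. */
double sum_squares(const double NEARPTR *x, size_t n)
{
    double total = 0.0;
    size_t i;

    for (i = 0; i < n; i++)
        total += x[i] * x[i];
    return total;
}
```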
--
========

Mark Grand
Premenos Corporation			415-827-3820 x307
1485 Enea Court
Concord, CA   94520			...!pacbell!acsdev!mark

rdo031@tijc02.UUCP (Rick Odle ) (03/13/90)

From article <3666@plains.UUCP>, by harlow@plains.UUCP (Jay B. Harlow):
> In article <505@bilver.UUCP> alex@bilver.UUCP (Alex Matulich) writes:
>>Can someone help me with a puzzling problem?
>>
>>In my current C programming project, I have written some functions that
>>perform statistical things on 400 separate data sets (linear regressions,
>>.....
>>Naturally, I wanted more speed, so I ported the program to an AT&T 386WGS
>>at work, which is a 25 MHz 80386 IBM compatible.  I compiled it using
>    ^^^^^^^  only creates 16-bit code....
>>Turbo C 2.0, large memory model.  Then I watched in chagrined disbelief as
>>that number-crunching section still took about a minute to execute --
> 	16-bit code on 386 can you say lame duck (read below....)
>>actually a few seconds longer than my Amiga.  All source code was the same!
>>
>>5)  I know the 80386 has special modes of operation, incompatible with
>>    previous chips, that allow it to run at its full potential.  Is this
>>    the reason my program isn't running at its rightful speed?  Are these
> 	Yes partly, 1) 32-bit inst in real mode 2) 32-bit Protected mode
>>    special modes accessible when using DOS?  If so, how?
> 
> Alex,
>    the reason your program ran 'slow' on the 386 is because you compiled it
> for a 286 (at best) LARGE model, a large model on a x86 means all addresses
> have segment offset, having the x86 load segments is very 'expensive'
> time wise (i normally use a small model & put large data out in far memory...)
>........ more on small models

The only fair test here is to do the test with large model.  While it
is true that the 80x86 processors will let you execute code in a small
model architecture,  this is only applicable to fairly small applications
( I know this is relatively speaking, and that 10 years ago 64k of code
was a fairly large application).  The 680x0 family architecture ALWAYS
fetches long word addresses (32 bits), so the most fair comparison
is the x86 large model.

BTW, this very feature (large linear address space) on the 68k family is
what makes it a somewhat more desirable processor to program on (no
segments to worry about).  On the other hand, the segmented architecture
lends itself to being able to develop position independent code easier.

rick
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
rick odle		texas instruments industrial systems division
(615) 461-2371  	johnson city, tn  37601          __   .  _     /__,
uucp:		mcnc!rti!tijc02!rdo031                  (    (  (__  /  (
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

kdq@demott.COM (Kevin D. Quitt) (03/14/90)

In article <908@tijc02.UUCP> rdo031@tijc02.UUCP (Rick Odle           ) writes:

>  The 680x0 family architecture ALWAYS
>fetches long word addresses (32 bits), so the most fair comparison
>is the x86 large model.

    WRONG! The 680x0 family fetches on 16-bit word addresses.  The 68020
and later are more efficient on long-word fetches, but can execute from
odd-word addresses.

> On the other hand, the segmented architecture
>lends itself to being able to develop position independent code easier.

    TRY AGAIN! In 68K assembly language, you have to go out of your way to
generate absolute addresses - some assemblers even flag them as errors!
Every compiler in existence uses PC relative addressing - it's faster
code and smaller.  It's the segmented architecture that makes PIC
(position independent code) difficult.  How do you jump or call out of
your segment without specifying absolute segments, or going through
gymnastics to calculate things in your code? In a segmented
architecture, only trivial (<64K) programs can easily be made PIC.
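
The point about PC-relative addressing can be illustrated with a toy model in C (not real 68K code). A PC-relative branch stores only a displacement, so the same encoded value works wherever the program is loaded. `branch_target` and the addresses are made up, and the real 68000 measures the displacement from a slightly later PC value than this sketch does.

```c
#include <stdint.h>

/* Target of a PC-relative branch: current PC plus a signed displacement.
   Unsigned arithmetic wraps modulo 2^32, so negative displacements work. */
uint32_t branch_target(uint32_t pc, int32_t displacement)
{
    return pc + (uint32_t)displacement;
}
```

With the code loaded at 0x1000, a branch at 0x1020 with displacement 0x60 lands at 0x1080; load the same bytes at 0x8000 and the branch at 0x8020 lands at 0x8080 with no patching. An absolute target of 0x1080 would have to be rewritten by a loader.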

    There are no advantages to the segmented architecture when the
segment registers are available to user level code.  If the segment
registers are available to the operating system only and the segments
are sufficiently large, then they're a pretty good (i.e.  low overhead)
way to handle simple memory management schemes.  Memory management
belongs to the operating system, not to user code.  Unfortunately, the
segments on the x86 are so small that the user is burdened with the
extra effort (except in 386 native mode, where segments are generally ignored).

    I really don't mean to start a religious war - I regularly use most
members of both families.  But let's get the facts right.

-- 

Kevin D. Quitt                          Manager, Software Development
DeMott Electronics Co.                  VOICE (818) 988-4975
14707 Keswick St.                       FAX   (818) 997-1190
Van Nuys, CA  91405-1266                MODEM (818) 997-4496 Telebit PEP last
34 12 N  118 27 W                       srhqla!demott!kdq   kdq@demott.com

  "Next time, Jack, write a God-damned memo!" - Jack Ryan - Hunt for Red Oct.

Ralf.Brown@B.GP.CS.CMU.EDU (03/14/90)

In article <908@tijc02.UUCP>, rdo031@tijc02.UUCP (Rick Odle           ) wrote:
}From article <3666@plains.UUCP>, by harlow@plains.UUCP (Jay B. Harlow):
}> In article <505@bilver.UUCP> alex@bilver.UUCP (Alex Matulich) writes:
}>>Naturally, I wanted more speed, so I ported the program to an AT&T 386WGS
}>>at work, which is a 25 MHz 80386 IBM compatible.  I compiled it using
}>>Turbo C 2.0, large memory model.  Then I watched in chagrined disbelief as
}>>that number-crunching section still took about a minute to execute --
}>>actually a few seconds longer than my Amiga.  All source code was the same!
}>>
}>>5)  I know the 80386 has special modes of operation, incompatible with
}>>    previous chips, that allow it to run at its full potential.  Is this
}>>    the reason my program isn't running at its rightful speed?  Are these
}>       Yes partly, 1) 32-bit inst in real mode 2) 32-bit Protected mode
}>>    special modes accessible when using DOS?  If so, how?
}> 
}> Alex,
}>    the reason your program ran 'slow' on the 386 is because you compiled it
}> for a 286 (at best) LARGE model, a large model on a x86 means all addresses
}> have segment offset, having the x86 load segments is very 'expensive'
}
}The only fair test here is to do the test with large model.  While it
}is true that the 80x86 processors will let you execute code in a small
}model architecture,  this is only applicable to fairly small applications
}( I know this is relatively speaking, and that 10 years ago 64k of code
}was a fairly large application).  The 680x0 family architecture ALWAYS
}fetches long word addresses (32 bits), so the most fair comparison
}is the x86 large model.

Ahem, since we're talking about a 386, the only fair comparison is to 386
small model, which is one 4 gigabyte code segment and another 4 gigabyte
segment for data/stack/heap.  Which 680x0 addresses 8 gigs at once?

}BTW, this very feature (large linear address space) on the 68k family is
}what makes it a somewhat more desirable processor to program on (no

Gee, 4 gigabytes per segment looks pretty large and linear to me....
--
UUCP: {ucbvax,harvard}!cs.cmu.edu!ralf -=- 412-268-3053 (school) -=- FAX: ask
ARPA: ralf@cs.cmu.edu  BIT: ralf%cs.cmu.edu@CMUCCVMA  FIDO: Ralf Brown 1:129/46
"How to Prove It" by Dana Angluin              Disclaimer? I claimed something?
16. proof by cosmology:
    The negation of the proposition is unimaginable or meaningless.  Popular
    for proofs of the existence of God.

paula@bcsaic.UUCP (Paul Allen) (03/15/90)

In article <908@tijc02.UUCP> rdo031@tijc02.UUCP (Rick Odle           ) writes:
[quoted stuff about comparing 68k with 386 deleted]
>
>The only fair test here is to do the test with large model.  While it
>is true that the 80x86 processors will let you execute code in a small
>model architecture,  this is only applicable to fairly small applications
>( I know this is relatively speaking, and that 10 years ago 64k of code
>was a fairly large application).  The 680x0 family architecture ALWAYS
>fetches long word addresses (32 bits), so the most fair comparison
>is the x86 large model.

But the discussion was about the 80386 versus 68k.  There's no need to
handicap the 386 with far pointers when a near pointer is 32 bits.  You
can write a 4 GB application on the 386 without resorting to large model.
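
The arithmetic behind that is just the offset width. A sketch, with `offset_span_bytes` as a made-up helper: a 16-bit near pointer can reach 64 KB inside one segment, a 32-bit near pointer 4 GB.

```c
/* Bytes reachable through an offset of the given width in bits.
   The width must be less than 64 for the shift to be defined. */
unsigned long long offset_span_bytes(unsigned offset_bits)
{
    return 1ULL << offset_bits;
}
```

`offset_span_bytes(16)` is 65536 and `offset_span_bytes(32)` is 4294967296, which is why a 386 program built with 32-bit offsets rarely needs far pointers at all.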

>BTW, this very feature (large linear address space) on the 68k family is
>what makes it a somewhat more desirable processor to program on (no
>segments to worry about).  On the other hand, the segmented architecture
>lends itself to being able to develop position independent code easier.

The 386 has its problems (like a lack of registers and the need to remain
compatible with the brain-dead processors of the past), but it certainly
does have a large linear address space.  You're right about the segmentation.
Applications no longer need to worry about it, but it's there when the
system programmer needs it.

Paul Allen
-- 
------------------------------------------------------------------------
Paul L. Allen                       | pallen@atc.boeing.com
Boeing Advanced Technology Center   | ...!uw-beaver!bcsaic!pallen

schaut@cat9.cs.wisc.edu (Rick Schaut) (03/16/90)

In article <909@tijc02.UUCP> rdo031@tijc02.UUCP (Rick Odle           ) writes:
| The only fair test here is to do the test with large model.  While it
| is true that the 80x86 processors will let you execute code in a small
| model architecture,  this is only applicable to fairly small applications
| ( I know this is relatively speaking, and that 10 years ago 64k of code
| was a fairly large application).  The 680x0 family architecture ALWAYS
| fetches long word addresses (32 bits), so the most fair comparison
| is the x86 large model.

An 80386 running DOS is nothing more than a fast 8086 (that's what "real"
mode is).  A fast 68000 matches fairly closely to a fast 8086 in most
operations.

If you want to compare the 80386 with the 68030, then the only truly
fair thing to do is run the 80386 in protected mode.  Even _then_ you're
comparing apples and oranges, and we should all know by now that apples
are better in pies while oranges are better for juice.

--
Rick (schaut@garfield.cs.wisc.edu)

"I'm a theory geek; we use Turing machines!"--Gary Lewandowski

ssingh@watserv1.waterloo.edu ($anjay "lock-on" $ingh - Indy Studies) (03/18/90)

In article <4477@daffy.cs.wisc.edu> schaut@cat9.cs.wisc.edu (Rick Schaut) writes:
>
>If you want to compare the 80386 with the 68030, then the only truly
>fair thing to do is run the 80386 in protected mode.  Even _then_ you're
>comparing apples and oranges, and we should all know by now that apples
>are better in pies while oranges are better for juice.
>
Bravo!! At this point, who cares anymore what processor is better? They
were designed with different motivations. To paraphrase what a worker at
Intel said, we put a stake in the ground with the 8086; we can only build
on it. Look to the 960 for a much cleaner instruction set (the original
question was why 386 assembler is SO weird.) BTW, I'm probably behind
the times, but when the 486 first came out, it was informally 25 MHz and
15 VAX MIPS. Motorola waited and out comes a 68040 at 25 MHz and 20 VAX
MIPS. ALR has a 33 MHz 486 out now; net throughput increase: 25%;
projected VAX MIPS (my guess, anyway): 20.

I like variety: My next machine will be based on a 680x0.

 
-- 
"No one had the guts... until now..."  
|-$anjay "lock [+] on" $ingh	ssingh@watserv1.waterloo.edu	N.A.R.C. ]I[-|
"No his mind is not for rent, to any God or government."-Rush, Moving Pictures
!being!mind!self!cogsci!AI!think!nerve!parallel!cybernetix!chaos!fractal!info!

bcw@rti.rti.org (Bruce Wright) (03/22/90)

In article <1503@watserv1.waterloo.edu>, ssingh@watserv1.waterloo.edu ($anjay "lock-on" $ingh - Indy Studies) writes:
> BTW, I'm probably behind
> the times, but when the 486 first came out, it was informally 25 MHz and
> 15 VAX MIPS. Motorola waited and out comes a 68040 at 25 MHz and 20 VAX
> MIPS. ALR has a 33 MHz 486 out now; net throughput increase: 25%;
> projected VAX MIPS (my guess, anyway): 20.

Am I the only one that thinks these numbers look awfully optimistic?
I've used PC's and VAXes for quite some time, and my informal
benchmarks would give about 3 VAX MIPS to a 25 MHz 386;  surely
the 486 isn't 5 times faster than a 386 at the same clock speed?!
The rags I've seen make the 486 about 2-3 times faster than a 386,
maximum (some applications more like 1.5), which would make a 486
AT MOST a 10 VAX MIPS machine, and probably more like a 6 VAX MIPS
machine.
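
The arithmetic here is easy to check. A sketch, using only figures quoted in this thread (the helper names are made up): scale the 3 VAX MIPS estimate for a 25 MHz 386 by the claimed 486 speedups, and scale the 15 VAX MIPS claim for a 25 MHz 486 by clock rate alone.

```c
/* Estimated rating after applying a relative speedup factor. */
double scaled_mips(double base_mips, double speedup)
{
    return base_mips * speedup;
}

/* Estimated rating if performance scaled purely with clock rate. */
double clock_scaled_mips(double base_mips, double base_mhz, double new_mhz)
{
    return base_mips * new_mhz / base_mhz;
}
```

`scaled_mips(3.0, 1.5)`, `scaled_mips(3.0, 2.0)` and `scaled_mips(3.0, 3.0)` give 4.5, 6.0 and 9.0, in line with the 6 to 10 range above, while `clock_scaled_mips(15.0, 25.0, 33.0)` gives about 19.8, which is where a guess of 20 for the 33 MHz part would come from if performance scaled purely with clock speed.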

Unfortunately, in this business, hype is always the order of the
day.  Hopefully once they get 486 chips that really work ( ;-) )
then the true state of affairs will be somewhat more visible.

						Bruce C. Wright