[comp.sys.m68k] 386 vs 020 and big benchmarks

tomk@intsc.UUCP (Tom Kohrs @fae) (04/17/87)

In article <513@omen.UUCP> caf@omen.UUCP (Chuck Forsberg) writes:
> In article <930@intsc.UUCP> tomk@intsc.UUCP (Tom Kohrs @fae) writes:
> 
> :Show me a benchmark that does not fit in 256 bytes that even keeps up
                                            ^^^^^^^^^  (note for ref.)
> :with a 16MHz 386.  386's are now shipping at 20MHz for the speed freaks.
> :25MHz soon.
> 
> Well, here's one that takes 8k, somewhat larger than 256 bytes.  A 25 MHz
> 68020 board more than keeps up with an 18 MHz 386 box (let alone 16 MHz).
> 
Code left in for reference.

> siev.c:
> #define S 8190
> char f[S+1];
> main()
> {
> /*	register long i,p,k,c,n;	For 32 bit entries for PC */
> 	register int i,p,k,c,n;
> 	for (n = 1; n <= 10; n++) {
> 		c = 0;
> 		for (i = 0; i <= S; i++) f[i] = 1;
___
| 		for (i = 0; i <= S; i++) {
| 			if (f[i]) {
| 				p = i + i + 3; k = i + p;
| 				while (k <= S) { f[k] = 0; k += p; }
| 				c++;
| 			}
| 		}
|__
> 	}
> 	printf("\n%d primes.\n", c);
> }
 
The following is the assembler (as) output of the rcc compiler (no optimization)
under Unix for the inner loop of the sieve benchmark as included above:

.L21:
	xorl	%edi,%edi
	jmp	.L26
.L27:
	cmpb	$0,f(%edi)
	je	.L28
	movl	%edi,%eax
	addl	%edi,%eax
	leal	3(%eax),%eax
	movl	%eax,%esi
	movl	%edi,%eax
	addl	%esi,%eax
	movl	%eax,%ebx
	jmp	.L30
.L31:
	movb	$0,f(%ebx)
	movl	%esi,%eax
	addl	%eax,%ebx
.L30:
	cmpl	$8190,%ebx
	jle	.L31
.L29:
	incl	-4(%ebp)
.L28:
	incl	%edi
.L26:
	cmpl	$8190,%edi
	jle	.L27

The compiler generated ~62 bytes of code (if I ever figure out sdb I will
know for sure).  Assuming the 020 compiler does not generate more than 4X
the amount of code, this will all fit into the 020 cache.  That is what I
meant when I said that the benchmarks that show the 020 as faster fit into
256 bytes.  

If all you want to do is calculate sieves all day then use the '020.  But
if you want to do real crunching on large problems then the 386 will run
circles around the '020.  That is not to say the '020 with the 256 byte
cache does not have its niches.  There are a number of applications in the
embedded control area that have inner loops that fit nicely in 256 bytes.
Line drawing routines in graphics applications are one; that's why we build
H/W accelerators for that.  If performance is what you need on Megabyte
size problems the 386 will give you 50% - 75% more speed at the same clock
rate.

BTW:  The numbers for the 386 on the 18MHz CompDyn (.59sec) matched what
I got under Unix on my MB-I box (.59sec).  The biggest performance hit
on this benchmark for the systems tested is due to the wait states taken
for a write (3ws on the MB-I board).  This could easily be fixed on a 
system with posted writes or a write back cache.

> Compile-Link  Execute		Code
> Real	User	Real	User	Bytes	System
> 
> 7.4	.8	.34	.3416	124	Definicom SYS 68020 25MHz SiVlly 11/86
> 11.8	2.8	.56	.56	131	CompDyn (Intel MB) + 386 Toolkit 12/86
  3.0	.6      .59     .59      ?      Intel 310/386 16MHz Unix V.3 rcc  4/16

keithe@tekgvs.UUCP (04/18/87)

Posted: Thu Apr 16 17:56:01 1987

In article <930@intsc.UUCP> tomk@intsc.UUCP (Tom Kohrs @fae) writes:
> 
> If all you want to do is calculate sieves all day then use the '020.  But
> if you want to do real crunching on large problems then the 386 will run
> circles around the '020.  That is not to say the '020 with the 256 byte
> cache does not have its niches.  There are a number of applications in the
> embedded control area that have inner loops that fit nicely in 256 bytes.

This is exactly the kind of stuff one would expect to read in a posting
with this in the header:
> From: tomk@intsc.UUCP (Tom Kohrs @fae)
> Organization: Intel Sales, Silicon Valley, Ca.
                ^^^^^^^^^^^
And it is exactly the kind of posting that should only be published in
/dev/null.

keith


caf@omen.UUCP (04/18/87)

In article <933@intsc.UUCP> tomk@intsc.UUCP (Tom Kohrs @fae) writes:
:> :Show me a benchmark that does not fit in 256 bytes that even keeps up
:                                            ^^^^^^^^^  (note for ref.)
:> :with a 16MHz 386.  386's are now shipping at 20MHz for the speed freaks.
:> :25MHz soon.
OK, here's one that may not fit in 256 bytes.

time bc <<f
2 ^ 4096
f

Make bct executable.  Clear the screen (no scrolling please). Then run it.

Real Time	System/comments (ws = wait state(s))

0:03.6		Amdahl 580 3 users 2/86 ames!aurora!eugene
0:06.6		Gould UTX32 6/84
0:07.6		Vax 8600 running 4.3BSD Beta, 16 Feb 1986
0:09.5	u	Sun 3/260 68020 "25 Mhz" BSD 4.3	Unix-EXPO 10/86
0:14.5	u	HP 9000/840  Spectrum RISC HP-Unix	Unix-EXPO 10/86
0:15		DG/UX 2.01, DG MV/10000SX, 8MB
0:15.7	u	Compaq 386      80386 Xenix 5	Unix-EXPO 10/86
0:15.8	u	Computer Dynamics 18MHz 386 SCO SYSV 2.2 3/87 (1)
0:20		Computer Dynamics 18MHz 386 Toolkit (16 bit bc/dc) 12/86  (1)
0:17.6	u	Corvus 386      80386 Xenix 5	Unix-EXPO 10/86

The Gentleman from Intel claims siev has a "hot spot" that fits within the
256 byte 68020 cache.  Close, but no cigar.  Those triple wait states
on writes are for references within the cache?  Don't tell us Unix 5.3 for
the 386 is using self modifying code.  So even within that inner loop,
a significant portion of memory references do not fall within the cache.

:  3.0	.6      .59     .59      ?      Intel 310/386 16MHz Unix V.3 rcc  4/16
                                ^^^

Please do the following:
	cc -O -c siev.c
	size siev.o
and report the results.

Actually, siev is a 15k program on Xenix (text+data).  Presumably much of that
15k actually gets executed, perhaps 10k.

Now Nroff and troff are larger than 15k, but they still might have hot
spots that benefit from the 68020 instruction cache the way siev and dc
appear to.  The 68020's cache won't help much with a 10 MB program which
references every memory location in succession, but then the demand paging
would be thrashing so badly one wouldn't notice instruction speed anyway.

So, how about this test:

yes "Hello World"|sed 10000q |time sort >/dev/null

Let's hear some times for 68020 with and without the cache,
also for 386 boxes.

davet@oakhill.UUCP (04/18/87)

[Sigh.  I guess I'll have to put my marketing hat on :-(  ]

In article <933@intsc.UUCP> tomk@intsc.UUCP (Tom Kohrs @fae) writes:

>                 ....                                     That is what I
>meant when I said that the benchmarks that show the 020 as faster fit into
>256 bytes.  

As several others have indicated the instructions executed most in program
loops usually fit into a small instruction cache.  Look at a typical huge
number crunching program.  It consists of loops within loops within loops.
However, at any one time it is usually executing in an innermost loop somewhere
and these tend not to be huge expanses of code.  This has been my experience
with large astrophysics programs.  (If anyone else finds their experience is
different let's hear from you.)

>      ...                        There are a number of applications in the
>embedded control area that have inner loops that fit nicely in 256 bytes.
>Line drawing routines in graphics applications are one; that's why we build
>H/W accelerators for that.
>

Huh?  The only way I can interpret this is that since the 386 doesn't have
a small instruction cache designers have to include hardware accelerators to
do line drawing routines and graphics applications.

>If all you want to do is calculate sieves all day then use the '020.  But
>if you want to do real crunching on large problems then the 386 will run
>circles around the '020.                ...
>If performance is what you need on Megabyte
>size problems the 386 will give you 50% - 75% more speed at the same clock
>rate.

People in this newsgroup recognize marketing hype like this when they see it.
All it does in most people's minds is invalidate any data you give out in
support of your cause.  Just present your "facts" and let them speak for
themselves.

What's most surprising, however, is that in giving us the output of your
compiler you have shown one big reason for doubting your above claims.

Notice the line in the sieve:

	register int i,p,k,c,n;

and then notice your compiler fails to assign the variable 'c' to a register
for the statement 'c++;':

>.L29:
>	incl	-4(%ebp)

This would imply that since the variable 'c' is fourth in the list that your
compiler on the 386 is limited to supporting only three register variables.
The same compiler for the M68000 assigns all five variables to registers which
is only *HALF* of the ten available for variable assignment.

If the 386 only supports three register variables and in this tiny benchmark
(which you love to deride as being insignificant) the 386 actually runs
out of registers to assign, how are we supposed to believe your claims that
on real meaty applications the 386 actually performs better than other
architectures with plenty of registers?

>>  Execute                Code
>> Real    User    System
>> 
>> .34     .3416   Definicom SYS 68020 25MHz SiVlly 11/86
>> .56     .56     CompDyn (Intel MB) + 386 Toolkit 12/86
>  .59     .59     Intel 310/386 16MHz Unix V.3 rcc  4/16
   .46     .46     Motorola VME310 16MHz Unix V.3 pcc2 4/87

 -- Dave Trissel  Motorola Semiconductor Inc., Austin, Texas
	{ihnp4,seismo}!ut-sally!im4u!oakhill!davet

mash@mips.UUCP (04/19/87)

In article <866@oakhill.UUCP> davet@oakhill.UUCP (Dave Trissel) writes:
>In article <933@intsc.UUCP> tomk@intsc.UUCP (Tom Kohrs @fae) writes:
....
>>If all you want to do is calculate sieves all day then use the '020.  But
>>if you want to do real crunching on large problems then the 386 will run
>>circles around the '020.                ...
>>If performance is what you need on Megabyte
>>size problems the 386 will give you 50% - 75% more speed at the same clock
>>rate.

It would be nice to see some more "meaty" problems.
I have a few collected, but not many where one can get the faster
386s versus 68020s (all that follow are from MIPS Performance Brief,
April 1987):

Doduc (5300-line FORTRAN program to simulate aspects of nuclear reactors):
Rel Perf	Machine
17		Sun3/110, 16.7 MHz
19		80386, 16MHz
22		Sun3/260, 25MHz 68020 + 20Mhz 68881
43		Sun3/260, Weitek FPA
Thus, looks like the 386 might have a slight edge at same clock rate;
hard to see 50-75%, at least on this particular benchmark, and, as usual,
memory systems are often hard to compare 1-1.
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{decvax,ucbvax,ihnp4}!decwrl!mips!mash, DDD:  	408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

mash@mips.UUCP (John Mashey) (04/19/87)

In article <523@omen.UUCP> caf@.UUCP (PUT YOUR NAME HERE) writes:
>In article <933@intsc.UUCP> tomk@intsc.UUCP (Tom Kohrs @fae) writes:
>:> :Show me a benchmark that does not fit in 256 bytes that even keeps up
>:                                            ^^^^^^^^^  (note for ref.)
>:> :with a 16MHz 386.  386's are now shipping at 20MHz for the speed freaks.
>:> :25MHz soon.
>OK, here's one that may not fit in 256 bytes.
>
>time bc <<f
>2 ^ 4096
>f
>

Sigh.  This illustrates how extremely careful one must be, and how nonintuitive
computer performance analysis can be.  The rest of this is NOT in support of
a 256-byte cache being "enough" (for lots of real programs, it sure would
help to have bigger ones), but simply to show 2 things:
a) One must be careful.
b) The classic "bc" tests are VERY unrepresentative of most real programs.

I ran this program, using our program that turns an executable into a
profiled executable [of dc, of course, most of the time is spent in dc].
99% of the CPU time is spent in one function [mult]
98% of the CPU is spent in 6 source lines [1080-1085 in 4.3BSD dc.c].
On MIPS R2000 [which uses 32-bit instructions, and is often less dense than
more CISCy machines,] this code occupies......
	260 bytes.
Even stranger, about 35-40% of the total cycles are in multiply, divide,
and remainder, a behavior pattern found in no other benchmark that
we've looked at [and we have detailed statistics on dozens of large,
real programs].  On many programs, you see <1% for these together,
on some you might get up to 10%, but that's about it.

One more time: computer architecture arguments CANNOT be settled
by intuition and examination of isolated examples.  If you aren't
following comp.arch, turn it on and retrieve the last few weeks'
flurry of discussions on strings and word-vs-byte addressing.
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{decvax,ucbvax,ihnp4}!decwrl!mips!mash, DDD:  	408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

geoff@desint.UUCP (Geoff Kuenning) (04/20/87)

In article <866@oakhill.UUCP> davet@oakhill.UUCP (Dave Trissel) writes:

> If the 386 only supports three register variables and in this tiny benchmark

As usual, Dave is right on target.  Pcc for the 386 does indeed only allow
three register variables.  Furthermore, I am currently involved in writing
an extremely carefully hand-coded assembly loop (bitblt, if you care) for
the 386.  No matter what you do, you can't get more than seven registers
for your use on the 386.  And even disregarding that, the code on the 386
is *much* uglier than the equivalent 68k (not '020) code.  For example,
the 386 only has one register (esi) that can do autoincrement loads, and
one that can do autoincrement stores.  If you want to mix autoincrements
with autodecrements (fortunately something I don't have to do), you have
to execute a special instruction (std) to switch directions.  However, in
practice, it turns out that the standard 386 instructions are often a faster
way to accomplish things than the "special" ones.  My favorite is the
special "loop" instruction.  It decrements a register and then does a
conditional branch.  This takes 11 clocks, plus pipeline-reloading time.
But if you issue a subtract-immediate instruction followed by a jump,
you will only spend 9 clocks plus pipeline-reloading.

On the other hand, the built-in memory management on the 386 is clearly a
really big plus.  Frankly, Motorola, you dropped the ball badly on that
one, going clear back to the 68k itself.  I would *much* rather have
the MMU built into the '020 than have the extra instructions you guys added
(not that I don't like them too, but I think the MMU is much more important).
And yes I realize that there are many applications that don't need an MMU.
-- 
	Geoff Kuenning   geoff@ITcorp.com   {hplabs,ihnp4}!trwrb!desint!geoff

tim@ism780c.UUCP (Tim Smith) (04/21/87)

<OK, here's one that may not fit in 256 bytes.
<
<time bc <<f
<2 ^ 4096
<f
<
<Make bct executable.  Clear the screen (no scrolling please). Then run it.
<
<Real Time	System/comments (ws = wait state(s))
<
<0:03.6		Amdahl 580 3 users 2/86 ames!aurora!eugene
<0:06.6		Gould UTX32 6/84
<0:07.6		Vax 8600 running 4.3BSD Beta, 16 Feb 1986
<0:09.5	u	Sun 3/260 68020 "25 Mhz" BSD 4.3	Unix-EXPO 10/86
<0:14.5	u	HP 9000/840  Spectrum RISC HP-Unix	Unix-EXPO 10/86
<0:15		DG/UX 2.01, DG MV/10000SX, 8MB
<0:15.7	u	Compaq 386      80386 Xenix 5	Unix-EXPO 10/86
<0:15.8	u	Computer Dynamics 18MHz 386 SCO SYSV 2.2 3/87 (1)
<0:20		Computer Dynamics 18MHz 386 Toolkit (16 bit bc/dc) 12/86  (1)
<0:17.6	u	Corvus 386      80386 Xenix 5	Unix-EXPO 10/86

I just tried this.  Under V.3 on a 16Mhz 386 ( Intel production board,
with a 387, which shouldn't matter ), I get 12.6 real, 11.2 user, .6
system.  bc was compiled with rcc.  There were other users on the
system, but they weren't doing anything significant.

It appears to me that the 386 and the 68020 are about the same.  To
an end user, does it really matter which is in the box? 

Just for fun, I tried this on an AT&T 6300+ running V.3 in 1 meg of
ram.  The results were 46.1 real, 40.3 user, 3.6 sys.  If anyone
wants to give me a cray, I will try it there too. :-)
-- 
Tim Smith			"Hojotoho! Hojotoho!
uucp: sdcrdcf!ism780c!tim	 Heiaha! Heiaha!
Delph or GEnie: Mnementh	 Hojotoho! Heiaha!"
Compuserve: 72257,3706

geoff@desint.UUCP (Geoff Kuenning) (04/22/87)

In article <318@winchester.UUCP> mash@winchester.UUCP (John Mashey) writes:

> Even stranger, about 35-40% of the total cycles are in multiply, divide,
> and remainder, a behavior pattern found in no other benchmark that

Aha!  That explains my results last night, when I (just for grins) compared
a 16-MHz 386 with a creaky old 10 MHz 68010.  The 386 beat the '010 out
by a factor of almost 10.  Why so much?  Because my '010 compiler uses
subroutines for multiply/divide/remainder on ints.  I feel so much
better. :-)
-- 
	Geoff Kuenning   geoff@ITcorp.com   {hplabs,ihnp4}!trwrb!desint!geoff

ihm@nrcvax.UUCP (04/22/87)

>Posted: Thu Apr 16 17:56:01 1987
>
>In article <930@intsc.UUCP> tomk@intsc.UUCP (Tom Kohrs @fae) writes:
>> 
>> If all you want to do is calculate sieves all day then use the '020.  But
>> if you want to do real crunching on large problems then the 386 will run
>> circles around the '020.  That is not to say the '020 with the 256 byte
>> cache does not have its niches.  There are a number of applications in the
>> embedded control area that have inner loops that fit nicely in 256 bytes.

I suppose this guy thinks Motorola designed the cache specifically for
benchmarks?  Caches have long been used to improve performance in real
large-scale applications.  Unless you program with NO loops, an
instruction cache will provide a very real benefit to real
applications.

>
>This is exactly the kind of stuff one would expect to read in a posting
>with this in the header:
>> From: tomk@intsc.UUCP (Tom Kohrs @fae)
>> Organization: Intel Sleeze, Silicon Valley, Ca.
>                ^^^^^^^^^^^^
>And it is exactly the kind of posting that should only be published in
>/dev/null.
>
>keith
>

Yep, and do you know who claimed that the 286 was of comparable speed
to the 68020?  That's right.  I have in my office two 286 based IBM
toys and in the computer room there are two 68020 (only 16.67 MHz)
unix machines.  Do you want to know which machines I (and about 20
other people) use? That's right, the 68020's.  They are on the order
of 20 times as fast in our applications such as database maintenance,
editing, and more like 60 times as fast at compilations.  Regardless
of whether the cache is a contributing factor or not, the total
throughput is vastly superior to the 286, and from what I am hearing
and reading, the 386's real throughput in real applications is only on
the order of 3 to 6 times a 286.  The 386 may be able to run circles
around the 68020, but can it outcompute it?  The 68020 doesn't do much
running it just sorta sits there in its socket computing... (:->)
					<>IHM<>

caf@omen.UUCP (Chuck Forsberg WA7KGX) (04/22/87)

In article <6011@ism780c.UUCP> tim@ism780c.UUCP (Tim Smith) writes:
:<0:15.7	u	Compaq 386      80386 Xenix 5	Unix-EXPO 10/86
:<0:15.8	u	Computer Dynamics 18MHz 386 SCO SYSV 2.2 3/87 (1)
:<0:20		Computer Dynamics 18MHz 386 Toolkit (16 bit bc/dc) 12/86  (1)
:<0:17.6	u	Corvus 386      80386 Xenix 5	Unix-EXPO 10/86
:
:I just tried this.  Under V.3 on a 16Mhz 386 ( Intel production board,
:with a 387, which shouldn't matter ), I get 12.6 real, 11.2 user, .6
:system.  bc was compiled with rcc.  There were other users on the
:system, but they weren't doing anything significant.
:
Assuming the "Intel Production Board" you refer to is the "386 AT Clone"
motherboard, your 11.2 second user time represents a 100 per cent
improvement in code execution speed over Xenix code running on the 386
toolkit (factoring the 18 MHz vs 16 MHz speed).

I trust this "rcc" is real software that I'll be able to see running soon.

Chuck Forsberg WA7KGX Author of Pro-YAM communications Tools for PCDOS and Unix
...!tektronix!reed!omen!caf  Omen Technology Inc "The High Reliability Software"
  17505-V Northwest Sauvie Island Road Portland OR 97231  Voice: 503-621-3406
TeleGodzilla BBS: 621-3746 2400/1200  CIS:70007,2304  Genie:CAF  Source:TCE022
  omen Any ACU 1200 1-503-621-3746 se:--se: link ord: Giznoid in:--in: uucp
  omen!/usr/spool/uucppublic/FILES lists all uucp-able files, updated hourly

ihm@nrcvax.UUCP (04/23/87)

>In article <866@oakhill.UUCP> davet@oakhill.UUCP (Dave Trissel) writes:
>
>> If the 386 only supports three register variables and in this tiny benchmark
>
>As usual, Dave is right on target.  Pcc for the 386 does indeed only allow
>three register variables.  Furthermore, I am currently involved in writing
>an extremely carefully hand-coded assembly loop (bitblt, if you care) for
>the 386.  No matter what you do, you can't get more than seven registers
>for your use on the 386.  And even disregarding that, the code on the 386
>is *much* uglier than the equivalent 68k (not '020) code.  For example,
>the 386 only has one register (esi) that can do autoincrement loads, and
>one that can do autoincrement stores. 

[Lotsa really UGLY intel architectural restrictions omitted]

> [...]  I would *much* rather have
>the MMU built into the '020 than have the extra instructions you guys added
>(not that I don't like them too, but I think the MMU is much more important).
>And yes I realize that there are many applications that don't need an MMU.

Yeah, but try coding up your tight bitblit using bitfield instructions
on the '020.  I think you will find that it is MUCH shorter and faster
than the 68000 version.  I seem to recall a coworker doing exactly
this, for a general purpose bitblt, using the '86, the 68k and the
68020, (ignoring for the sake of comparison the ridiculous address
space limitations on the 80/1/286 (386 didn't exist yet), used small
model).  The hand optimized code on the '86 was something like 12
times as many instructions as for the 68K which was only about 2.5 to
3 times as large as for the '020.  I don't recall the 68000 and 68010
performance comparison, but the 68020 did this about 120 times as fast
as a 186.  The 286 was only about 3 times a 186, and the preliminary
numbers Intel had supplied us for the 386 were only about 3 or 4 times
the 286.

I would have preferred if the MMU was just made available sooner; not
necessarily on the same chip.  It will be interesting to see what
happens with the 68030 MMU now that it IS on chip.

Cheerz--
						--i

steve@edm.UUCP (Stephen Samuel) (04/23/87)

In article <523@omen.UUCP>, caf@omen.UUCP (Chuck Forsberg WA7KGX) writes:
> In article <933@intsc.UUCP> tomk@intsc.UUCP (Tom Kohrs @fae) writes:
> > :Show me a benchmark that does not fit in 256 bytes that even keeps up
>                                             ^^^^^^^^^  (note for ref.)
  . . . .
> The Gentleman from Intel claims siev has a "hot spot" that fits within the
> 256 byte 68020 cache.  ....

That's exactly the point!!! Why do you think Motorola put the d'#n thing IN?
LOTS of programs have 'hot spots' (remember the 90/10 rule?????!) and an
instruction cache is designed to take advantage of this fact. If you have
to look so hard for programs that DON'T have such hot-spots, then obviously
Motorola was on the right track when they put in a cache. 
Don't send US on a wild goose chase to find a benchmark that gives the '386
a snowball's chance -- It's YOUR crusade -- YOU find the goose.

 "We get enough 'snow' related tasks up here in Canada, as it is. I intend to
 enjoy the summer now that it's here."
-- 
-------------
 Stephen Samuel 
  {ihnp4,ubc-vision,seismo!mnetor,vax135}!alberta!edm!steve

tim@ism780c.UUCP (Tim Smith) (04/23/87)

In article <526@omen.UUCP> caf@omen.UUCP (Chuck Forsberg) writes:
:: I just tried this.  Under V.3 on a 16Mhz 386 ( Intel production board,
:: with a 387, which shouldn't matter ), I get 12.6 real, 11.2 user, .6
:: system.  bc was compiled with rcc. 
::
: Assuming the "Intel Production Board" you refer to is the "386 AT Clone"
: motherboard, your 11.2 second user time represents a 100 per cent
: improvement in code execution speed over Xenix code running on the 386
: toolkit (factoring the 18 MHz vs 16 MHz speed).

Nope, it is not a 386 AT clone.  It is a Multibus board.  I wonder if the
bus on it is twice as wide as AT clone bus?  That would explain the factor
of two.

: I trust this "rcc" is real software that I'll be able to see running soon.

It appears to be real.  It is the compiler that whoever sells V3 for
the 386 will probably be selling.  As for who will be selling V3, that
is a mystery to me!  I would guess AT&T or Intel, but I have no idea.
-- 
Tim Smith			"Hojotoho! Hojotoho!
uucp: sdcrdcf!ism780c!tim	 Heiaha! Heiaha!
Delph or GEnie: Mnementh	 Hojotoho! Heiaha!"
Compuserve: 72257,3706

caf@omen.UUCP (Chuck Forsberg WA7KGX) (04/26/87)

In article <6048@ism780c.UUCP> tim@ism780c.UUCP (Tim Smith) writes:
:In article <526@omen.UUCP> caf@omen.UUCP (Chuck Forsberg) writes:
::: I just tried this.  Under V.3 on a 16Mhz 386 ( Intel production board,
::: with a 387, which shouldn't matter ), I get 12.6 real, 11.2 user, .6
::: system.  bc was compiled with rcc. 
:::
:: Assuming the "Intel Production Board" you refer to is the "386 AT Clone"
:: motherboard, your 11.2 second user time represents a 100 per cent
:: improvement in code execution speed over Xenix code running on the 386
:: toolkit (factoring the 18 MHz vs 16 MHz speed).
:
:Nope, it is not a 386 AT clone.  It is a Multibus board.  I wonder if the
:bus on it is twice as wide as AT clone bus?  That would explain the factor
:of two.

Nope, the Intel 386 AT cloneboard uses 32 bit data paths to the 2.5 MB of ram
(512k on the board, 2 MB on an Intel dual banked 32 bit plug in).