tomk@intsc.UUCP (Tom Kohrs @fae) (04/17/87)
In article <513@omen.UUCP> caf@omen.UUCP (Chuck Forsberg) writes:
> In article <930@intsc.UUCP> tomk@intsc.UUCP (Tom Kohrs @fae) writes:
>
> :Show me a benchmark that does not fit in 256 bytes thats even keeps up
                                            ^^^^^^^^^ (note for ref.)
> :with at 16MHz 386. 386's are now shipping at 20MHz for the speed freaks.
> :25MHz soon.
>
> Well, here's one that takes 8k, somewhat larger than 256 bytes. A 25 mHz
> 68020 board more than keeps up with a 18 mHz 386 box (let alone 16 mHz.)
> Code left in for reference.
> siev.c:
> #define S 8190
> char f[S+1];
> main()
> {
> /* register long i,p,k,c,n; For 32 bit entries for PC */
>     register int i,p,k,c,n;
>     for (n = 1; n <= 10; n++) {
>         c = 0;
>         for (i = 0; i <= S; i++) f[i] = 1;
 ___
|        for (i = 0; i <= S; i++) {
|            if (f[i]) {
|                p = i + i + 3; k = i + p;
|                while (k <= S) { f[k] = 0; k += p; }
|                c++;
|            }
|        }
|__
>     }
>     printf("\n%d primes.\n", c);
> }

The following is the as output of the rcc compiler (no opt.) under Unix for the inner loop of the sieve benchmark as included above:

.L21:
        xorl    %edi,%edi
        jmp     .L26
.L27:
        cmpb    $0,f(%edi)
        je      .L28
        movl    %edi,%eax
        addl    %edi,%eax
        leal    3(%eax),%eax
        movl    %eax,%esi
        movl    %edi,%eax
        addl    %esi,%eax
        movl    %eax,%ebx
        jmp     .L30
.L31:
        movb    $0,f(%ebx)
        movl    %esi,%eax
        addl    %eax,%ebx
.L30:
        cmpl    $8190,%ebx
        jle     .L31
.L29:
        incl    -4(%ebp)
.L28:
        incl    %edi
.L26:
        cmpl    $8190,%edi
        jle     .L27

The compiler generated ~62 bytes of code (if I ever figure out sdb I will know for sure). Assuming the 020 compiler does not generate more than 4X the amount of code, this will all fit into the 020 cache. That is what I meant when I said that the benchmarks that show the 020 as faster fit into 256 bytes.

If all you want to do is calculate sieves all day then use the '020. But if you want to do real crunching on large problems then the 386 will run circles around the '020. That is not to say the '020 with the 256 byte cache does not have its niches.
There is a number of application in the embedded control area that have inner loops that fit nicely in 256 bytes. Line drawing routines in graphics applications is one, thats why we build H/W accelerators for that. If performance is what you need on Megabyte size problems the 386 will give you 50% - 75% more speed at the same clock rate.

BTW: The numbers for the 386 on the 18MHz CompDyn (.59sec) matched what I got under Unix on my MB-I box (.59sec). The biggest performance hit on this benchmark for the systems tested is due to the wait states taken for a write (3ws on the MB-I board). This could easily be fixed on a system with posted writes or a write back cache.

>         Compile-Link    Execute        Code
>         Real   User     Real   User    Bytes  System
>
>         7.4    .8       .34    .3416   124    Definicom SYS 68020 25mHz SiVlly 11/86
>         11.8   2.8      .56    .56     131    CompDyn (Intel MB) + 386 Toolkit 12/86
          3.0    .6       .59    .59     ?      Intel 310/386 16MHz Unix V.3 rcc 4/16
keithe@tekgvs.UUCP (04/18/87)
Posted: Thu Apr 16 17:56:01 1987

In article <930@intsc.UUCP> tomk@intsc.UUCP (Tom Kohrs @fae) writes:
>
> If all you want to do is calculate sieves all day then use the '020. But
> if you want to do real crunching on large problems then the 386 will run
> circles around the '020. That is not to say the '020 with the 256 byte
> cache does not have its niches. There is a number of application in the
> embedded control area that have inner loops that fit nicely in 256 bytes.

This is exactly the kind of stuff one would expect to read in a posting
with this in the header:
> From: tomk@intsc.UUCP (Tom Kohrs @fae)
> Organization: Intel Sales, Silicon Valley, Ca.
                ^^^^^^^^^^^
And it is exactly the kind of posting that should only be published in
/dev/null.

keith
caf@omen.UUCP (04/18/87)
In article <933@intsc.UUCP> tomk@intsc.UUCP (Tom Kohrs @fae) writes:
:> :Show me a benchmark that does not fit in 256 bytes thats even keeps up
:                                            ^^^^^^^^^ (note for ref.)
:> :with at 16MHz 386. 386's are now shipping at 20MHz for the speed freaks.
:> :25MHz soon.
OK, here's one that may not fit in 256 bytes.
time bc <<f
2 ^ 4096
f
Make bct executable. Clear the screen (no scrolling please). Then run it.
Real Time System/comments (ws = wait state(s))
0:03.6    Amdahl 580 3 users 2/86 ames!aurora!eugene
0:06.6    Gould UTX32 6/84
0:07.6    Vax 8600 running 4.3BSD Beta, 16 Feb 1986
0:09.5 u  Sun 3/260 68020 "25 Mhz" BSD 4.3 Unix-EXPO 10/86
0:14.5 u  HP 9000/840 Spectrum RISC HP-Unix Unix-EXPO 10/86
0:15      DG/UX 2.01, DG MV/10000SX, 8MB
0:15.7 u  Compaq 386 80386 Xenix 5 Unix-EXPO 10/86
0:15.8 u  Computer Dynamics 18mHz 386 SCO SYSV 2.2 3/87 (1)
0:20      Computer Dynamics 18mHz 386 Toolkit (16 bit bc/dc) 12/86 (1)
0:17.6 u  Corvus 386 80386 Xenix 5 Unix-EXPO 10/86
The Gentleman from Intel claims siev has a "hot spot" that fits within the
256 byte 68020 cache. Close, but no cigar. Those triple wait states
on writes are for references within the cache? Don't tell us Unix 5.3 for
the 386 is using self modifying code. So even within that inner loop,
a significant portion of memory references do not fall within the cache.
: 3.0 .6 .59 .59 ? Intel 310/386 16MHz Unix V.3 rcc 4/16
                ^^^
Please do the following:
cc -O -c siev.c
size siev.o
and report the results.
Actually, siev is a 15k program on Xenix (text+data). Presumably much of that
15k actually gets executed, perhaps 10k.
Now nroff and troff are larger than 15k, but they still might have hot
spots that benefit from the 68020 instruction cache the way siev and dc
appear to. The 68020's cache won't help much with a 10 MB program which
references every memory location in succession, but then the demand paging
would be thrashing so badly one wouldn't notice instruction speed anyway.
So, how about this test:
yes "Hello World"|sed 10000q |time sort >/dev/null
Let's hear some times for 68020 with and without the cache,
also for 386 boxes.
davet@oakhill.UUCP (04/18/87)
[Sigh. I guess I'll have to put my marketing hat on :-( ]

In article <933@intsc.UUCP> tomk@intsc.UUCP (Tom Kohrs @fae) writes:
> .... That is what I
>meant when I said that the benchmarks that show the 020 as faster fit into
>256 bytes.

As several others have indicated, the instructions executed most in program loops usually fit into a small instruction cache. Look at a typical huge number crunching program. It consists of loops within loops within loops. However, at any one time it is usually executing in an innermost loop somewhere, and these tend not to be huge expanses of code. This has been my experience with large astrophysics programs. (If anyone else finds their experience is different, let's hear from you.)

> ... There is a number of application in the
>embedded control area that have inner loops that fit nicely in 256 bytes.
>Line drawing routines in graphics applications is one, thats why we build
>H/W accelerators for that.
>

Huh? The only way I can interpret this is that since the 386 doesn't have a small instruction cache, designers have to include hardware accelerators to do line drawing routines and graphics applications.

>If all you want to do is calculate sieves all day then use the '020. But
>if you want to do real crunching on large problems then the 386 will run
>circles around the '020. ...
>If performance is what you need on Megabyte
>size problems the 386 will give you 50% - 75% more speed at the same clock
>rate.

People in this newsgroup recognize marketing hype like this when they see it. All it does in most people's minds is invalidate any data you give out in support of your cause. Just present your "facts" and let them speak for themselves.

What's most surprising, however, is that in giving us the output of your compiler you have shown one big reason for doubting your above claims.
Notice the line in the sieve:

        register int i,p,k,c,n;

and then notice your compiler fails to assign the variable 'c' to a register for the statement 'c++;':

>.L29:
>        incl    -4(%ebp)

This would imply that since the variable 'c' is fourth in the list that your compiler on the 386 is limited to supporting only three register variables. The same compiler for the M68000 assigns all five variables to registers, which is only *HALF* of the ten available for variable assignment.

If the 386 only supports three register variables and in this tiny benchmark (which you love to deride as being insignificant) the 386 actually runs out of registers to assign, how are we supposed to believe your claims that on real meaty applications the 386 actually performs better than other architectures with plenty of registers?

>>       Execute       Code
>>       Real  User    System
>>
>>       .34   .3416   Definicom SYS 68020 25mHz SiVlly 11/86
>>       .56   .56     CompDyn (Intel MB) + 386 Toolkit 12/86
>        .59   .59     Intel 310/386 16MHz Unix V.3 rcc 4/16
         .46   .46     Motorola VME310 16MHz Unix V.3 pcc2 4/87
--
Dave Trissel  Motorola Semiconductor Inc., Austin, Texas
{ihnp4,seismo}!ut-sally!im4u!oakhill!davet
mash@mips.UUCP (04/19/87)
In article <866@oakhill.UUCP> davet@oakhill.UUCP (Dave Trissel) writes:
>In article <933@intsc.UUCP> tomk@intsc.UUCP (Tom Kohrs @fae) writes:
....
>>If all you want to do is calculate sieves all day then use the '020. But
>>if you want to do real crunching on large problems then the 386 will run
>>circles around the '020. ...
>>If performance is what you need on Megabyte
>>size problems the 386 will give you 50% - 75% more speed at the same clock
>>rate.

It would be nice to see some more "meaty" problems. I have a few collected, but not many where one can get the faster 386s versus 68020s (all that follow are from MIPS Performance Brief, April 1987):

Doduc (5300-line FORTRAN program to simulate aspects of nuclear reactors):

Rel Perf  Machine
17        Sun3/110, 16.7 MHz
19        80386, 16MHz
22        Sun3/260, 25MHz 68020 + 20Mhz 68881
43        Sun3/260, Weitek FPA

Thus, it looks like the 386 might have a slight edge at the same clock rate; it is hard to see 50-75%, at least on this particular benchmark, and, as usual, memory systems are often hard to compare 1-1.
--
-john mashey  DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: {decvax,ucbvax,ihnp4}!decwrl!mips!mash, DDD: 408-720-1700, x253
USPS: MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
mash@mips.UUCP (John Mashey) (04/19/87)
In article <523@omen.UUCP> caf@.UUCP (PUT YOUR NAME HERE) writes:
>In article <933@intsc.UUCP> tomk@intsc.UUCP (Tom Kohrs @fae) writes:
>:> :Show me a benchmark that does not fit in 256 bytes thats even keeps up
>:                                            ^^^^^^^^^ (note for ref.)
>:> :with at 16MHz 386. 386's are now shipping at 20MHz for the speed freaks.
>:> :25MHz soon.
>OK, here's one that may not fit in 256 bytes.
>
>time bc <<f
>2 ^ 4096
>f
>

Sigh. This illustrates how extremely careful one must be, and how nonintuitive computer performance analysis can be. The rest of this is NOT in support of a 256-byte cache being "enough" (for lots of real programs, it sure would help to have bigger ones), but simply to show 2 things:
a) One must be careful.
b) The classic "bc" tests are VERY unrepresentative of most real programs.

I ran this program, using our program that turns an executable into a profiled executable [of dc, of course, most of the time is spent in dc].
99% of the CPU time is spent in one function [mult]
98% of the CPU is spent in 6 source lines [1080-1085 in 4.3BSD dc.c].
On MIPS R2000 [which uses 32-bit instructions, and is often less dense than more CISCy machines,] this code occupies...... 260 bytes.

Even stranger, about 35-40% of the total cycles are in multiply, divide, and remainder, a behavior pattern found in no other benchmark that we've looked at [and we have detailed statistics on dozens of large, real programs]. On many programs, you see <1% for these together, on some you might get up to 10%, but that's about it.

One more time: computer architecture arguments CANNOT be settled by intuition and examination of isolated examples. If you aren't following comp.arch, turn it on and retrieve the last few weeks' flurry of discussions on strings and word-vs-byte addressing.
--
-john mashey  DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: {decvax,ucbvax,ihnp4}!decwrl!mips!mash, DDD: 408-720-1700, x253
USPS: MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
geoff@desint.UUCP (Geoff Kuenning) (04/20/87)
In article <866@oakhill.UUCP> davet@oakhill.UUCP (Dave Trissel) writes:
> If the 386 only supports three register variables and in this tiny benchmark

As usual, Dave is right on target. Pcc for the 386 does indeed only allow three register variables. Furthermore, I am currently involved in writing an extremely carefully hand-coded assembly loop (bitblt, if you care) for the 386. No matter what you do, you can't get more than seven registers for your use on the 386. And even disregarding that, the code on the 386 is *much* uglier than the equivalent 68k (not '020) code. For example, the 386 only has one register (esi) that can do autoincrement loads, and one that can do autoincrement stores. If you want to mix autoincrements with autodecrements (fortunately something I don't have to do), you have to execute a special instruction (std) to switch directions.

However, in practice, it turns out that the standard 386 instructions are often a faster way to accomplish things than the "special" ones. My favorite is the special "loop" instruction. It decrements a register and then does a conditional branch. This takes 11 clocks, plus pipeline-reloading time. But if you issue a subtract-immediate instruction followed by a jump, you will only spend 9 clocks plus pipeline-reloading.

On the other hand, the built-in memory management on the 386 is clearly a really big plus. Frankly, Motorola, you dropped the ball badly on that one, going clear back to the 68k itself. I would *much* rather have the MMU built into the '020 than have the extra instructions you guys added (not that I don't like them too, but I think the MMU is much more important). And yes I realize that there are many applications that don't need an MMU.
--
Geoff Kuenning   geoff@ITcorp.com   {hplabs,ihnp4}!trwrb!desint!geoff
tim@ism780c.UUCP (Tim Smith) (04/21/87)
<OK, here's one that may not fit in 256 bytes.
<
<time bc <<f
<2 ^ 4096
<f
<
<Make bct executable. Clear the screen (no scrolling please). Then run it.
<
<Real Time System/comments (ws = wait state(s))
<
<0:03.6    Amdahl 580 3 users 2/86 ames!aurora!eugene
<0:06.6    Gould UTX32 6/84
<0:07.6    Vax 8600 running 4.3BSD Beta, 16 Feb 1986
<0:09.5 u  Sun 3/260 68020 "25 Mhz" BSD 4.3 Unix-EXPO 10/86
<0:14.5 u  HP 9000/840 Spectrum RISC HP-Unix Unix-EXPO 10/86
<0:15      DG/UX 2.01, DG MV/10000SX, 8MB
<0:15.7 u  Compaq 386 80386 Xenix 5 Unix-EXPO 10/86
<0:15.8 u  Computer Dynamics 18mHz 386 SCO SYSV 2.2 3/87 (1)
<0:20      Computer Dynamics 18mHz 386 Toolkit (16 bit bc/dc) 12/86 (1)
<0:17.6 u  Corvus 386 80386 Xenix 5 Unix-EXPO 10/86

I just tried this. Under V.3 on a 16Mhz 386 ( Intel production board, with a 387, which shouldn't matter ), I get 12.6 real, 11.2 user, .6 system. bc was compiled with rcc. There were other users on the system, but they weren't doing anything significant.

It appears to me that the 386 and the 68020 are about the same. To an end user, does it really matter which is in the box?

Just for fun, I tried this on an AT&T 6300+ running V.3 in 1 meg of ram. The results were 46.1 real, 40.3 user, 3.6 sys. If anyone wants to give me a cray, I will try it there too. :-)
--
Tim Smith                    "Hojotoho! Hojotoho!
uucp: sdcrdcf!ism780c!tim     Heiaha! Heiaha!
Delph or GEnie: Mnementh      Hojotoho! Heiaha!"
Compuserve: 72257,3706
geoff@desint.UUCP (Geoff Kuenning) (04/22/87)
In article <318@winchester.UUCP> mash@winchester.UUCP (John Mashey) writes:
> Even stranger, about 35-40% of the total cycles are in multiply, divide,
> and remainder, a behavior pattern found in no other benchmark that

Aha! That explains my results last night, when I (just for grins) compared a 16-MHz 386 with a creaky old 10 MHz 68010. The 386 beat the '010 out by a factor of almost 10. Why so much? Because my '010 compiler uses subroutines for multiply/divide/remainder on ints. I feel so much better. :-)
--
Geoff Kuenning   geoff@ITcorp.com   {hplabs,ihnp4}!trwrb!desint!geoff
ihm@nrcvax.UUCP (04/22/87)
>Posted: Thu Apr 16 17:56:01 1987
>
>In article <930@intsc.UUCP> tomk@intsc.UUCP (Tom Kohrs @fae) writes:
>>
>> If all you want to do is calculate sieves all day then use the '020. But
>> if you want to do real crunching on large problems then the 386 will run
>> circles around the '020. That is not to say the '020 with the 256 byte
>> cache does not have its niches. There is a number of application in the
>> embedded control area that have inner loops that fit nicely in 256 bytes.

I suppose this guy thinks Motorola designed the cache specifically for benchmarks? Caches have long been used to improve performance in real large-scale applications. Unless you program with NO loops, an instruction cache will provide a very real benefit to real applications.

>
>This is exactly the kind of stuff one would expect to read in a posting
>with this in the header:
>> From: tomk@intsc.UUCP (Tom Kohrs @fae)
>> Organization: Intel Sleeze, Silicon Valley, Ca.
>                ^^^^^^^^^^^^
>And it is exactly the kind of posting that should only be published in
>/dev/null.
>
>keith
>

Yep, and do you know who claimed that the 286 was of comparable speed to the 68020? That's right. I have in my office two 286 based IBM toys and in the computer room there are two 68020 (only 16.67 MHz) unix machines. Do you want to know which machines I (and about 20 other people) use? That's right, the 68020's. They are on the order of 20 times as fast in our applications such as database maintenance and editing, and more like 60 times as fast at compilations. Regardless of whether the cache is a contributing factor or not, the total throughput is vastly superior to the 286, and from what I am hearing and reading, the 386's real throughput in real applications is only on the order of 3 to 6 times a 286. The 386 may be able to run circles around the 68020, but can it outcompute it? The 68020 doesn't do much running; it just sorta sits there in its socket computing... (:->)

<>IHM<>
caf@omen.UUCP (Chuck Forsberg WA7KGX) (04/22/87)
In article <6011@ism780c.UUCP> tim@ism780c.UUCP (Tim Smith) writes:
:<0:15.7 u Compaq 386 80386 Xenix 5 Unix-EXPO 10/86
:<0:15.8 u Computer Dynamics 18mHz 386 SCO SYSV 2.2 3/87 (1)
:<0:20 Computer Dynamics 18mHz 386 Toolkit (16 bit bc/dc) 12/86 (1)
:<0:17.6 u Corvus 386 80386 Xenix 5 Unix-EXPO 10/86
:
:I just tried this. Under V.3 on a 16Mhz 386 ( Intel production board,
:with a 387, which shouldn't matter ), I get 12.6 real, 11.2 user, .6
:system. bc was compiled with rcc. There were other users on the
:system, but they weren't doing anything significant.
:
Assuming the "Intel Production Board" you refer to is the "386 AT Clone"
motherboard, your 11.2 second user time represents a 100 per cent
improvement in code execution speed over Xenix code running on the 386
toolkit (factoring the 18 mHz vs 16 mHz speed).
I trust this "rcc" is real software that I'll be able to see running soon.
Chuck Forsberg WA7KGX Author of Pro-YAM communications Tools for PCDOS and Unix
...!tektronix!reed!omen!caf Omen Technology Inc "The High Reliability Software"
17505-V Northwest Sauvie Island Road Portland OR 97231 Voice: 503-621-3406
TeleGodzilla BBS: 621-3746 2400/1200 CIS:70007,2304 Genie:CAF Source:TCE022
omen Any ACU 1200 1-503-621-3746 se:--se: link ord: Giznoid in:--in: uucp
omen!/usr/spool/uucppublic/FILES lists all uucp-able files, updated hourly
ihm@nrcvax.UUCP (04/23/87)
>In article <866@oakhill.UUCP> davet@oakhill.UUCP (Dave Trissel) writes:
>
>> If the 386 only supports three register variables and in this tiny benchmark
>
>As usual, Dave is right on target. Pcc for the 386 does indeed only allow
>three register variables. Furthermore, I am currently involved in writing
>an extremely carefully hand-coded assembly loop (bitblt, if you care) for
>the 386. No matter what you do, you can't get more than seven registers
>for your use on the 386. And even disregarding that, the code on the 386
>is *much* uglier than the equivalent 68k (not '020) code. For example,
>the 386 only has one register (esi) that can do autoincrement loads, and
>one that can do autoincrement stores.

[Lotsa really UGLY intel architectural restrictions omitted]

> [...] I would *much* rather have
>the MMU built into the '020 than have the extra instructions you guys added
>(not that I don't like them too, but I think the MMU is much more important).
>And yes I realize that there are many applications that don't need an MMU.

Yeah, but try coding up your tight bitblt using bitfield instructions on the '020. I think you will find that it is MUCH shorter and faster than the 68000 version. I seem to recall a coworker doing exactly this, for a general purpose bitblt, using the '86, the 68k and the 68020 (ignoring for the sake of comparison the ridiculous address space limitations on the 80/1/286 (386 didn't exist yet), used small model). The hand optimized code on the '86 was something like 12 times as many instructions as for the 68K, which was only about 2.5 to 3 times as large as for the '020. I don't recall the 68000 and 68010 performance comparison, but the 68020 did this about 120 times as fast as a 186. The 286 was only about 3 times a 186, and the preliminary numbers Intel had supplied us for the 386 were only about 3 or 4 times the 286.

I would have preferred if the MMU was just made available sooner; not necessarily on the same chip.
It will be interesting to see what happens with the 68030 MMU now that it IS on chip.

Cheerz--
--i
steve@edm.UUCP (Stephen Samuel) (04/23/87)
In article <523@omen.UUCP>, caf@omen.UUCP (Chuck Forsberg WA7KGX) writes:
> In article <933@intsc.UUCP> tomk@intsc.UUCP (Tom Kohrs @fae) writes:
> > :Show me a benchmark that does not fit in 256 bytes thats even keeps up
>                                             ^^^^^^^^^ (note for ref.)
. . . .
> The Gentleman from Intel claims siev has a "hot spot" that fits within the
> 256 byte 68020 cache. ....

That's exactly the point!!! Why do you think Motorola put the d'#n thing IN? LOTS of programs have 'hot spots' (remember the 90/10 rule?????!) and an instruction cache is designed to take advantage of this fact. If you have to look so hard for programs that DON'T have such hot-spots, then obviously Motorola was on the right track when they put in a cache.

Don't send US on a wild goose chase to find a benchmark that gives the '386 a snowball's chance -- It's YOUR crusade -- YOU find the goose.

"We get enough 'snow' related tasks up here in Canada, as it is. I intend to enjoy the summer now that it's here."
--
-------------
Stephen Samuel
{ihnp4,ubc-vision,seismo!mnetor,vax135}!alberta!edm!steve
tim@ism780c.UUCP (Tim Smith) (04/23/87)
In article <526@omen.UUCP> caf@omen.UUCP (Chuck Forsberg) writes:
:: I just tried this. Under V.3 on a 16Mhz 386 ( Intel production board,
:: with a 387, which shouldn't matter ), I get 12.6 real, 11.2 user, .6
:: system. bc was compiled with rcc.
::
: Assuming the "Intel Production Board" you refer to is the "386 AT Clone"
: motherboard, your 11.2 second user time represents a 100 per cent
: improvement in code execution speed over Xenix code running on the 386
: toolkit (factoring the 18 mHz vs 16 mHz speed).
Nope, it is not a 386 AT clone. It is a Multibus board. I wonder if the
bus on it is twice as wide as AT clone bus? That would explain the factor
of two.
: I trust this "rcc" is real software that I'll be able to see running soon.
It appears to be real. It is the compiler that whoever sells V3 for
the 386 will probably be selling. As for who will be selling V3, that
is a mystery to me! I would guess AT&T or Intel, but I have no idea.
--
Tim Smith "Hojotoho! Hojotoho!
uucp: sdcrdcf!ism780c!tim Heiaha! Heiaha!
Delph or GEnie: Mnementh Hojotoho! Heiaha!"
Compuserve: 72257,3706
caf@omen.UUCP (Chuck Forsberg WA7KGX) (04/26/87)
In article <6048@ism780c.UUCP> tim@ism780c.UUCP (Tim Smith) writes:
:In article <526@omen.UUCP> caf@omen.UUCP (Chuck Forsberg) writes:
::: I just tried this. Under V.3 on a 16Mhz 386 ( Intel production board,
::: with a 387, which shouldn't matter ), I get 12.6 real, 11.2 user, .6
::: system. bc was compiled with rcc.
:::
:: Assuming the "Intel Production Board" you refer to is the "386 AT Clone"
:: motherboard, your 11.2 second user time represents a 100 per cent
:: improvement in code execution speed over Xenix code running on the 386
:: toolkit (factoring the 18 mHz vs 16 mHz speed).
:
:Nope, it is not a 386 AT clone. It is a Multibus board. I wonder if the
:bus on it is twice as wide as AT clone bus? That would explain the factor
:of two.

Nope, the Intel 386 AT cloneboard uses 32 bit data paths to the 2.5 MB of ram (512k on the board, 2 MB on an Intel dual banked 32 bit plug in).