[comp.unix.xenix] XENIX 386 benchmark results

dyer@spdcc.COM (Steve Dyer) (06/30/87)

Now that I have results for XENIX 386 along with my earlier figures,
this is a final posting of benchmarks of the Intel Inboard 386/AT,
for both XENIX 286 running in 16-bit memory and 32-bit memory and
XENIX 386 running in 32-bit memory.

At least from the Dhrystone benchmarks reported below, we're well
into Sun 3 territory, if not beyond!  Quite amazing...
---------------------------------------
All tests were run on the same hardware and software environment,
with the exception of the replacement of the 286 by the 386 card,
under SCO XENIX 286 OS v. 2.1.3 with Development System v. 2.1.4
or SCO XENIX 386 OS and Development System beta test with beta update (6/16/87).

                IBM PC/AT 8mhz    IBM PC/AT with Intel Inboard 386/AT
                                  at 16mhz, cache enabled
                XENIX 286         XENIX 286         XENIX 286         XENIX 386
                16-bit mem        16-bit mem        32-bit mem        32-bit mem

                no reg  reg       no reg  reg       no reg  reg       no reg  reg
Dhrystone 1.0   1278    1292      2293    2304      3429    3405      5259    5719
Dhrystone 1.1   1084    1094      1957    1963      2906    2893      4603    4922

Buchholz (sum of user & sys times in sec)

short cpu       0.3               0.2               0.1               0.1
medium cpu      3.3               1.9               1.2               0.5
long cpu        ***               *** (values out of range)           2.6
short I/O       0.9               0.6               0.4               0.1
I/O bound       3.1               1.9               1.4               0.5
long mixed      56.9              33.8              21.8              8.9
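
To put the Dhrystone 1.1 "reg" figures above in relative terms, here is a
small C sketch that divides each configuration's score by the 8mhz PC/AT
baseline.  The array and function names are made up for illustration; only
the four scores come from the table.

```c
#include <stddef.h>

/* Dhrystone 1.1 "reg" scores copied from the table above, in column order:
 * 8mhz AT (XENIX 286), Inboard + XENIX 286 in 16-bit memory,
 * Inboard + XENIX 286 in 32-bit memory, Inboard + XENIX 386. */
static const int dhry11_reg[] = { 1094, 1963, 2893, 4922 };

/* Speedup of one score relative to a baseline score (hypothetical helper). */
double speedup(int baseline, int score)
{
    return (double)score / (double)baseline;
}

/* Speedup of configuration i (0..3) relative to the 8mhz PC/AT column. */
double speedup_over_at(size_t i)
{
    return speedup(dhry11_reg[0], dhry11_reg[i]);
}
```

speedup_over_at(3) comes out at roughly 4.5, i.e. XENIX 386 on the Inboard
scores about four and a half times the stock AT on this one benchmark.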
-- 
Steve Dyer
dyer@harvard.harvard.edu
dyer@spdcc.COM aka {ihnp4,harvard,linus,ima,bbn,m2c}!spdcc!dyer

braun@m10ux.UUCP (07/01/87)

After seeing the various benchmarks for 80286, 80386, and 68020's,
I have been wondering: What sizes are the integers used by each machine?
I assume that a 80[012]86 is using 16 bit integers.  I would also suspect
that a 68020 is benchmarked with 32 bit ints, since that's what you get
from Sun and the other 68020 workstation makers.
What about the 80386?  Are most of these benchmarks done running
the same binary on a '386 as used on the '286?  If so,
what happens to the '386's speed when it runs with 32 bit integers?

By the way, what support does the '386 have for 32 bit arithmetic
and 32 bit addressing?  Is it a new bunch of instructions, or
a different mode for the cpu?
-- 

Doug Braun		AT+T Bell Labs, Murray Hill, NJ
m10ux!braun		201 582-7039

dyer@spdcc.COM (Steve Dyer) (07/03/87)

In article <225@m10ux.UUCP>, braun@m10ux.UUCP writes:
> After seeing the various benchmarks for 80286, 80386, and 68020's,
> I have been wondering: What sizes are the integers used by each machine?
> I assume that a 80[012]86 is using 16 bit integers.  I would also suspect
> that a 68020 is benchmarked with 32 bit ints, since that's what you get
> from Sun and the other 68020 workstation makers.  What about the 80386?
> Are most of these benchmarks done running the same binary on a '386 as
> used on the '286?  If so, what happens to the '386's speed when it runs
> with 32 bit integers?
 
At least for "dhrystone", all integers are declared as "int", meaning that
the compiler chooses the natural size for its target architecture.
For XENIX 286 cc, int == 16 bits and for XENIX 386 cc, int == 32 bits.
Maybe my report wasn't clear enough (I thought it was), but I reported
results for both 286 objects and 386 objects running on the Intel 386
as well as baseline results for the 286 objects running on an 8mhz 286.
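
A quick way to confirm what a given compiler picks for "int" is to ask it.
The sketch below is not part of Dhrystone, and the function names are
illustrative only.  It reports the width of int and checks whether a value
fits in a signed integer of a given width, which is where 16-bit and 32-bit
builds of the same source can part ways.

```c
#include <limits.h>

/* Width in bits of the compiler's natural "int". */
int int_width_bits(void)
{
    return (int)(sizeof(int) * CHAR_BIT);
}

/* Nonzero if `value` fits in a two's complement signed integer that is
 * `bits` wide.  Assumption for this sketch: bits is between 2 and 32. */
int fits_in_signed(long long value, int bits)
{
    long long max = (1LL << (bits - 1)) - 1;
    long long min = -max - 1;
    return value >= min && value <= max;
}
```

fits_in_signed(40000, 16) is false while fits_in_signed(40000, 32) is true,
so a counter that is safe with a 32-bit int could overflow with a 16-bit one.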

> By the way, what support does the '386 have for 32 bit arithmetic
> and 32 bit addressing?  Is it a new bunch of instructions, or
> a different mode for the cpu?

This is interesting.  First, both the 286 and 386 execute the same instructions
(the 386 has some extensions and new addressing modes, but this is generally
a true statement) but on the 386, the operand size can vary.  The 386's
registers are analogous to the 8086, but the "general registers" (if they
can be called that) are potentially 32 bits wide.  In real mode (and virtual-86
mode) the operand and address sizes are 16 bits by default, just like an 8086
or 286.  In protected mode, the address and operand sizes are determined by the
D-bit in the segment descriptor for the code currently executing.  If D==0,
then 16-bit addressing and operands are the default; if D==1, then all
addresses and operands are taken to be 32 bits wide.  There are two opcode
prefixes which invert the current sizes of addresses and operands for that
instruction.  These prefixes are most useful in a real mode DOS environment
to force 32-bit operations.  They're generally unnecessary in a protected mode
environment, since the OS sets your code segment selector appropriately.
I could imagine some hacks to get 16-bit behavior in a 32-bit environment,
but they'd certainly be unusual.
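
The D-bit and prefix rule above can be modelled in a few lines of C.  This
is only a sketch: the function names are invented, and the scanner knows
about the two size prefixes (66h for operand size, 67h for address size) and
nothing else.

```c
#include <stdbool.h>
#include <stddef.h>

enum {
    OPERAND_SIZE_PREFIX = 0x66,  /* toggles operand size for one instruction */
    ADDRESS_SIZE_PREFIX = 0x67   /* toggles address size for one instruction */
};

/* Default operand/address size in bits chosen by the code segment's D bit. */
int default_size_bits(bool d_bit)
{
    return d_bit ? 32 : 16;
}

/* Effective size for one instruction: a size prefix inverts the default. */
int effective_size_bits(bool d_bit, bool has_prefix)
{
    return default_size_bits(d_bit != has_prefix);
}

/* Effective operand size of the instruction starting at insn[0].
 * Simplification: only the two size prefixes are recognized; segment
 * override, LOCK and REP prefixes are not handled in this sketch. */
int operand_size_of(const unsigned char *insn, size_t len, bool d_bit)
{
    bool has_prefix = false;
    size_t i;

    for (i = 0; i < len; i++) {
        if (insn[i] == OPERAND_SIZE_PREFIX)
            has_prefix = true;
        else if (insn[i] != ADDRESS_SIZE_PREFIX)
            break;  /* first byte that is not a size prefix: the opcode */
    }
    return effective_size_bits(d_bit, has_prefix);
}
```

For instance, the bytes 66 01 D8 decode as "add eax, ebx" in a 16-bit (D==0)
segment and as "add ax, bx" in a 32-bit (D==1) segment.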
-- 
Steve Dyer
dyer@harvard.harvard.edu
dyer@spdcc.COM aka {ihnp4,harvard,linus,ima,bbn,m2c}!spdcc!dyer

jfh@killer.UUCP (John Haugh) (07/03/87)

In article <127@spdcc.COM>, dyer@spdcc.COM (Steve Dyer) writes:
> ...
> At least from the Dhrystone benchmarks reported below, we're well
> into Sun 3 territory, if not beyond!  Quite amazing...
> 
> 	      IBM PC/AT 8mhz	  IBM PC/AT with Intel Inboard 386/AT
> 					at 16mhz, cache enabled
> 		XENIX 286	XENIX 286	XENIX 286	XENIX 386
> 		16-bit mem	16-bit mem	32-bit mem	32-bit mem
> 
>             	no reg	reg	no reg	reg	no reg	reg	no reg	reg
> Dhrystone 1.1	1084	1094	1957	1963	2906	2893	4603	4922
[ some munging to get rid of unneeded 1.0 kruft ]

I really don't know about the claim that the 386 is now in Sun territory.
I just benchmarked a Plexus P/95 (Yes, I know the list price is up around
$100K) and it came out somewhere near 5200 Dhrystones at 20Mhz.  The
25Mhz box we bought should be over 6000.  Hopefully Guy can get his Suns
to do a little better than they have been doing.

We should be getting our box in sometime this week.  I finally got a
system built the way *I* wanted rather than what the boss wanted to spend
on one.  Dual disks, plenty RAM, spare serial ports, the works.  I just
hope they can still afford to give me a raise next year :-) :-) :-).

And just for kicks, I bounced this into comp.arch where it might be
interesting for all of those RISC'y people to see ...

And by the way - Xenix is not just an operating system for PC's.  Tandy
runs it on 68000's, I don't know about anyone else though ...

- John.

john@bby-bc.UUCP (john) (07/03/87)

> then 16-bit addressing and operands are the default; if D==1, then all
> addresses and operands are taken to be 32 bits wide.  There are two opcode
> prefixes which invert the current sizes of addresses and operands for that
> instruction.  These prefixes are most useful in a real mode DOS environment
> to force 32-bit operations.  They're generally unnecessary in a protected mode
> environment, since the OS sets your code segment selector appropriately.
> I could imagine some hacks to get 16-bit behavior in a 32-bit environment,
> but they'd certainly be unusual.

So if you want to do 16 and 32 bit arithmetic in the same procedure, all the
instructions with operands of one of the sizes have to be prefixed?  Do any
of the existing compilers do this?

Are there instructions for 16->32 and vice versa conversions along the
lines of CBW?

Assuming 32 bit wide memory, are there any unusual speed penalties for
particular instructions with 16/32 bit operands, e.g. are shifts faster
with one particular size?

mash@mips.UUCP (John Mashey) (07/05/87)

In article <1090@killer.UUCP> jfh@killer.UUCP (John Haugh) writes:
....on 386s getting into Sun-3 territory...
>I really don't know about the claim that the 386 is now in Sun territory.
>I just benchmarked a Plexus P/95 (Yes, I know the list price is up around
>$100K) and it came out somewhere near 5200 Dhrystones at 20Mhz.  The
>25Mhz box we bought should be over 6000....

>And just for kicks, I bounced this into comp.arch where it might be
>interesting for all of those RISC'y people to see ...

Hmmm.  You might want to read Rick's current Dhrystone lists.

I realize my login machine does only a "wimpy" 10-12K Dhrystones,
but I've got terminal sessions going on this instant on RISC micros
that do 18-22K Dhrystones, and they are NOT wimpy [about 5 minutes CPU
time and <13 minutes real time for full 4.3+NFS kernel build from scratch.
This is slower than the Amdahl "3 minutes".]
SunRise / SPARC /Sun-4 should be announced this week,
and they ought to be over 20K Dhrystones, too.

Rick's end-of-July issue should be interesting: at least 2 different
RISC microprocessors will be on the list FASTER than IBM 3081s,
CRAY X-MPs [to be fair, not built for Dhrystone:-)].  They will be
slower than IBM 3090s and Amdahl 5860s...this year...
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{decvax,ucbvax,ihnp4}!decwrl!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

kds@mipos3.UUCP (07/08/87)

In article <130@bby-bc.UUCP> john@bby-bc.UUCP (john) writes:
>So if you want to do 16 and 32 bit arithmetic in the same procedure, all the
>instructions with operands of one of the sizes have to be prefixed?

yes...

>
>Are there instructions for 16->32 and vice versa conversions along the
>lines of CBW?

yes, both sign extend and zero extend are provided
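
On the 386 these are the MOVSX and MOVZX instructions (plus CWDE, the 32-bit
counterpart of CBW).  In C terms the two widenings, and the narrowing going
the other way, look roughly like the sketch below; the function names are
invented for illustration.

```c
#include <stdint.h>

/* Widen a 16-bit value to 32 bits, copying bit 15 into the upper half
 * (what a sign-extending move does). */
int32_t sign_extend_16(uint16_t v)
{
    return (v & 0x8000u) ? (int32_t)v - 0x10000 : (int32_t)v;
}

/* Widen a 16-bit value to 32 bits, filling the upper half with zeros
 * (what a zero-extending move does). */
uint32_t zero_extend_16(uint16_t v)
{
    return (uint32_t)v;
}

/* Narrow 32 bits to 16 by keeping only the low word. */
uint16_t truncate_to_16(uint32_t v)
{
    return (uint16_t)(v & 0xFFFFu);
}
```

The same bit pattern FFFFh widens to -1 when sign extended and to 65535 when
zero extended, which is why both forms are needed.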

>Assuming 32 bit wide memory, are there any unusual speed penalties for
>particular instructions with 16/32 bit operands, e.g. are shifts faster
>with one particular size?

nope, the rate of instruction execution shouldn't change.  Of course, it takes
a clock to crack a prefix, so if you have lots of them, performance could
suffer.  My guess as to why 32-bit code seems to run so much faster than 16-bit
code on the 386 has to do with the differences in the programming model between
16-bit and 32-bit code: 32-bit code is always "small" mode (i.e., no segment
register reloads), can do 32-bit arithmetic operations in a single instruction,
and register usage is more general in 32-bit code.
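
To illustrate the "single instruction" point: 16-bit code typically adds a
32-bit long as two 16-bit halves with a carry between them (ADD followed by
ADC), where 32-bit code can use one ADD.  The C sketch below models the
two-step version next to the native one; the type and function names are
made up, and it says nothing about actual cycle counts.

```c
#include <stdint.h>

/* A 32-bit value held as two 16-bit words, as 16-bit code would see it. */
typedef struct {
    uint16_t lo;
    uint16_t hi;
} words32;

words32 split32(uint32_t v)
{
    words32 w;
    w.lo = (uint16_t)(v & 0xFFFFu);
    w.hi = (uint16_t)(v >> 16);
    return w;
}

uint32_t join32(words32 w)
{
    return ((uint32_t)w.hi << 16) | w.lo;
}

/* 32-bit add done the 16-bit way: add the low words, then add the high
 * words plus the carry out of the low add. */
words32 add32_by_halves(words32 a, words32 b)
{
    words32 r;
    uint32_t low_sum = (uint32_t)a.lo + b.lo;   /* models ADD */
    uint32_t carry = low_sum >> 16;             /* carry out of the low add */
    r.lo = (uint16_t)(low_sum & 0xFFFFu);
    r.hi = (uint16_t)(((uint32_t)a.hi + b.hi + carry) & 0xFFFFu); /* models ADC */
    return r;
}

/* The same add when the machine word is 32 bits wide. */
uint32_t add32_native(uint32_t a, uint32_t b)
{
    return a + b;
}
```

Both routes give the same result; the difference is that the first needs two
dependent operations and two registers per operand.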
-- 
The above views are personal.

...and they whisper and they chatter, but it really doesn't matter.

Ken Shoemaker, Microprocessor Design, Intel Corp., Santa Clara, California
uucp: ...{hplabs|decwrl|amdcad|qantel|pur-ee|scgvaxd|oliveb}!intelca!mipos3!kds
csnet/arpanet: kds@mipos3.intel.com

caf@omen.UUCP (Chuck Forsberg WA7KGX) (07/11/87)

In article <826@mipos3.UUCP> kds@mipos3.UUCP (Ken Shoemaker ~) writes:
:and register usage is more general in 32-bit code.
Amen.  One can finally index off the stack pointer, saves all that "push bp"
drudgery.  Lo and Behold:


	page	64,132
	TITLE YAM Support Functions for Intel 386
	SUBTTL Copyright 1987 Omen Technology Inc
	name UP
;  rev 7-4-87

	.386


_TEXT	SEGMENT BYTE PUBLIC	'CODE'
	ASSUME	CS: _TEXT

	public	_pareven
	public	_getstk

;
; pareven(c) returns byte value of c with even parity
;
;
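_pareven	proc	near
	sub	eax, eax		; clear eax so the upper 24 bits return as zero
	add	al, [esp+4]		; al = c (first argument); ADD sets the parity flag
	jpe short	even1		; parity already even: return c unchanged
	xor	al, 80h			; otherwise flip bit 7 to make parity even
even1:	ret
_pareven	endp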

;
;
; char * getstk returns current value of stack pointer
;
;
_getstk	proc	near
	mov	eax, esp
	ret
_getstk	endp

_TEXT	ends
	end

Granted these aren't the most earthshaking of routines, but 286 hackers
will appreciate how much cleaner things are on the 386.  
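
For comparison, roughly the same logic as pareven in portable C.  This is a
sketch, not code from YAM: pareven_c is a made-up name, and it counts bits in
a loop where the assembly relies on the CPU's parity flag.

```c
/* Return c with bit 7 flipped if needed so that the byte has an even
 * number of 1 bits; bytes that already have even parity come back as is. */
unsigned char pareven_c(unsigned char c)
{
    unsigned char bits = c;
    int ones = 0;

    while (bits) {              /* count the 1 bits in the byte */
        ones += bits & 1;
        bits >>= 1;
    }
    return (ones & 1) ? (unsigned char)(c ^ 0x80) : c;
}
```

For example, 'A' (41h) has two 1 bits and comes back unchanged, while 'C'
(43h) has three and comes back as C3h.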

Chuck Forsberg WA7KGX Author of Pro-YAM communications Tools for PCDOS and Unix
...!tektronix!reed!omen!caf  Omen Technology Inc "The High Reliability Software"
  17505-V Northwest Sauvie Island Road Portland OR 97231  Voice: 503-621-3406
TeleGodzilla BBS: 621-3746 2400/1200  CIS:70007,2304  Genie:CAF  Source:TCE022
  omen Any ACU 1200 1-503-621-3746 se:--se: link ord: Giznoid in:--in: uucp
  omen!/usr/spool/uucppublic/FILES lists all uucp-able files, updated hourly

robert@pvab.UUCP (Robert Claeson) (07/13/87)

In article <1090@killer.UUCP> jfh@killer.UUCP (John Haugh) writes:

>And by the way - Xenix is not just an operating system for PC's.  Tandy
>runs it on 68000's, I don't know about anyone else though ...

I think Ohio Scientific runs a hacked version of Xenix on 68000's too.

-- robert
-- 
SNAIL:	Robert Claeson, PVAB, P.O. Box 4040, S-171 04 Solna, Sweden
UUCP:	{seismo,mcvax,munnari}!enea!pvab!robert
ARPA:	enea!pvab!robert@seismo.arpa

gnu@hoptoad.uucp (John Gilmore) (07/13/87)

kds@mipos3.UUCP (Ken Shoemaker) wrote:
>         My guess as to why 32-bit code seems to run so much faster than 16-bit
>code on the 386 has to do with the differences in the programming model between
>16-bit and 32-bit code: 32-bit code is always "small" mode (i.e., no segment
>register reloads), ...
> -- 
> The above views are personal.

I construe this as a "personal" statement by an Intel chip designer
that the whole concept of memory models and segment registers is wrong
-- not only did it make life hell for programmers [who bought Intel
machines], but he points out that the generated code using those models
is much slower, even on their whizziest new chip.

The statement that 32-bit code is always "small mode" belies the
waffling in the 386 architecture manuals that extols the virtues of
segment registers even in a 32-bit address space.  They should have
been honest enough to describe them as a hack for 8086 compatibility,
useless in other modes.

Welcome to reality, Intel!  Glad to have you with us.  And thanks for fixing it.
-- 
{dasys1,ncoast,well,sun,ihnp4}!hoptoad!gnu	     gnu@postgres.berkeley.edu
Alt.all: the alternative radio of the Usenet.

ps@diab.UUCP (Per-Erik Sundberg) (07/14/87)

In article <201@pvab.UUCP> robert@pvab.UUCP (Robert Claeson) writes:
>I think Ohio Scientific runs a hacked version of Xenix on 68000's too.
They run D-NIX, which earlier was inspired by Xenix, but now
has joined the SVID-compatible bandwagon.

-- 
Per-Erik Sundberg,  Diab Data AB
SNAIL: Box 2029, S-183 02 Taby, Sweden
ANALOG: +46 8-7680660
UUCP: seismo!mcvax!enea!diab!ps

campbell@sauron.Columbia.NCR.COM (Mark Campbell) (07/15/87)

In article <225@diab.UUCP> ps@diab.UUCP (Per-Erik Sundberg) writes:
>In article <201@pvab.UUCP> robert@pvab.UUCP (Robert Claeson) writes:
> [...]

Speaking of Xenix...

Does anyone know why the 80286 and 80386-based Xenix machines perform the
AIM 2.0 forks per second test so well?  I've been seeing numbers lately
of between 90 and 120 forks per second on several PC's.  I'm wondering if
the high numbers are a result of the 80x86 architecture, the Xenix kernel
implementation of fork, the libraries, compiler, etc.  Thanks.
-- 
						Mark Campbell
						{}!ncsu!ncrcae!sauron!campbell