[net.micro.68k] Need 286 "C" benchmark

davet@oakhill.UUCP (Dave Trissel) (05/17/85)

Computer Architecture Fans:

Could you help us in an experiment?  We need to run the following benchmark
on 286 and 68k machines.  If you have the Intel 310 system or the IBM PC/AT
(or any other 286 system) we would greatly appreciate the results.

The benchmark comes from a description in Douglas Hofstadter's "Godel, Escher,
Bach".

[Thanks to Charles River Data Systems for providing the "C" source.]

Dave Trissel           {ihnp4,seismo,gatech,ctvax}!ut-sally!oakhill!davet
Motorola Semiconductor Inc.  Austin, Texas
-----------------------------------------------------------------------

	Below is a source listing of a C program.  The program takes each
integer from 1 to 999 and checks to see if it is even or odd.  If the
number is even it divides it by 2; if it is odd the number is multiplied by
3 and then 1 is added.  Then it is back to the even-odd test with the mentioned
changes until the number is one.  After that the next number in the 1 to 999
range is manipulated, and so on.  After this is completed the next part of
the program does the same thing, except this time it keeps a histogram of all
the values the number takes on.


#include <stdio.h>
main()
{
	register int i, k, j;
	register unsigned int max = 0;
	for (i=1; i < 1000; ++i) {
		k= i;
		while (k != 1) {
			if (max < k) max = k;
			j = k/2;
			j = j*2;
			if (j == k)
				k = k/2;
			else	k = k*3 + 1;
		}
	}
	hst(max);
}
hst(max)
	register unsigned int max;
{
	register int i, k, j;
	extern char *calloc();
	short *h = (short *) calloc(max+1, sizeof(short));
	for (i=1; i < 1000; ++i) {
		k= i;
		while (k != 1) {
			++(h[k]);
			j = k/2;
			j = j*2;
			if (j == k)
				k = k/2;
			else	k = k*3 + 1;
		}
	}
}
-----------------------------------------------------------------------

gordonl@microsoft.UUCP (Gordon Letwin) (05/20/85)

I just love the contact sport of "combative benchmarking".  I note how
the source code for the Hofstader (sp?) benchmark just accidentally
happens to declare its register variables from the least-used to the
most used, the opposite of normal C convention.  And by coincidence,
there are three of those little hummers... and we're comparing a
68K with >3 regvars against a 286 with only 2!
This means that the single most heavily used register variable will
be in a reg on the 68K and on the frame for a 286.  My my, what a
terrible accident.

Those who assume that I'm somehow "defending" the 286 architecture should
practice their reasoning, logic, and debate a bit more.  My points are
simple, and two-fold:

	1) Many or most of these benchmarks are done by folks with
	   an axe to grind.  There are very clever ways to grind axes.
	   This is an equal-opportunity sport: I've seen
	   some very "clever" and highly misleading benchmarks
	   perpetrated by all the majors.

	2) None of this has any meaning, within the realm of the religious
	   battle being fought here.  There is no tricky little benchmark
	   that will cause IBM to drop the 286.  There is no tricky little
	   benchmark that will cause SUN to drop the 68K, etc.  By definition,
	   if there is someone who might choose their machine on the
	   basis of such benchmarks, that company is very late and will
	   have a very small impact on the world.  Machine performance
	   as a religion - convince the other guy you're right using all
	   means, fair and foul - is a dead issue.  Machine performance
	   as a science - don't decree what's right, FIND OUT what's
	   right - could still be of interest, from a theoretical standpoint.
	   I've seen very little of this on the net, though.

gordon letwin
microsoft

personal opinions, of course, not company.  MS doesn't "like" or "dislike"
chips, we just work with 'em, for good or bad, always some of each.
	   

cem@intelca.UUCP (Chuck McManis) (05/22/85)

> I just love the contact sport of "combative benchmarking".  I note how
> the source code for the Hofstader (sp?) benchmark just accidentally
> happens to declare its register variables from the least-used to the
> most used, the opposite of normal C convention.  And by coincidence,
> there are three of those little hummers... and we're comparing a
> 68K with >3 regvars against a 286 with only 2!
> This means that the single most heavily used register variable will
> be in a reg on the 68K and on the frame for a 286.  My my, what a
> terrible accident.
> 
It is also by "accident" that those three variables, j, k, and max, are
"assumed" to be 32 bits. ("Oh, did I leave that out?") And that the only
purpose of the histogram seems to be to try to allocate an array that has
250504 elements. I am sure Sun Tzu has some apropos comment for the 
situation.
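
[Chuck's figure is easy to check.  The following sketch is an
illustration, not code from any posting in the thread: recomputing the
trajectory peak with longs, so that no 16-bit int can truncate it, gives
250504 for starting values below 1000.]

#include <stdio.h>

int main()
{
	long k, max = 0;	/* long: safe even where int is 16 bits */
	int i;

	for (i = 1; i < 1000; ++i)
		for (k = i; k != 1; k = (k % 2 == 0) ? k / 2 : k * 3 + 1)
			if (max < k)
				max = k;
	printf("peak value = %ld\n", max);	/* prints 250504 */
	return 0;
}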

--Chuck
"I work with 'em and I like it."

*** REPLACE THIS LINE WITH YOUR BENCHMARK ***
-- 
                                            - - - D I S C L A I M E R - - - 
{ihnp4,fortune}!dual\                     All opinions expressed herein are my
        {qantel,idi}-> !intelca!cem       own and not those of my employer, my
 {ucbvax,hao}!hplabs/                     friends, or my avocado plant. :-}

lotto@talcott.UUCP (Jerry Lotto) (05/22/85)

> I just love the contact sport of "combative benchmarking".  I note how
> the source code for the Hofstader (sp?) benchmark just accidentally
> happens to declare its register variables from the least-used to the
> most used, the opposite of normal C convention.  And by coincidence,
> there are three of those little hummers... and we're comparing a
> 68K with >3 regvars against a 286 with only 2!...
> 
> gordon letwin
> microsoft
>
	I agree... note the SIZE of the hardware registers we are using.
This would not be a problem, but most code produced by micro C compilers will
try to cram the decimal number 250500+ into those 16 bit ints. And then
to do a calloc of this many words...
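
[What "cramming" that peak into 16 bits does -- a sketch, not Gerald's
code, with a short standing in for a 16-bit int on machines where int is
wider:]

#include <stdio.h>

int main()
{
	long peak = 250504L;		/* largest value the benchmark reaches */
	short crammed = (short) peak;	/* 16-bit truncation */

	/* prints -11640 on two's-complement machines */
	printf("%ld crams down to %d\n", peak, crammed);
	return 0;
}
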
-- 
____________

Gerald Lotto - Harvard Chemistry Dept.

UUCP:  {genrad,cbosgd}!wjh12!h-sc4!harvard!lhasa!lotto
       {seismo,harpo,ihnp4,linus,allegra,ut-sally}!harvard!lhasa!lotto
ARPA:  lotto@harvard.ARPA
CSNET: lotto%harvard@csnet-relay

seth@megad.UUCP (Seth H Zirin) (05/23/85)

> I just love the contact sport of "combative benchmarking".  I note how
> the source code for the Hofstader (sp?) benchmark just accidentally
> happens to declare its register variables from the least-used to the
> most used, the opposite of normal C convention.  And by coincidence,
> there are three of those little hummers... and we're comparing a
> 68K with >3 regvars against a 286 with only 2!...
> 
> gordon letwin
> microsoft

It would seem to me that the 68K offers more facilities for writing fast
code (e.g. more regvars).  But then, I wouldn't use an Intel uP if it had 100 
registers (sorry Intel :-) ).  If the 68K is a tricycle, what is a 286?
-- 
-------------------------------------------------------------------------------
Name:	Seth H Zirin
UUCP:	{decvax, ihnp4}!philabs!sbcs!megad!seth

Keeper of the News for megad

davet@oakhill.UUCP (Dave Trissel) (05/24/85)

In article <583@intelca.UUCP> cem@intelca.UUCP (Chuck McManis) writes:
>> [quoting someone else...]
>> I just love the contact sport of "combative benchmarking".  I note how
>> the source code for the Hofstader (sp?) benchmark just accidentally
>> happens to declare its register variables from the least-used to the
>> most used, the opposite of normal C convention.  And by coincidence,
>> there are three of those little hummers... and we're comparing a
>> 68K with >3 regvars against a 286 with only 2!
>> This means that the single most heavily used register variable will
>> be in a reg on the 68K and on the frame for a 286.  My my, what a
>> terrible accident.

When I posted the benchmark I was not aware of all that.  But what's the
complaint? Are you saying that it's not fair to use registers since one
chip only has 2 of them?   In the real world programs would use a lot more
than two registers.  Why are you trying to hide architectural weaknesses?
Benchmarks should be just the thing to point out such weaknesses.

By your analogy no benchmark comparing an Intel machine with <whatever> should
have any statements such as the following:

			   I = J;

because the 808x et al. do not have a memory-to-memory scalar move and would
thus be artificially handicapped.  That wouldn't be fair to Intel now, would
it?

>It is also by "accident" that those three variables, j, k, and max, are
>"assumed" to be 32 bits. ("Oh, did I leave that out?") And that the only
>purpose of the histogram seems to be to try to allocate an array that has
>250504 elements.

I find this highly ironic coming from an Intel person.  Intel's latest
benchmark booklet comparing the 286 with the 68k just happens to be full of
C programs which have ints.  Intel doesn't bother telling anyone that the
68k versions all run with 32-bit integers while the 286 gets by with 16 bit
integers.  Deliberate deception - but we all know why.

This quibbling is all very telling.  If Intel advertises that the 286 is not
only far better than the several-years-old MC68000 but matches the speed of
the new MC68020, one would think that these itty-bitty benchmarks certainly
couldn't cause a problem.  After all, every M68000 chip from day one easily
chews them up. So what's the hangup here?  If you have to go to LONGs then
do it.  But don't sit and gripe if your chip can't hack it.

As for the large array, I have compiled the program on my Macintosh at home.
No sweat. It runs easily on a 1 Meg Lisa (Mac XL).  Why is it such a big deal
to run it on a 286 (which supposedly rivals the MC68020?)

>
>--Chuck
>"I work with 'em and I like it."
>
>*** REPLACE THIS LINE WITH YOUR BENCHMARK ***

Ok I will.  Here's another dinky benchmark which I just compiled and ran on
my Macintosh.  Let's hear some 286 times for it (and no excuses, please).

int a[50000];

main()
{
  int i;
  for (i=0; i<50000; i++) a[i+1] = a[i];
}

Dave Trissel    {seismo,ihnp4}!ut-sally!oakhill!davet
Motorola Semiconductor Inc.  Austin, Texas
"I work with 'em and mine works"

kds@intelca.UUCP (Ken Shoemaker) (05/26/85)

> int a[50000];
> 
> main()
> {
>   int i;
>   for (i=0; i<50000; i++) a[i+1] = a[i];
> }
> 
> Dave Trissel    {seismo,ihnp4}!ut-sally!oakhill!davet
> Motorola Semiconductor Inc.  Austin, Texas
> "I work with 'em and mine works"

Hmmm, once again Dave has submitted a benchmark that requires more than 64K
of data.  This continued harping on the issue seems to indicate to me that
maybe Dave realizes that for programs that require less than 64K of data
that a 12MHz 286 actually keeps pace with the 16.67 MHz 68020.  Of course,
he might not be saying this at all, and far be it for ME to try to read
between his lines of code.....I would like to see the 680{00,10,20} 
performance numbers and system configurations for these benchmarks, though,
just for internal curiosity.
-- 
It looks so easy, but looks sometimes deceive...

Ken Shoemaker, Intel, Santa Clara, Ca.
{pur-ee,hplabs,amd,scgvaxd,dual,omovax}!intelca!kds
	
---the above views are personal.  They may not represent those of Intel.

henry@utzoo.UUCP (Henry Spencer) (05/26/85)

> Hmmm, once again Dave has submitted a benchmark that requires more than 64K
> of data.  This continued harping on the issue seems to indicate to me that
> maybe Dave realizes that for programs that require less than 64K of data
> that a 12MHz 286 actually keeps pace with the 16.67 MHz 68020.  Of course,
> he might not be saying this at all, and far be it for ME to try to read
> between his lines of code...

It is always possible to find special cases where inferior processors
outperform superior ones, as witness the recent brouhaha about the Z80
outrunning the VAX 780 on function calls.  One of the tricky parts of
benchmarking is deciding what constitutes a special case and what doesn't.

Fifteen years ago, >64KB of data would frequently have been classed as
a special case.  Today, perceptions have changed, and it is not out of
order to penalize the 286 because its performance drops like a rock when
data size goes past 64KB.  Dave's "continued harping" reflects the
importance of the issue.  Without this wart, the 286 would be no worse
than many other ugly processors that mankind copes with.  As it is,
the 286 makes a useful "one-chip PDP11", and copes well with special
dedicated jobs where data requirements are inherently modest, but is
a travesty as a general-purpose computing engine.

Realistic benchmark results for the 286 cannot list a single number as
the completion time for a benchmark.  The only honest way to describe
the thing's performance is by listing *both* small-model and large-model
times.  This accurately conveys the 286's high performance in restricted
cases and its dismally-bad performance in general cases.
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

chuck@dartvax.UUCP (Chuck Simmons) (05/26/85)

> int a[50000];
> 
> main()
> {
>   int i;
>   for (i=0; i<50000; i++) a[i+1] = a[i];
> }
> 
> Dave Trissel    {seismo,ihnp4}!ut-sally!oakhill!davet

I know I can't program in C, but doesn't this program have a small
bug in it?  When i is 49999, doesn't some random word of memory
get trashed by the assignment statement?

chenr@tilt.FUN (Ray Chen) (05/27/85)

In article <588@intelca.UUCP> kds@intelca.UUCP (Ken Shoemaker) writes:
>> int a[50000];
>> 
>> main()
>> {
>>   int i;
>>   for (i=0; i<50000; i++) a[i+1] = a[i];
>> }
>> 
>> Dave Trissel    {seismo,ihnp4}!ut-sally!oakhill!davet
>Hmmm, once again Dave has submitted a benchmark that requires more than 64K
>of data.  This continued harping on the issue seems to indicate to me that
>maybe Dave realizes that for programs that require less than 64K of data
>that a 12MHz 286 actually keeps pace with the 16.67 MHz 68020.  Of course,
>he might not be saying this at all, and far be it for ME to try to read
>between his lines of code....

Hmmm.  An interesting example of creative bullshitting.  Instead of
saying, "Oh, God, we're so SLOW when we handle > 64K data..." you say
"We're so FAST when we handle < 64K data; we can even keep up with
our competition then..."

Defending Intel against the question, "Why should 286 users pay a
performance penalty if their programs require > 64K data?" might
look better than sniping at your competitor just because he has the
intelligence to point out an obvious architectural "feature".

	Ray Chen
	princeton!tilt!chenr

steiny@idsvax.UUCP (Don Steiny) (05/27/85)

**

	The  articles that this one follows up discuss the
fairness or unfairness of using a benchmark that uses more
than 64K of data.

	To access data in a range greater than 64K a 286 program
needs to load the segment descriptor table to find the base
of the segment.  This requires TWO register loads for each access,
even if the correct table is already loaded.   To make matters
worse, there are no registers that are truly general purpose.

	In short, no matter how fast an Intel chip gets, 
the segmentation and the lack of general purpose registers
are going to continue to be a limiting factor (unless they change).
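
[An illustration of the address arithmetic in question, not Don's code:
even in real mode, where no descriptor table is involved, every
large-model pointer hides a segment:offset split like the one below.  In
286 protected mode the segment half is a selector into a descriptor
table, which is where the extra register loads come in.]

#include <stdio.h>

/* Split a 20-bit linear address into the 8086's paragraph-granular
 * segment plus a 4-bit offset remainder. */
unsigned int seg_of(unsigned long linear) { return (unsigned int) ((linear >> 4) & 0xFFFF); }
unsigned int off_of(unsigned long linear) { return (unsigned int) (linear & 0xF); }

int main()
{
	unsigned long a = 0x12345UL;	/* an arbitrary linear address */

	printf("linear %05lx = %04x:%04x\n", a, seg_of(a), off_of(a));
	return 0;
}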

gus@Shasta.ARPA (05/27/85)

> Hmmm, once again Dave has submitted a benchmark that requires more than 64K
> of data.  This continued harping on the issue seems to indicate to me that
> maybe Dave realizes that for programs that require less than 64K of data
> that a 12MHz 286 actually keeps pace with the 16.67 MHz 68020.  Of course,
> he might not be saying this at all, and far be it for ME to try to read
> between his lines of code.....I would like to see the 680{00,10,20} 
> performance numbers and system configurations for these benchmarks, though,
> just for internal curiosity.
> -- 

I think the issue is valid, since most production programs use more than
64K of code and/or data space. Unfortunately, it is only small programs that
lend themselves readily to benchmarking.

Seriously, folks. This quibbling over machine speeds gets you nowhere. The
68K and i86 architectures will continue to evolve. One month one will
reign as fastest, and the next month the other will. Unfortunately, companies
must choose early on what processor they will put in their machines, and
once this decision is made, it is not easy to switch. You would not
expect to see IBM switching over to Motorola and ditching the thousands of
MS-DOS dependent programs any more than Apple would switch over to Intel.

The bottom line is which machine offers the best SOLUTION. The processor
inside should not matter to the end-user. After all, usability is usually
more dependent on software than on hardware.

ctk@ecsvax.UUCP (Tim Kelley) (05/28/85)

This is yet another comment on the fairness of benchmarks using >64k
data on the *86 chips. I do this on IBM-PC/AT's every day for floating
point work. Most of my data consists of matrices of doubles. I have
coded things like dot products and daxpys in assembler, and observe
that the penalty for using the large model is very small. For problems
for which both models can be used, the large model code is about 5%
slower if written correctly. However, the real limit is the 1meg that
one can use without going to protected mode. This is often a real pain.
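
[For readers who haven't met the term: a "daxpy" is the vector kernel
y = a*x + y on doubles, the inner loop of much matrix code.  A C
rendering, not Tim's assembler, might look like this:]

/* y[i] += a * x[i] over n doubles; in large-model code on a 286 the
 * x and y accesses are what pay the segment-register toll. */
void daxpy(int n, double a, double *x, double *y)
{
	int i;

	for (i = 0; i < n; i++)
		y[i] += a * x[i];
}
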
	I run MS-DOS (not XENIX) and don't use protected mode. Does anyone
know what the speed penalty for using protected mode is? Looks like a
lot. I am not defending Intel; having to fool around with segments is
a waste of time. However, where was a system that worked with hardware
floats for <$3000 three years ago (hint: IBM).


-- 
C.T. Kelley  decvax!mcnc!ecsvax!ctk
Dept. of Math.    N.C. State U. Box 8205
Raleigh, N.C. 27695-8205,  919-737-7895

cem@intelca.UUCP (Chuck McManis) (05/28/85)

I really don't want to drag this whole discussion onto the net again and
won't. I will however correct Dave's misinterpretations of my original
message and then let it rest. 

> In article <583@intelca.UUCP> cem@intelca.UUCP (Chuck McManis) writes:
> >> [quoting someone else...]
> >> I just love the contact sport of "combative benchmarking".  I note how
> >> the source code for the Hofstader (sp?) benchmark just accidentally
> >> happens to declare its register variables from the least-used to the
> >> most used, the opposite of normal C convention.  And by coincidence,
> >> there are three of those little hummers... and we're comparing a
> >> 68K with >3 regvars against a 286 with only 2!
> >> This means that the single most heavily used register variable will
> >> be in a reg on the 68K and on the frame for a 286.  My my, what a
> >> terrible accident.
> 
> When I posted the benchmark I was not aware of all that.  But what's the
> complaint? Are you saying that it's not fair to use registers since one
> chip only has 2 of them?   In the real world programs would use a lot more
> than two registers.  Why are you trying to hide architectural weaknesses?
> Benchmarks should be just the thing to point out such weaknesses.
> 
Dave, I don't know why you think that by pointing out differences in 
architecture someone is "hiding" them. I don't believe the person I 
quoted was complaining, merely pointing out how the source you posted
from the book was poorly written. I think the same person and I
would be quite surprised that you "didn't know" that the benchmark you
posted seemed particularly aimed at blowing up 16 bit compilers.

> By your analogy no benchmark comparing an Intel machine with <whatever> should
> have any statements such as the following:
> 
> 			   I = J;
> 
> because the 808x et al. do not have a memory-to-memory scalar move and would
> thus be artificially handicapped.  That wouldn't be fair to Intel now, would
> it?
> 
As above, I think you misinterpreted his statement as an analogy; you can put
anything you want in your C programs.

> >It is also by "accident" that those three variables, j, k, and max, are
> >"assumed" to be 32 bits. ("Oh, did I leave that out?") And that the only
> >purpose of the histogram seems to be to try to allocate an array that has
> >250504 elements.
> 
> I find this highly ironic coming from an Intel person.  Intel's latest
> benchmark booklet comparing the 286 with the 68k just happens to be full of
> C programs which have ints.  Intel doesn't bother telling anyone that the
> 68k versions all run with 32-bit integers while the 286 gets by with 16 bit
> integers.  Deliberate deception - but we all know why.

This is probably the most disturbing comment, and the reason I even bothered
to reply. If you programmed in C back in the good ol' days, you too would
assume ints were 16 bits. Any source code I write that doesn't, points this
out in the comments. My original message was trying to point out why this 
code seemed to be targeted at "killing" 16 bit machines. (It would also not
run on the PDP-11 or on any C compiler that defaulted to 16 bit ints.) If
you had pointed it out then I would have simply replaced the required ints
with longs. I believe it was situations like this that #define was created 
for. As for "getting by", I assume you consider it a feature that your
compiler drags along an extra 16 bits when you don't need it. When I need
long ints, I use long ints. How do you define 16 bit numbers? short? And if
so, what is a byte in your compiler? Finally, deliberate deception? Come on,
let's be serious. As I mentioned in an earlier message, don't worry about
our benchmarks, run some tests yourself. That is always the only way you 
will believe anything. 

> 
> This quibbling is all very telling.  If Intel advertises that the 286 is not
> only far better than the several-years-old MC68000 but matches the speed of
> the new MC68020, one would think that these itty-bitty benchmarks certainly
> couldn't cause a problem.  After all, every M68000 chip from day one easily
> chews them up. So what's the hangup here?  If you have to go to LONGs then
> do it.  But don't sit and gripe if your chip can't hack it.
> 
I did switch them to longs and only pointed out your omission of the 
requirement for 32 bits. Even a note in the message to the effect of 
"by the way these vars need to be 32 bits."

> As for the large array, I have compiled the program on my Macintosh at home.
> No sweat. It runs easily on a 1 Meg Lisa (Mac XL).  Why is it such a big deal
> to run it on a 286 (which supposedly rivals the MC68020?)
> 
Here we are discussing compilers again. Microsoft has yet to release a compiler
that can deal with large arrays; the 286 has a 1Gbyte virtual address space and
hence plenty of room. I personally can write the "benchmark" in assembly
quite easily. Again, the COMPILER can't hack it, but the chip can.

> >*** REPLACE THIS LINE WITH YOUR BENCHMARK ***
> 
> Ok I will.  Here's another dinky benchmark which I just compiled and ran on
> my Macintosh.  Let's hear some 286 times for it (and no excuses, please).
> 
> int a[50000];
> 
> main()
> {
>   int i;
>   for (i=0; i<50000; i++) a[i+1] = a[i];
> }
> 
> Dave Trissel    {seismo,ihnp4}!ut-sally!oakhill!davet
> Motorola Semiconductor Inc.  Austin, Texas
> "I work with 'em and mine works"

Again you decide to break the compiler, not the chip. Microsoft's C cannot
declare an array larger than 64K, yet. Make it 25,000 and I will post it.
Other than that I'll have to wait for Microsoft's new C compiler.

--Chuck
"Why do I even bother."




-- 
                                            - - - D I S C L A I M E R - - - 
{ihnp4,fortune}!dual\                     All opinions expressed herein are my
        {qantel,idi}-> !intelca!cem       own and not those of my employer, my
 {ucbvax,hao}!hplabs/                     friends, or my avocado plant. :-}

keithd@cadovax.UUCP (Keith Doyle) (05/28/85)

>> int a[50000];
>> 
>> main()
>> {
>>   int i;
>>   for (i=0; i<50000; i++) a[i+1] = a[i];
>> }
>> 
>> Dave Trissel    {seismo,ihnp4}!ut-sally!oakhill!davet
>> Motorola Semiconductor Inc.  Austin, Texas
>
>Hmmm, once again Dave has submitted a benchmark that requires more than 64K
>of data.  This continued harping on the issue seems to indicate to me that
>maybe Dave realizes that for programs that require less than 64K of data
>that a 12MHz 286 actually keeps pace with the 16.67 MHz 68020.  Of course,
>he might not be saying this at all, and far be it for ME to try to read
>between his lines of code.....I would like to see the 680{00,10,20} 
>performance numbers and system configurations for these benchmarks, though,
>just for internal curiosity.
>-- 
>Ken Shoemaker, Intel, Santa Clara, Ca.
>

Well, Dave's approach is certainly no worse than the Intel ad which is 
presented without several important OHBYTHEWAY's.  Intel's assumption that
all programs and data are <64k is no better than Motorola's assumption
that they're all >64k.  Realizing that this is a basic difference in
performance between the two processors, I'd like to see benchmarks that
address programs and data of sizes both <64k and >64k.  (of course then
we can write benchmarks that use 12 registers to make the Motorola look
good, and ones that use 2 to make the Intel look good.)

Keith Doyle
#  {ucbvax,ihnp4,decvax}!trwrb!cadovax!keithd

bob@anwar.UUCP (Bob Erickson) (05/28/85)

It seems to me that the only arena in which one can compare the 286 with chips
like the 68000 and the 32032 is when the code and data size are each less than 64kb.

I'm currently porting a huge application program to the PC/AT and running
into brick walls every day due to the 286 architecture.

	Things like:

		Can't have statically declared data total more than 64kb.
		(What does one do with a huge yacc grammar ?)
		I know, this is really compiler dependent, but I haven't
		found a compiler which allows for this.
		
		I'll probably run 3 times slower than on a 68000 or 32032
		because I have to run with the large model compiler.

		Many compilers don't allow any one data structure (statically
		or dynamically created) to be larger than 64kb.  Luckily
		Lattice does support this ability while Microsoft 3.0
		doesn't, for instance.

		I have to maintain multiple copies of my common library 
		routines to account for the different memory models I might
		want to use.

		If I decide to use the multiple model feature of some
		compilers, then I have to go around informing the world
		of what is big and what is little, what is near and what 
		is far.

	When Intel finally comes out with a full 32 bit chip (from a
	programmer's viewpoint, not the address lines viewpoint) I'm
	sure their advertising will change real quick, and they'll tell 
	us all how outmoded segment and special purpose registers
	really are.


"Oh how i love a parade...."

-- 


==========================================================

Company: 	HHB-Softron
		1000 Wyckoff Ave.
		Mahwah NJ 07430
		201-848-8000

UUCP address:	{ihnp4,decvax,allegra}!philabs!hhb!bob

ron@celerity.UUCP (Ron McDaniels) (05/29/85)

In article <588@intelca.UUCP> kds@intelca.UUCP (Ken Shoemaker) writes:
>
>Hmmm, once again Dave has submitted a benchmark that requires more than 64K
>of data.  This continued harping on the issue seems to indicate to me that
>maybe Dave realizes that for programs that require less than 64K of data
>that a 12MHz 286 actually keeps pace with the 16.67 MHz 68020.  Of course,
>he might not be saying this at all, and far be it for ME to try to read
>between his lines of code.....I would like to see the 680{00,10,20} 
>performance numbers and system configurations for these benchmarks, though,
>just for internal curiosity.
>-- 
>It looks so easy, but looks sometimes deceive...
>
>Ken Shoemaker, Intel, Santa Clara, Ca.
>{pur-ee,hplabs,amd,scgvaxd,dual,omovax}!intelca!kds
>	
>---the above views are personal.  They may not represent those of Intel.

I just can't let this go by (Lord knows, I should)!

If 64k segments aren't a problem and the "large system model" is so
blasted good (if you like to go into interpretive mode when you
execute), why does the 386 have a 32-bit segment length? 64k segments
are architecturally stinko. I realize you still have to sell chips so
that you can pay the bills, but stop making silly comparisons. The 8086
family are great for controllers and the like but I wouldn't want my
sister to marry one!  The 68000 and the 320xx are *much* better
machines in general applications simply because you *can* have data
objects (should I mention code segments?) larger than 64k bytes.

You know, I can remember having the same kind of "discussions"
re: 8080 -vs- 6502. Great fun!!!!


R. L. (Ron) McDaniels
CELERITY COMPUTING
9692 Via Excelencia Way
San Diego, California 92126
(619) 271-9940
{decvax || ucbvax || ihnp4 || philabs}!sdcsvax!celerity!ron
				-or-
			  akgua!celerity!ron

"The above views represent ALL right thinking individuals and anyone disagreeing
with them is obviously not playing with a full deck". 

(a smiley face for all you humorless nurds out there that have to have
even the most obvious attempts at humor spelled out to them in painstaking
detail   ;>)

cdshaw@watmum.UUCP (Chris Shaw) (05/29/85)

>The bottom line is which machine offers the best SOLUTION. The processor
>inside should not matter to the end-user. After all, usability is usually
>more dependent on software than on hardware.

This, of course, is the point. The 8086, because of segments and all that crap,
forces all but the most trivial users of the machine to put up with artificial
and highly visible restrictions. 

If the machine has only 64K of ram, it's not important. If you want the machine
to have more, then you will run into problems, assuming that you run programs
with large amounts of data.

The bottom line is that the SOLUTION offered by Intel is obsolete, since micro
computing has gone well beyond the 64K limit.


Chris Shaw         watmath!watmum!cdshaw  or  cdshaw@watmath
University    of   Waterloo
In doubt?  Eat hot high-speed death -- the experts' choice in gastric vileness !

darrell@sdcsvax.UUCP (Darrell Long) (05/29/85)

Here's a little program that makes a good benchmark.  It especially
exercises the CALL instruction, clearly one of the most used of all
instructions.

This program finds the Knight's tour on an n*n chess board; I
suggest you start with n=5.  The running times grow exponentially
in n.

I have run this program on 68010's (SUN), WE-3200x (3B-2) and VAXen
with interesting results.  Let's see how the 80286 fares.

#include <stdio.h>

#define	TRUE	1
#define	FALSE	!TRUE
#define	n	5
#define	n_sqr	n*n

int a[8]={2,1,-1,-2,-2,-1,1,2};
int b[8]={1,2,2,1,-1,-2,-2,-1};

int chess_board[n][n];

int count = 0;

main()
{ 
	int i,j;
	for (i = 0; i < n; i++)
		for (j = 0; j < n; j++)
			chess_board[i][j] = 0;
	chess_board[0][0] = 1;
	if (try(2,0,0))
		for (i = 0; i < n; i++)
		{ 
			for (j = 0; j < n; j++)
				printf("\t%d",chess_board[i][j]);
			printf("\n"); 
		}
	else
		printf("no solution in %d tries.\n", count);

}

try(i,x,y)
int i,x,y;
{ 
	int k,u,v,q_1;
	k = 0;
	do {
		count++;
		q_1 = FALSE; 
		u = x + a[k]; 
		v = y + b[k];
		if (((u < n) && (u >= 0)) && ((v < n) && (v >= 0)))
			if (chess_board[u][v] == 0) { 
				chess_board[u][v] = i; 
				if (i < n_sqr) 
				{ 
					q_1 = try((i + 1),u,v); 
					if (!q_1) chess_board[u][v] = 0; 
				} 
				else
					q_1 = TRUE; 
			};
	} 
	while ((!q_1) && (++k < 8)); 
	return(q_1);
}
-- 
Darrell Long
Department of Electrical Engineering and Computer Science
University of California, San Diego

USENET: sdcsvax!darrell
ARPA:   darrell@sdcsvax

jer@peora.UUCP (J. Eric Roskos) (05/29/85)

> Here we are discussing compilers again.  Microsoft has yet to
> release a compiler that can deal with large arrays; the 286 has a
> 1Gbyte virtual address space and hence plenty of room.  I
> personally can write the "benchmark" in assembly quite easily.
> Again, the COMPILER can't hack it, but the chip can.

You were going along so well there, I was going to ignore your grossly
irritating suggestion that C "short"s should be bytes, until you said this.

You can't really blame the compiler writers for being unable to generate
efficient code for a deficient architecture.  You give someone an intractable
problem, then complain that they can't solve it!

The reason you can write the benchmark in assembly quite easily but the
compiler can't is that doing it requires knowledge of the semantics of the
program that are unavailable to the compiler.  For example, you know when
you need to change your segmentation registers and when you can leave them
alone.  The compiler can in some cases, if it's a compiler like our Fortran
compiler that does complex flow analyses of the code; but there are very
few such compilers out there, and especially not for microcomputers, yet.
But it can't possibly do it in all cases.  In particular, it probably can't
do it in programs that make heavy and unstructured use of GOTOs.

I think someone who is well-versed in complexity theory can show that the
halting problem is equivalent to this segmentation-register-switching
problem (just replace an arbitrary segmentation register use with a halt
instruction), but that's not my specialty, so I will not try to do that
rigorously.

However, one other thing.  Someone asked in here awhile back, "why should
I care if it is hard for the compiler writers to write the compilers?"
Well, I've seen that firsthand.  It's because it increases the probability
that you'll get a compiler with bugs.  Now, anything, no matter how
difficult, gets solved if you wait long enough (if it's solvable); but
it's generally better to get it solved in a reasonable amount of time.

DISCLAIMER: the above are just my opinions.  They aren't necessarily
Perkin-Elmer's.  I don't even know if we use 808x's or not!
-- 
Full-Name:  J. Eric Roskos
UUCP:       ..!{decvax,ucbvax,ihnp4}!vax135!petsd!peora!jer
US Mail:    MS 795; Perkin-Elmer SDC;
	    2486 Sand Lake Road, Orlando, FL 32809-7642

	    "V'z bss gb gur Orezbbgurf, gb jngpu gur bavbaf
	     na' gur rryf!"  [Jryy, jbhyq lbh oryvrir Arj Wrefrl?]

rsellens@watdcsu.UUCP (Rick Sellens - Mech. Eng.) (05/29/85)

I think the subject that this discussion is running under says it all.
If you go looking for a benchmark "for the 286" you will select a small
memory program if you are an Intel fan, and a large memory program if
you are a Motorola fan. The flaw lies in trying to select a benchmark
for a machine, rather than an application.

To *reasonably* benchmark anything you need an idea of what the application
will be. A benchmark can then be written to test the particular features
that are important in the expected use. A machine that screams at integer
operations in a small memory space may or may not be any good at doing
floating point work in a large memory space.

Please try to test individual capabilities when you benchmark, and then
report the results as representative of *only* those capabilities. That
way maybe we can start to intelligently answer the questions about what
machines do what things faster than others.


Rick Sellens
UUCP:  watmath!watdcsu!rsellens
CSNET: rsellens%watdcsu@waterloo.csnet
ARPA:  rsellens%watdcsu%waterloo.csnet@csnet-relay.arpa

seth@megad.UUCP (Seth H Zirin) (05/29/85)

davet@oakhill.UUCP writes:

> Ok I will.  Here's another dinky benchmark which I just compiled and ran on
> my Macintosh.  Let's hear some 286 times for it (and no excuses, please).
> 
> int a[50000];
> 
> main()
> {
>   int i;
>   for (i=0; i<50000; i++) a[i+1] = a[i];
> }

This program exceeds the bounds of the array a when doing the final assignment
a[50000] = a[49999];  The upper bound on the loop should be 49999.  This aside,
LONG LIVE MOTOROLA!!!!!  Maybe one day an 80[infinity]86 will beat a 68000 and
then intel can start catching up to the 68020. :-)
-- 
-------------------------------------------------------------------------------
Name:	Seth H Zirin
UUCP:	{decvax, ihnp4}!philabs!sbcs!megad!seth

Keeper of the News for megad

g-inners@gumby.UUCP (05/29/85)

> I'm currently porting a huge application program to the PC/AT and running
> into brick walls every day due to the 286 archictecture.
> 	Things like: <long list>

An architecture has many more effects than realized.  These problems can
always be blamed (unfairly) on the compiler, but they are really
architectural limitations "showing through" the veil.  Architecture problems
also "show through" operating systems.

Lots of 360/370 limitations cropped up as arbitrary limits in various
high-level language implementations also.

I think it is fair to have benchmarks that exploit architectural weaknesses.
There is a reason why so many compilers for the 286 don't support large
data objects: the architecture makes it hard/slow/ugly.

So don't dismiss the architectural debates as meaningless.  Today's
quibble over instruction formats may show up as tomorrow's compiler
limit or bug.
				-- Michael Inners

kds@intelca.UUCP (Ken Shoemaker) (05/29/85)

> Defending Intel against the question, "Why should 286 users pay a
> performance penalty if their programs require > 64K data?" might
> look better than sniping at your competitor just because he has the
> intelligence to point out an obvious architectural "feature".
> 
> 	Ray Chen
> 	princeton!tilt!chenr

This wasn't quite the point.  Rather, lots of the "usual" Unix utilities
run, no problem, with much less than 64K of text AND data, so for the
things you normally do, maybe, just maybe, a SLOWER clock speed 286
runs just as fast as the new, whizzy, 68020.  For a few large things,
then yes, you can have a problem.  Another point is that the 286
has memory protection on chip, and regardless of what you think of it,
if you are trying to make a multitasking/multiuser system, this has
to be a big cost advantage.  I have >tried< to use 68K Unix-lookalikes
on non-protected systems, and blammo, bad news.  Still another point
is that when the 386 is available, whenever that is, if it runs things
2-3X the 286 AND runs large programs just as easily and as fast as it
runs small programs, then what does this say about relative 
68020 performance?  (this probably isn't fair, is it?)  Makes you 
wonder about "ugly" architectures, etc.  I'd like to think it is pretty 
good for a chip that has been derided many times
here as "an overgrown 4004" or a high-performance vending machine.
-- 
It looks so easy, but looks sometimes deceive...

Ken Shoemaker, Intel, Santa Clara, Ca.
{pur-ee,hplabs,amd,scgvaxd,dual,omovax}!intelca!kds
	
---the above views are personal.  They may not represent those of Intel.

terryl@tekcrl.UUCP () (05/29/85)

>> int a[50000];
>> 
>> main()
>> {
>>   int i;
>>   for (i=0; i<50000; i++) a[i+1] = a[i];
>> }
>> 
>> Dave Trissel    {seismo,ihnp4}!ut-sally!oakhill!davet
>> Motorola Semiconductor Inc.  Austin, Texas
>> "I work with 'em and mine works"

>Hmmm, once again Dave has submitted a benchmark that requires more than 64K
>of data.  This continued harping on the issue seems to indicate to me that
>maybe Dave realizes that for programs that require less than 64K of data
>that a 12MHz 286 actually keeps pace with the 16.67 MHz 68020.  Of course,
>he might not be saying this at all, and far be it for ME to try to read
>between his lines of code.....I would like to see the 680{00,10,20} 
>performance numbers and system configurations for these benchmarks, though,
>just for internal curiosity.


     OK, here are times for a 68010 system we use here at Tek running 4.2,
with ONE wait state for the memory subsystem. This was on an unloaded system.
Do whatever you want with the times. The compiler was the Greenhills compiler
for the 68000.

1.2u 0.1s 0:01 99% 0+16k 0+0io 1pf+0w
1.2u 0.1s 0:01 98% 0+16k 0+1io 1pf+0w




					Terry Laskodi
					     of
					Tektronix

bart@reed.UUCP (Bart Massey) (05/30/85)

> In article <583@intelca.UUCP> cem@intelca.UUCP (Chuck McManis) writes:
> >> [quoting someone else...]
> >> I just love the contact sport of "combative benchmarking".  I note how
> >> the source code for the Hofstader (sp?) benchmark just accidentally
> >> happens to declare its register variables from the least-used to the
> >> most used, the opposite of normal C convention.  And by coincidence,
> >> there are three of those little hummers... and we're comparing a
> >> 68K with >3 regvars against a 286 with only 2!
> >> This means that the single most heavily used register variable will
> >> be in a reg on the 68K and on the frame for a 286.  My my, what a
> >> terrible accident.
> 
> When I posted the benchmark I was not aware of all that.  But what's the
> complaint? Are you saying that it's not fair to use registers since one
> chip only has 2 of them?   In the real world programs would use a lot more
> than two registers.  Why are you trying to hide architectural weaknesses?
> Benchmarks should be just the thing to point out such weaknesses.

	Quite aside from the rest of this argument, I believe some folks
have still missed the point.  It isn't that >3 regvars are declared in the
code -- it's the ORDERING!  K&R explicitly require compilers to allocate
regvars IN THE ORDER THEY'RE DECLARED, stopping only when they run out of
registers.  The purpose of this ordering requirement is to ENSURE THAT THE
MOST HEAVILY USED VARIABLES END UP IN REGISTERS.  If you had put that "most
heavily used register variable" ahead of all the other register variables
in the declaration, no one would have complained.  This may be an honest
mistake, but it still makes for a poor "benchmark".
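
[To make the fix concrete -- a reworking for illustration, not code
posted in the thread: with the busiest variables declared first, even a
two-regvar compiler keeps k and j in registers.  Longs are used to meet
the 32-bit requirement discussed earlier.]

#include <stdio.h>

int main()
{
	register long k, j;		/* busiest first, per K&R allocation order */
	register int i;
	register unsigned long max = 0;

	for (i = 1; i < 1000; ++i) {
		k = i;
		while (k != 1) {
			if (max < (unsigned long) k)
				max = k;
			j = (k / 2) * 2;	/* evenness test, as in the original */
			k = (j == k) ? k / 2 : k * 3 + 1;
		}
	}
	printf("%lu\n", max);
	return 0;
}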

	Anyway, enough on this subject already,

					Bart Massey
					..tektronix!reed!bart

I learned to program on a TRS-80 in "Level II BASIC" -- thus, all other machines
appear equally fast to me...

steiny@idsvax.UUCP (Don Steiny) (05/30/85)

**

	Saying that it is unfair to compare the 286 with other chips
because there is no huge model compiler is begging the question.
Why are there no huge model compilers for the 286 even though
many people have been working on it for years?   Simple: the
286 is a nightmare for compiler writers.   There are no
truly general purpose registers in Intel chips.  The memory
management scheme requires loading a segment descriptor
table to get the information about the segments so that
the program can get the data from the segment.   Note that
after the program has the information from the segment
descriptor table, it still has to compute the offset in
chips have traditionally been difficult to write
compilers for, the 286 is even worse.  

	I was working on some of the initial design of the compiler
for the AT&T sanctioned port to the 286 as a consultant to DRI.
We were having a meeting to discuss the global portions of
the compiler, symbol table, object format, and so on.  We were
going over the 286 to make sure we all understood it.  After a
few hours we all were incredulous.  Why?  We asked.  What would
motivate such insanity?   We figured that it must have some
properties that are especially appealing to engineers.  We
were all software people, so we did not appreciate it.

	I have been using a huge model compiler on a 16032 for
several months.   It is Tolerant Systems' 4.2 port to the
16032.  It has System V shared memory in addition to the
normal 4.2 IPC.   There is no problem at all in declaring
shared memory segments of 4MB.    This is not new; National
Semiconductor Genix had huge model many years ago.
There are many huge model compilers for the 68k chips.  

	Blame it on Microsoft! :-)

	Intel is not inherently evil or anything.  They have been
trying to maintain compatibility with their older chips.  I have
heard rumors that the 386 will have a linear address space.  Now
if they would just give compiler writers a few extra registers
to evaluate expressions and compute memory locations . . .

guy@sun.uucp (Guy Harris) (05/30/85)

> As for "getting by", I assume you consider it a feature that your
> compiler drags along an extra 16 bits when you don't need it. When I need
> long ints, I use long ints. How do you define 16 bit numbers? short? And if
> so, what is a byte in your compiler?

Three points:

1) Not all 68000 C implementations have 32-bit "int"s.  You do have
to take more care to be more type-correct when writing code.  I consider
this a feature, not a bug...

2) Well, the C compiler on the 68000-based UNIX machine this is being typed
on supports 32-bit "int"s and defines 16-bit numbers as "short" and a byte
is 8 bits - it can be called "char" or "unsigned char" depending on whether
you want sign extension.  What's so hard about defining 16-bit numbers on
32-bit "int" implementations of C?  The VAX has done it for quite a few
years now.

3) 16-bit "int"s on machines with large address spaces can make it a pain to
deal with large lumps of data.  For instance, the V7/S3 "nm" loses big,
because it uses "realloc" to grow its in-core copy of the symbol table.
"realloc" takes an "unsigned int" as its argument; get more than 64KB worth
of symbol table and you lose (yes, there do exist programs which break
this).  Unfortunately, C doesn't have a way of saying "int big enough to
hold the size of the address space, in bytes" nor is there any standard system
include file defining such a type.
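
[A sketch of the failure mode Guy describes, not his code, with an
unsigned short standing in for a 16-bit unsigned int on today's wider
machines: the byte count handed to realloc() wraps, so the "grown" table
actually shrinks.]

#include <stdio.h>

int main()
{
	unsigned short nbytes = 60000u;	/* symbol table size so far */

	nbytes += 10000u;	/* wants 70000; 16 bits wrap it to 4464 */
	printf("realloc would be asked for %u bytes\n", nbytes);
	return 0;
}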

> Here we are discussing compilers again.  Microsoft has yet to release a
> compiler that can deal with large arrays; the 286 has a 1Gbyte virtual
> address space and hence plenty of room.  I personally can write the
> "benchmark" in assembly quite easily.  Again, the COMPILER can't hack it,
> but the chip can.

Why is it so hard to write compilers which can handle that 1GB virtual
address space?  It's much easier on the 68000.  How easy is it to write a
compiler which can handle 65,537-byte arrays almost as efficiently as it
handles 65,536-byte arrays?  (For that matter, how easy is it to handle
128KB contiguous arrays of bytes, e.g. the screen that what I'm typing
is appearing on?)  The compiler may have trouble hacking it because the chip
makes it difficult to handle.

	Guy Harris

john@x.UUCP (John Woods) (05/30/85)

> > int a[50000];
> > main()
> > {
> >   int i;
> >   for (i=0; i<50000; i++) a[i+1] = a[i];
> > }
> > Dave Trissel    {seismo,ihnp4}!ut-sally!oakhill!davet
> > Motorola Semiconductor Inc.  Austin, Texas
> > "I work with 'em and mine works"
> Hmmm, once again Dave has submitted a benchmark that requires more than 64K
> of data.  This continued harping on the issue seems to indicate to me that

At MIT, I use PDP-11s all the time.  I run out of 64K data all the time.
I don't do that out of some sadistic intention to prove that PDP-11s are
inferior in comparison to any given processor, I do that because the work
that I do happens to be that large.  I have seen people doing FORTRAN
benchmarks that take 4Mb of data.  If the 286 is too high and mighty to
act on enough data, I don't particularly care how fast it thinks it is.

You probably have a 4004 in ECL at home as your home computer...
-- 
John Woods, Charles River Data Systems, Framingham MA, (617) 626-1101
...!decvax!frog!john, ...!mit-eddie!jfw, jfw%mit-ccc@MIT-XX.ARPA

"MU" said the Sacred Chao...

rosalia@tekig4.UUCP (Mark Galassi) (05/30/85)

In article <146@tekcrl.UUCP> terryl@tekcrl.UUCP () writes:
>>> int a[50000];
>>> 
>>> main()
>>> {
>>>   int i;
>>>   for (i=0; i<50000; i++) a[i+1] = a[i];
>>> }
>>> 
>>> Dave Trissel    {seismo,ihnp4}!ut-sally!oakhill!davet
>>> Motorola Semiconductor Inc.  Austin, Texas
>>> "I work with 'em and mine works"
>
>     OK, here are times for a 68010 system we use here at Tek running 4.2,
>with ONE wait state for the memory subsystem. This was on an unloaded system.
>Do whatever you want with the times. The compiler was the Greenhills compiler
>for the 68000.
>
>1.2u 0.1s 0:01 99% 0+16k 0+0io 1pf+0w
>1.2u 0.1s 0:01 98% 0+16k 0+1io 1pf+0w
>
>					Terry Laskodi
>					     of
>					Tektronix

We might do things correctly (and show ADA fans that C programs
can respect array boundaries) and use
   for (i=0; i<49999; i++) a[i+1] = a[i];
	       ^^^^^
				Mark Galassi
				tektronix!tekig4!rosalia

phil@amdcad.UUCP (Phil Ngai) (05/31/85)

In article <635@cadovax.UUCP> keithd@cadovax.UUCP (Keith Doyle) writes:
>we can write benchmarks that use 12 registers to make the Motorola look
>good, and ones that use 2 to make the Intel look good.)

Does this mean that if you have 16 registers and you only use 2 of them
you pay a penalty for having 14 idle registers? This is about the only
conclusion I can draw from your statement. How good is the 68K overall
if it wins in benchmarks which use lots of registers and loses in benchmarks
which don't use lots of registers?
-- 
 There's always tomorrow.

 Phil Ngai (408) 749-5720
 UUCP: {ucbvax,decwrl,ihnp4,allegra}!amdcad!phil
 ARPA: amdcad!phil@decwrl.ARPA

jnw@mcnc.UUCP (John White) (06/02/85)

     I ran the Knight's tour benchmark on a Tandy 2000 (186 at 8MHz).
I used the DeSmet C compiler. It ran in exactly 6 seconds (everything on
ramdisk). I would be interested in seeing what other compilers/processors do.
A comparison of small/large/HUGE models would be particularly interesting.
(DeSmet is a small model, non-optimizing compiler).
- John N. White
{duke, mcnc}!jnw

hall@ittral.UUCP (Doug Hall) (06/02/85)

>                                   If the 286 is too high and mighty to
>act on enough data, I don't particularly care how fast it thinks it is.
>
>You probably have a 4004 in ECL at home as your home computer...
>-- 

Don't laugh - from what I've seen, 4004's grow up to be 286's. ;-)

jans@mako.UUCP (Jan Steinman) (06/03/85)

In article <146@idsvax.UUCP> steiny@idsvax.UUCP (Don Steiny) writes:
>I have been using a huge model compiler on a 16032 for several months...
>There is no problem at all in declaring shared memory segments of 4MB...
>National Semiconductor Genix had huge model many years ago...
>There are many huge model compilers for the 68k chips...

Huge model?  Segments?  What are you talking about?  Please don't spread
Intelisms to REAL processors! :-)  There are NO huge model compilers for
either Nati or Moto processors, because there is no need for any distinction!

Linear, orthogonal, and proud of it!  (A satisfied NS32000 user.)
-- 
:::::: Jan Steinman		Box 1000, MS 61-161	(w)503/685-2843 ::::::
:::::: tektronix!tekecs!jans	Wilsonville, OR 97070	(h)503/657-7703 ::::::

david@daisy.UUCP (David Schachter) (06/03/85)

Actually, there are several compilers available for the 286 that handle
arrays bigger than 64K.  Even Fortran!  We use Fortran-86 (Intel's version
of Fortran-77) for some programs that declare megabyte arrays and it works
fine. 

jer@peora.UUCP (J. Eric Roskos) (06/03/85)

[The referenced article discusses the advantages of having memory management
on-chip on the 286 and 386, and emphasizes the alleged better speed of the
286 over the 68000 machines.]

Why is so much time spent in these discussions talking about vague matters of
opinion, and so little on WHY the problems exist?

A major problem with the 286 (and I guess the 386 too, though I haven't
seen it yet, only the fragmentary descriptions in this newsgroup) is in the
number of bits available in the instructions themselves for addressing
data.  This is the primary addressing problem.  The 8086 family has a
maximum of 16 bits of address for most instructions.  Even with the
segmentation registers and memory management of the 286, which have more
bits for addresses, nothing is changed, because the basic problem that
existed in the 8086 still exists: the instructions themselves do not have
any additional bits for addresses.  You can argue that implicit in the
register usage and type of the instructions are two bits for the selection
of a segmentation register (and that is stretching it, since the register
usage only provides 1 implicit address bit, while the type of instruction
selects whether you are using the CS register, or whether the register
usage will select DS or SS).  Consequently, even by the most convoluted of
thinking, your normal memory reference instructions on the 8086 and 80286
have only 18 address bits; and the size of the segmentation registers,
which is one of the major improvements in the 286, doesn't help this
problem at all.  Certainly having memory-management on-chip doesn't.


The 80286 does have some considerable instruction set improvements over the
8086; but it is the bit representation of these instructions that leads to
the problems.   [If you really want an appreciation for the complexity
of the instruction set, try designing an assembler for it!  Even when you
think you understand the instruction encoding, you may suddenly discover
that there are two instructions that don't quite fit...]
-- 
Full-Name:  J. Eric Roskos
UUCP:       ..!{decvax,ucbvax,ihnp4}!vax135!petsd!peora!jer
US Mail:    MS 795; Perkin-Elmer SDC;
	    2486 Sand Lake Road, Orlando, FL 32809-7642

	    "V'z bss gb gur Orezbbgurf, gb jngpu gur bavbaf
	     na' gur rryf!"  [Jryy, jbhyq lbh oryvrir Arj Wrefrl?]

mat@amdahl.UUCP (Mike Taylor) (06/04/85)

> Here's a little program that makes a good benchmark.  It especially
> exercises the CALL instruction, clearly on of the most used of all
> instructions.
> Darrell Long

I can't resist these.  This isn't a 286, but...
Amdahl 5860, UTS 2.2, n = 5.
System 0.009s
User   0.128s
Amdahl 5860, UTS 2.2, n = 6.
System 0.030s
User   3.654s
How about posting some of the other numbers?
-- 
Mike Taylor                        ...!{ihnp4,hplabs,amd,sun}!amdahl!mat

[ This may not reflect my opinion, let alone anyone else's.  ]

phil@amdcad.UUCP (Phil Ngai) (06/04/85)

In article <293@celerity.UUCP> ron@celerity.UUCP (Ron McDaniels) writes:
>If 64k segments aren't a problem and the "large system model" is so
>blasted good (if you like to go into interpretive mode when you
>execute), why does the 386 have a 32-bit segment length? 64k segments
>are architecturally stinko. I realize you still have to sell chips so
>that you can pay the bills, but stop making silly comparisons.

I read the enclosed quote from Ken Shoemaker and nowhere do I see that he
says 64K segments aren't a problem. What he does say is for programs
that require less than 64K of data, a 286 competes nicely with a 68020.

I enclose the quote in question below.

>In article <588@intelca.UUCP> kds@intelca.UUCP (Ken Shoemaker) writes:
>>Hmmm, once again Dave has submitted a benchmark that requires more than 64K
>>of data.  This continued harping on the issue seems to indicate to me that
>>maybe Dave realizes that for programs that require less than 64K of data
>>that a 12MHz 286 actually keeps pace with the 16.67 MHz 68020.
-- 
 There's always tomorrow.

 Phil Ngai (408) 749-5720
 UUCP: {ucbvax,decwrl,ihnp4,allegra}!amdcad!phil
 ARPA: amdcad!phil@decwrl.ARPA

keithd@cadovax.UUCP (Keith Doyle) (06/04/85)

[..........]
>In article <635@cadovax.UUCP> keithd@cadovax.UUCP (Keith Doyle) writes:
>>we can write benchmarks that use 12 registers to make the Motorola look
>>good, and ones that use 2 to make the Intel look good.)
>
>Does this mean that if you have 16 registers and you only use 2 of them
>you pay a penalty for having 14 idle registers? This is about the only
>conclusion I can draw from your statement. How good is the 68K overall
>if it wins in benchmarks which use lots of registers and loses in benchmarks
>which don't use lots of registers?
>-- 
> Phil Ngai (408) 749-5720

I'm sorry, I should have included a :-) on that statement.  I was trying to
point out that you have to be careful with benchmarks, as no matter what
you have for a processor, it's not hard to customize your benchmarks to
say whatever you want.  Personally, I would be interested in Motorola vs
Intel benchmarks if we could all up front agree on a collection of things
to evaluate and look at them as a whole.  Even then, your intended use of
a processor will affect what you think of the benchmarks as a whole.

I will throw out a starting list of potential benchmarks that one might
use for a more thorough comparison, if there is any interest, let's add to
it and see if we can come up with a reasonable set that could actually be
useful in determining which is best for certain jobs.  Here is the list:

1. Test effect of code and data size for BOTH >64k and <64k.
   In addition, it might be useful to come up with some statistics on average
   code sizes in various environments, UNIX, PC, etc. perhaps so we can
   better decide how important this might be.

2. Test number crunching capabilities (multiply/divide etc) for BOTH 16 and
   32 bit quantities, probably excluding coprocessors (let's test them
   separately -- later).

3. Test higher level language support, including:
     1.  C    large and small model
     2.  Modula-2 and/or Pascal
     3.  Multiple stack oriented languages such as Forth, PostScript, Neon.
  and this probably includes both 16 and 32 bit tests.

4. Test performance effects of register set size.

5. Compare capabilities and performance of block-oriented instructions.


I'm sure there are others.  And, even if we come up with a better list, no
doubt not everyone will agree that it's a reasonable set.  Still, such tests
are more useful than the old:

       Well, suck on this:    for (i=0; i<500000; i=i+1) a[i] = 0;

approach.
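
By way of illustration only, here is a rough sketch of what item 4 might
look like (the constants and the loop body are made up, not a proposed
standard).  The same inner loop keeps six values live; on a processor
whose compiler can honor only two or three register declarations, the
rest spill to memory and the difference shows up directly in the timing:

#include <stdio.h>

main()
{
	register int a = 1, b = 2, c = 3, d = 4, e = 5, f = 6;
	register long n;
	long sum = 0;

	/* six live values compete for whatever registers exist */
	for (n = 0; n < 100000L; ++n) {
		sum += a + b + c + d + e + f;
		a = b; b = c; c = d; d = e; e = f;
		f = (int)(sum & 7);
	}
	printf("%ld\n", sum);	/* print so the loop can't be optimized away */
}

Run it once as written and once with the register keywords removed, and
compare the two times on each machine.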

Keith Doyle
#  {ucbvax,ihnp4,decvax}!trwrb!cadovax!keithd
"There are 4 types of mendacity, lies, damned lies, statistics, and benchmarks"

kds@intelca.UUCP (Ken Shoemaker) (06/05/85)

> It is always possible to find special cases where inferior processors
> outperform superior ones, as witness the recent brouhaha about the Z80
> outrunning the VAX 780 on function calls.  One of the tricky parts of
> benchmarking is deciding what constitutes a special case and what doesn't.
> 
> Fifteen years ago, >64KB of data would frequently have been classed as
> a special case.  Today, perceptions have changed, and it is not out of

I didn't say that >64K was a special case; rather, I said that it
certainly is not the ONLY case, and that by working ONLY this issue (as he
does), Dave seems to imply that it is the ONLY case where the 68020
can consistently beat the 286 in benchmark performance, even when the
286 is running at a slower clock speed!   The benchmarks
that we published, although they did require <64K of data, were NOT written
by Intel, and implying that they are all special cases isn't quite
fair, either, don't you think...or don't you?
-- 
It looks so easy, but looks sometimes deceive...

Ken Shoemaker, 386 Design Team, Intel, Santa Clara, Ca.
{pur-ee,hplabs,amd,scgvaxd,dual,qantel}!intelca!kds
	
---the above views are personal.  They may not represent those of Intel.

steiny@idsvax.UUCP (Don Steiny) (06/06/85)

> In article <146@idsvax.UUCP> steiny@idsvax.UUCP (Don Steiny) writes:
> >I have been using a huge model compiler on a 16032 for several months...
> >There is no problem at all in declaring shared memory segments of 4MB...
> >National Semiconductor Genix had huge model many years ago...
> >There are many huge model compilers for the 68k chips...
> 
> Huge model?  Segments?  What are you talking about?  Please don't spread
> Intelisms to REAL processors! :-)  There are NO huge model compilers for
> either Nati or Moto processors, because there is no need for any distinction!
> 
	The "segments" are a System V feature that are a form of
interprocess communication and have no relation to Intel segments.
They are not too easy to implement on Intel chips.

	Of course huge/large/small/medium and so on are Intelisms.
Sorry about that.  I was thinking that the implementation
used virtual memory.   The "four megabyte shared memory
segments" would often be bigger than physical memory.

kds@intelca.UUCP (Ken Shoemaker) (06/10/85)

> A major problem with the 286 (and I guess the 386 too, though I haven't
> seen it yet, only the fragmentary descriptions in this newsgroup) is in the
> number of bits available in the instructions themselves for addressing
> data.  This is the primary addressing problem.  The 8086 family has a
> maximum of 16 bits of address for most instructions.  Even with the

This is not true of the 386; there you get 32-bit offsets into segments.

> which is one of the major improvements in the 286, doesn't help this
> problem at all.  Certainly having memory-management on-chip doesn't.

I didn't say that it did; rather, I said that NOT having memory protection
in a multi-user system is the kiss of death, and that tacking it onto a 68k
both slows down the system and adds significantly to the cost of the
processor subsystem.  These costs are not present in a system designed
around a 286.
-- 
It looks so easy, but looks sometimes deceive...

Ken Shoemaker, 386 Design Team, Intel, Santa Clara, Ca.
{pur-ee,hplabs,amd,scgvaxd,dual,qantel}!intelca!kds
	
---the above views are personal.  They may not represent those of Intel.

mike@peregrine.UUCP (Mike Wexler) (06/11/85)

Can someone tell me what the advantages of a segmented architecture are over
an equally efficient architecture based on "traditional" memory management?
Are these advantages worth the cost in both chip space and program complexity?
-- 
--------------------------------------------------------------------------------
Mike Wexler(trwrb!pertec!peregrine!mike) | Send all flames to:
15530 Rockfield, Building C              |	trwrb!pertec!peregrine!nobody
Irvine, Ca 92718                         | They will then be given the 
(714)855-3923                            | consideration they are due.

ee171ael@sdcc3.UUCP (GEOFFREY KIM) (06/12/85)

In article <3146@dartvax.UUCP>, chuck@dartvax.UUCP (Chuck Simmons) writes:
> > int a[50000];
> > 
> > main()
> > {
> >   int i;
> >   for (i=0; i<50000; i++) a[i+1] = a[i];
> > }
> > 
> > Dave Trissel    {seismo,ihnp4}!ut-sally!oakhill!davet
> 
> I know I can't program in C, but doesn't this program have a small
> bug in it?  When i is 49999, doesn't some random word of memory
> get trashed by the assignment statement?

Dear Chuck,
Please post this article to net.jokes.

Larry G. Kim

bc@cyb-eng.UUCP (Bill Crews) (06/15/85)

> Can someone tell me what the advantages of a segmented architecture are over
> an equally efficient architecture based on "traditional" memory management?
> Are these advantages worth the cost in both chip space and program complexity?

Hopefully, no one will.  This discussion has been agonizing.

-- 

  /  \    Bill Crews
 ( bc )   Cyb Systems, Inc
  \__/    Austin, Texas

[ gatech | ihnp4 | nbires | seismo | ucb-vax ] ! ut-sally ! cyb-eng ! bc

gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (06/18/85)

> Dear Chuck,,
> Please post this article to net.jokes.
> 
> Larry G. Kim

Chuck was correct; the benchmark had the bug that he pointed out.
Why is that a joke?
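
For the record, a version of the loop that stays inside the array
(assuming the intent was simply to copy each element into its neighbor)
stops one iteration short:

int a[50000];

main()
{
	int i;

	/* stop at i == 49998 so a[i+1] never touches a[50000] */
	for (i = 0; i < 49999; i++)
		a[i+1] = a[i];
}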

johnsson@chalmers.UUCP (Thomas Johnsson) (07/29/85)

> The bottom line is which machine offers the best SOLUTION. The processor
> inside should not matter to the end-user. After all, usability is usually
> more dependent on software than on hardware.

I absolutely agree.  But it makes me wonder about the application programs
that AREN'T developed because of a messy or impossible architecture.


-- 
**** REPLACE THIS LINE WITH BUG DROPPINGS ****
____________________________________________________________
Thomas Johnsson
  UUCP:  johnsson@chalmers.uucp  or ..decvax!mcvax!enea!chalmers!johnsson
  CSNET: johnsson@chalmers.csnet
  Mail on dead trees: Dept. of CS, Chalmers University of Technology, 
                      S-412 96 Goteborg, Sweden
  phone: dept: +46 (0)31 810100, home sweet home: +46 (0)31 252724    
  UFOs, please land at:  57.43 N,  11.59 E (the green hilly lawn)

mike@peregrine.UUCP (Mike Wexler) (08/01/85)

> 
> > The bottom line is which machine offers the best SOLUTION. The processor
> > inside should not matter to the end-user. After all, usability is usually
> > more dependent on software than on hardware.
> 
> I absolutely agree.  But it makes me wonder about the application programs
> that AREN'T developed because of a messy or impossible architecture.
>
I can name one: Peregrine Four.  It is being developed by Peregrine
Systems and has been delayed by at least a year (our latest schedule
says six months from now).

-- 
Mike Wexler
15530 Rockfield, Building C
Irvine, Ca 92718
(714)855-3923
(trwrb|pesnta|scgvaxd)!pertec!peregrine!mike