[comp.sys.intel] Recent Motorola ad seen in Byte

lodman@ncr-sd.UUCP (03/27/87)

In article <362@sbcs.UUCP> root@sbcs.UUCP (Root) writes:
>What did you, fellow Usenetters, think of the recent Motorola advertisement
>(April '87 Byte) about 68020 -vs- 80386?  Used to be that Intel dispensed
>copious amounts of smoke when comparing their processors vs those from
>Motorola.  It seems now that Intel has a somewhat serious machine, the shoe 
>is on the other foot and Motorola seems to be running the smear campaign.

I have seen some ads from Motorola (the apples-and-oranges ad)
and I thought what they said was fairly interesting. We had heard
that Intel cheated on its Dhrystone and other benchmarks long before
Motorola claimed this was so. I haven't seen the ad in question yet.

If Intel at last has a serious machine, it would seem that they have
a credibility gap to close, one opened by years of the 8088, 8086,
etc. garbage that they sold and people bought. I will believe it
when I have a '386 in my hot little hand. Until then, I'm a little
skeptical.

Personally, I find Motorola's claims much more believable.


-- 
Michael Lodman
Advanced Development NCR Corporation E&M San Diego
mike.lodman@SanDiego.NCR.COM 
{sdcsvax,cbatt,dcdwest,nosc.ARPA,ihnp4}!ncr-sd!lodman

ed@plx.UUCP (04/01/87)

In article <1466@ncr-sd.SanDiego.NCR.COM>, lodman@ncr-sd.SanDiego.NCR.COM (Mike Lodman) writes:
> In article <362@sbcs.UUCP> root@sbcs.UUCP (Root) writes:
> >What did you, fellow Usenetters, think of the recent Motorola advertisement
> >(April '87 Byte) about 68020 -vs- 80386?  Used to be that Intel dispensed
> >copious amounts of smoke when comparing their processors vs those from
> >Motorola.  It seems now that Intel has a somewhat serious machine, the shoe 
> >is on the other foot and Motorola seems to be running the smear campaign.
> 
> 
> Personally, I find Motorola's claims much more believable.
> 
> 
 Me too.  I'm told that if you want to take advantage of the 386's performance,
 you have to use some DISGUSTINGLY EXPENSIVE RAM.

 The other thing to remember is that the Motorola parts are SHIPPING at
 25MHz.  A 25MHz '020 blows the doors off a 16.? MHz '386.

 To me, the only advantage of the '386 is the virtual DOS machine capability.
 The ability to run DOS as a task under UNIX seems neat.

 Now the REAL screamer is CLIPPER. The nice thing about CLIPPER is
 that you can really cut down on all those support chips (building an
 8k cache out of discrete components is EXPENSIVE).

 -ed-
> -- 
> Michael Lodman
> Advanced Development NCR Corporation E&M San Diego
> mike.lodman@SanDiego.NCR.COM 
> {sdcsvax,cbatt,dcdwest,nosc.ARPA,ihnp4}!ncr-sd!lodman

mash@mips.UUCP (04/02/87)

In article <580@plx.UUCP> ed@plx.UUCP (Ed Chaban) writes:
...discussion of 68K versus Intel merits....
> Now the REAL screamer is CLIPPER. The nice thing about CLIPPER is
> that you can really cut down on all those support chips (building an
> 8k cache out of discrete components is EXPENSIVE.  

Sigh.  If you can support that statement with live benchmarks of
substantial, real programs, please post them.  Even synthetic benchmarks
(beyond those in the Intergraph 12/86 report, I have that)
of any size would be useful, since it's VERY hard to find substantive
numbers that really support the Clipper performance claims (5 Mips,
average performance 5X an 11/780) on anything but Dhrystone and "toy"
benchmarks.

8K cache: expensive?  We spend about $150 for 24K of cache.  Maybe that's
more expensive than a pair of 300K+ transistor Clipper CAMMUs, but I doubt it.
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{decvax,ucbvax,ihnp4}!decwrl!mips!mash, DDD:  	408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

davidsen@steinmetz.UUCP (04/02/87)

In article <580@plx.UUCP> ed@plx.UUCP (Ed Chaban) writes:
....
>> 
> Me too, I'm told that if you want to take advantage of the 386 performance,
> you have to use some DISGUSTINGLY EXPENSIVE  RAM. 

I have the PC Designs GV386 (16MHz at 0 ws). It uses a very small chunk
of static RAM (35ns) as cache, and the rest is relatively cheap memory.
I put a 2-1/2 MB RAM card in using 120ns chips: $110 for the board,
$202.50 for the memory. That's $125/MB, rather reasonable. Running on
the 16-bit bus drops the speed by about 20%; when 1Mbit chips drop a
bit in price I'll put more on the motherboard. Still, it runs about 3x a VAX,
which is acceptable for a PC.

The bottom line is that the base price with hard disk and display is
~$4k. You just can't get a 68020 box for that price (the new Apple
looks more like $7k for the same size box, and that's a slow 68020).
Intel is starting to ship the cache controller, so the price of cache
will come down by 4th quarter (I assume).
>
> The other thing to remember is that the Motorola parts are SHIPPING at
> 25Mhz.  A 25Mhz '020 blows the doors off a 16.? Mhz '386.

So does a Cray-2... compare on equal price or clock speed, please. The
actual performance is in a ratio of about 25:16 for Sun 3/260 and GV386.

-- 
bill davidsen			sixhub \
      ihnp4!seismo!rochester!steinmetz ->  crdos1!davidsen
				chinet /
ARPA: davidsen%crdos1.uucp@ge-crd.ARPA (or davidsen@ge-crd.ARPA)

dad@cipric.UUCP (04/03/87)

Thought some of you out there might be interested in this...I ran the
Dhrystone benchmarks on an Intel 80386 Multibus II system (which by the
way is running AT&T S5R3), and got:
	5882 dryr
	5376 drynr
I had to change the standard 50000 passes to 500000 to get accurate
times.  With numbers like those, I could probably be converted to Intel
pretty quickly.  But I don't think I'll ever get over that funky addressing
technique.	-Dan
P.S. - This is about 1.8 times our Sun-3/160.

tim@ism780c.UUCP (04/04/87)

> To me, the only advantage of '386 is the virtual DOS machine capability.
> the ability to run DOS as a task under UNIX  seems neat.

The neat thing about the '386 is that it can run UNIX, unlike the 68020.
With the 68020 you have to put an MMU in your system.  However, hardware
guys at companies that make affordable computers seem to have this thing
against putting MMUs in.  With the '386, no silly hardware person can
leave out the MMU.

Of course, the hardware guy could divide the physical address space into
disjoint pages, each less than 4k bytes, if he really wanted to stop
me from running UNIX...
-- 
Tim Smith			"And if you want to be me, be me
uucp: sdcrdcf!ism780c!tim	 And if you want to be you, be you
Compuserve: 72257,3706		'Cause there's a million things to do
Delphi or GEnie: mnementh	 You know that there are"

brayton@yale.UUCP (04/04/87)

In article <21@cipric.UUCP> dad@cipric.UUCP (Dan A. Dickey) writes:
>Thought some of you out there might be interested in this...I ran the
>Dhrystone benchmarks on an Intel 80386 Multibus II system (which by the
>way is running AT&T S5R3), and got:
>	5882 dryr
>	5376 drynr

I'm assuming that this is with a 16 MHz clock.  With a 20 MHz clock these
numbers would go up by 25%, giving even more impressive results.

			Jim Brayton
-----------------------------------------------------------------------------
brayton@yale.UUCP					brayton@yale.ARPA	
-----------------------------------------------------------------------------

clif@intelca.UUCP (04/07/87)

> In article <21@cipric.UUCP> dad@cipric.UUCP (Dan A. Dickey) writes:
> >Thought some of you out there might be interested in this...I ran the
> >Dhrystone benchmarks on an Intel 80386 Multibus II system (which by the
> >way is running AT&T S5R3), and got:
> >	5882 dryr
> >	5376 drynr
> 
> I'm assuming that this is with a 16 MHz clock.  With a 20 MHz clock these
> numbers would go up by 25%, giving even more impressive results.
> 
> 			Jim Brayton
> -----------------------------------------------------------------------------
> brayton@yale.UUCP					brayton@yale.ARPA	
> -----------------------------------------------------------------------------

Those numbers were also obtained with the PCC compiler; a good optimizing
compiler like Green Hills C 1.8.2G or MetaWare C-386 1.3 increases them
by an additional 15-20%.


-- 
Clif Purkiser, Intel, Santa Clara, Ca.
{pur-ee,hplabs,amd,scgvaxd,dual,idi,omsvax}!intelca!clif

These views are my own property.  However anyone who wants them can have 
them for a nominal fee.
	

tomk@intsc.UUCP (04/10/87)

> In article <580@plx.UUCP> (Ed Chaban) writes:
> In article <1466@ncr-sd.SanDiego.NCR.COM>, lodman@ncr-sd.SanDiego.NCR.COM (Mike Lodman) writes:
>> In article <362@sbcs.UUCP> root@sbcs.UUCP (Root) writes:
>>>What did you, fellow Usenetters, think of the recent Motorola advertisement
>>>(April '87 Byte) about 68020 -vs- 80386?  Used to be that Intel dispensed
>> 
>> Personally, I find Motorola's claims much more believable.
>> 
>  Me too, I'm told that if you want to take advantage of the 386 performance,
>  you have to use some DISGUSTINGLY EXPENSIVE  RAM. 
> 
Not TRUE!!!  To run a near-0ws 16MHz machine you can do a small system
(<8MB) with 100ns DRAMs.  A larger system requires 80ns.  If you want
to put a cache controller in the design, it uses 45ns statics for the data
and 25 to 35ns statics for the tags, depending on the control circuit.  With
the forthcoming cache controller chip you can use 35ns SRAMs with a 20MHz CPU.

>  The other thing to remember is that the Motorola parts are SHIPPING at
>  25Mhz.  A 25Mhz '020 blows the doors off a 16.? Mhz '386.
> 
Show me a benchmark that does not fit in 256 bytes that even keeps up
with a 16MHz 386.  386's are now shipping at 20MHz for the speed freaks.
25MHz soon.

>  To me, the only advantage of '386 is the virtual DOS machine capability.
>  the ability to run DOS as a task under UNIX  seems neat.
> 
That plus speed too.  

>  Now the REAL screamer is CLIPPER. The nice thing about CLIPPER is
>  that you can really cut down on all those support chips (building an
>  8k cache out of discrete components is EXPENSIVE.  
> 
I will let the Intergraph (the only known implementation) benchmark
numbers speak for themselves.

I can see that a lot of people are going to start comparing the Compaq
386 machine against the Sun 3/260.  Just remember that the cost of a box
has a lot to do with how much performance optimization goes into the design.
The street price for a 386AT clone will be around $3500 by summer.  That is
compared to an $8000-$50000 Sun machine.  Price/performance is still a rule
that we are stuck with.  Or in other words, the CPU doesn't mean nearly as
much as the memory and I/O subsystems when you talk about performance.  I
will be the first one to admit that the IBM PC and its derivatives are a
kludge.  But don't blame the 386 for IBM's incompetence.

------
"Ever notice how your mental image of someone you've 
known only by phone turns out to be wrong?  
And on a computer net you don't even have a voice..."

  tomk@intsc.UUCP  			Tom Kohrs
					Regional Architecture Specialist
		   			Intel - Santa Clara

P.S.  If anyone wants to see a real 386 machine in action call your 
local sales office for a demo and get your benchmarks ready.

caf@omen.UUCP (04/11/87)

In article <930@intsc.UUCP> tomk@intsc.UUCP (Tom Kohrs @fae) writes:

:Show me a benchmark that does not fit in 256 bytes thats even keeps up
:with at 16MHz 386.  386's are now shipping at 20MHz for the speed freaks.
:25MHz soon.

Well, here's one that takes 8k, somewhat larger than 256 bytes.  A 25 MHz
68020 board more than keeps up with an 18 MHz 386 box (let alone 16 MHz).

The Computer Dynamics 386 uses an Intel 386 motherboard goosed to 18 MHz,
apparently without ill effect.  The system uses all 32-bit Intel RAM (2.5
MB total including a 32-bit memory expansion board).  Note that the IBM top
of the line PS/2 20 MHz 386 machine is specified at one wait state, same as
my box.

I have been told that the best 68k C compilers usually beat the code density
of 8086/286 C compilers but have not verified this for myself.

Before Intel flames these numbers, I suggest they provide me a hotter
386 chip and/or 386 Unix, and I shall post updated numbers once I have
made sure they represent real systems.  My address is below.

I should also like to run this benchmark on the 386-in-286-pinout chip
PC Week announced a few weeks ago.  I have two PC-AT machines ready
to go.


Compile - Link		Execute	Code
Real	User	Real	User	Bytes	System

7.4	.8	.34	.3416	124	Definicom SYS 68020 25MHz SiVlly 11/86
11.8	2.8	.56	.56	131	CompDyn (Intel MB) + 386 Toolkit 12/86

	Sieve benchmark (Slightly modified from Byte Magazine version)
		12-07-86 Chuck Forsberg Omen Technology Inc


NOTE: If the resulting time is too short to measure with a precision of
a couple of percentage points, increase the number of outer loops (n) to
1000 or more (if running on vaporware microcomputers) and scale the
result accordingly.

siev.c:
#include <stdio.h>	/* declares printf() */

#define S 8190
char f[S+1];
main()
{
/*	register long i,p,k,c,n;	For 32 bit entries for PC */
	register int i,p,k,c,n;
	for (n = 1; n <= 10; n++) {
		c = 0;
		for (i = 0; i <= S; i++) f[i] = 1;
		for (i = 0; i <= S; i++) {
			if (f[i]) {
				p = i + i + 3; k = i + p;
				while (k <= S) { f[k] = 0; k += p; }
				c++;
			}
		}
	}
	printf("\n%d primes.\n", c);
}

Chuck Forsberg WA7KGX Author of Pro-YAM communications Tools for PCDOS and Unix
...!tektronix!reed!omen!caf  Omen Technology Inc "The High Reliability Software"
  17505-V Northwest Sauvie Island Road Portland OR 97231  Voice: 503-621-3406
TeleGodzilla BBS: 621-3746 2400/1200  CIS:70007,2304  Genie:CAF  Source:TCE022
  omen Any ACU 1200 1-503-621-3746 se:--se: link ord: Giznoid in:--in: uucp
  omen!/usr/spool/uucppublic/FILES lists all uucp-able files, updated hourly

galen@oucs.UUCP (04/12/87)

In article <930@intsc.UUCP>, tomk@intsc.UUCP (Tom Kohrs @fae) writes:
> 
> I can see that a lot of people are going to start comparing the Compaq
> 386 machine against the Sun 3/260.  Just remember that the cost of a box
> has a lot to do with how much performance optimization goes into the design.
> The street price for a 386AT clone will be around $3500 by summer.  That is
> compared to an $8000-$50000 Sun machine.  Price performace is still a rule
> that we are stuck with.  Or in other words,  the CPU doesn't mean nearly as
> much as the memory and I/O subsystems when you talk about performance.  I 
> will be the first one to admit that the IBM PC and its derivatives are a 
> kludge.  But don't blame the 386 for IBM's incompetence.

Realistically, could a 386AT clone do everything that the SUN does (very
high-res graphics ... 8| ) for the same price and give the same performance?

It also seems to me that IBM ** OWNS ** the majority of the Intel stock???

I personally prefer the 68xxx processors to any Intel processor.  The
instruction set is better in my opinion, and the registers are more
versatile.

(Flame off)
BTW, why don't we set up a comp.sys.religious-wars or something and get this
out of here???
(Flame others and self!!! 8) 8) ... )

#include <standard-disclamer.h>
-- 
----S----N----A----R----K----S----&----B----O----O----J----U----M----S----
Douglas Wade Needham     (614)593-1567 (work) or (614)597-5969 (Home)
Electrical Engineering Dept., Ohio University, Athens, Ohio 45701 
UUCP: ...!cbatt!oucs!galen ** Smart Mailers: galen@pdp.cs.OHIOU.EDU

brewster@watdcsu.UUCP (04/12/87)

In article <532@pdp.cs.OHIOU.EDU>, galen@pdp.cs.OHIOU.EDU (Douglas Wade Needham) writes:
> 
> It also seems to me that IBM ** OWNS ** the majority of the Intel stock???
> 

	I think the figure is closer to 20%, and with the AT&T break-up I
	doubt that this figure will rise.  But even if IBM did own Intel, the
	implication that Intel processors (specifically the 8088) were built to
	IBM specs (i.e. for intended use in PCs) seems ludicrous.

#include <standard-disclamer.h>
#include <standard-trademarknotice.h>
                                                   
						   Try not  to become  a  man
UUCP  : {decvax|ihnp4}!watmath!watdcsu!brewster    of success but rather  try
Else  : Dave Brewer, (519) 886-6657                to  become a  man of value.
                                                         Albert Einstein

djl@mips.UUCP (04/13/87)

In article <513@omen.UUCP>, caf@omen.UUCP (Chuck Forsberg WA7KGX) writes:
> In article <930@intsc.UUCP> tomk@intsc.UUCP (Tom Kohrs @fae) writes:
> 
> :Show me a benchmark that does not fit in 256 bytes thats even keeps up
> :with at 16MHz 386.  386's are now shipping at 20MHz for the speed freaks.
> :25MHz soon.
> 
> Well, here's one that takes 8k, somewhat larger than 256 bytes.  A 25 mHz
> 68020 board more than keeps up with a 18 mHz 386 box (let alone 16 mHz.)

Goes on to include a version of sieve, a classic small-integer benchmark.
Please note: on the MIPS R2000, the entire main() procedure of this
benchmark (not just the inner loop, which is what counts) is 53
instructions, or 224 bytes.  Most of the total text size comes from printf().

It is safe to assume that on a CISC machine, the inner loop fits quite
nicely in a 256-byte I-cache.  This is exactly what you must avoid if trying
to get a real handle on performance under actual conditions.

If what you care about is performance of a processor on very small
integer compute loops, then use sieve and its ilk.  If what you care
about is performance under actual application conditions, you must
use benchmarks that more accurately reproduce those types of environments.

-- 
			***dan

decwrl!mips!djl                  mips!djl@decwrl.dec.com

galen@oucs.UUCP (04/14/87)

In my response to article <930@intsc.UUCP> (Tom Kohrs @fae) I wrote
(<532@pdp.cs.OHIOU.EDU>)
> 
> It also seems to me that IBM ** OWNS ** the majority of the Intel stock???

My apologies to Steve McReady (sorry if I spelled it wrong).  My meaning
was that I heard a RUMOR somewhere, somewhen several years ago, to this
effect.  The figures I heard were approx. 50-60 percent.  Steve McReady
says it is approx. 7 percent now.  Anyways, my main point was...

	WHAT DOES THIS HAVE TO DO (realistically) WITH THE M68K???

If we are going to have articles like this on the net, let's put them
someplace else.  (Also... Please note the FLAME SELF at the end of the
article!!!)  Maybe (heaven forbid! we have enough to keep track of and 
store!!!) we should create a "comp.sys.religious_wars" or something
(maybe place them under comp.sys.misc????).

Again, I apologize (also for my spelling...)

- (a 68xxx-biased, verily)  douglas wade needham

-- 
----S----N----A----R----K----S----&----B----O----O----J----U----M----S----
Douglas Wade Needham     (614)593-1567 (work) or (614)597-5969 (Home)
Electrical Engineering Dept., Ohio University, Athens, Ohio 45701 
UUCP: ...!cbatt!oucs!galen ** Smart Mailers: galen@pdp.cs.OHIOU.EDU

caf@omen.UUCP (04/15/87)

In article <285@winchester.mips.UUCP> djl@mips.UUCP (Dan Levin) writes:
:If what you care about is performance of a processor on very small
:integer compute loops, then use sieve and its ilk.  If what you care
:about is performance under actual application conditions, you must
:use benchmarks that more accurately reproduce those types of environments.

I thought many programs spend their time in fairly localized loops,
especially on machines that lack the high-powered string instructions that
are useful to C.  What are "for" and "while" statements for?  Note that
you can have a 50-line for loop that still gets good cache hits if most
of the code is usually not executed.

So let's ask: how does the performance improvement provided by a small
cache on sieve compare with the performance improvement on ditroff
and tpscript, for example?

While it is conceivable that Motorola put the 256 byte cache in the 68020
just to help certain benchmarks,  it is more likely that the cache
actually improves performance rather inexpensively.

Does somebody have real data on how the 68020's cache improves performance
on sieve, troff, sort, and other Unix CPU hogs?

rod@cpocd2.UUCP (04/16/87)

In article <532@pdp.cs.OHIOU.EDU> galen@pdp.cs.OHIOU.EDU (Douglas Wade Needham) writes:
>It also seems to me that IBM ** OWNS ** the majority of the Intel stock???
>

IBM does not own a majority of Intel stock.  The last I heard it's no more
than 20% - I think it's 15%.  Obviously, IBM is a major customer, but by
no means are business decisions dictated by IBM - Andy G. still has
the final word.
-- 

	Rod Rebello
	...!intelca!mipos3!cpocd2!rod

tomk@intsc.UUCP (Tom Kohrs @fae) (04/17/87)

> In article <1517@ncr-sd.SanDiego.NCR.COM>  Michael Lodman (mike.lodman@SanDiego.NCR.COM) writes:
> In article <930@intsc.UUCP> tomk@intsc.UUCP (Tom Kohrs @fae) writes:

> >I will be the first one to admit that the IBM PC and its derivatives are a 
> >kludge.  But don't blame the 386 for IBM's incompetence.
> 
> Will you also admit that the 8086 and its derivatives are a kludge?
> And I do blame Intel.
> 
Not at all!  The 8086 architecture was a real nice way of extending the
address range back in the days when the dominant machines were the 8085 and
Z-80.  Swapping segment registers to change addresses was so much nicer than
doing bank selection or overlays.  The 286 architecture is considerably
nicer than what was the dominant 16-bit mini at the time, the PDP-11.  In
1980 the idea that you could get the CPU power of an 11/70 on a chip with
a cleaner memory management model got a lot of people excited.

What caused most of the frustration toward the 286 was that DEC and Motorola
both went to a 32-bit programming model at that time.  Programmers quickly
jumped to arms to adhere to the old maxim of using all of the available
memory plus one byte.  When these neat new programs (i.e. BSD 4.x) were
forced back down to the 16-bit architecture, things got tricky.  Many
programmers decided that programming in a 32-bit environment required
less effort and less need for structure than the 16-bit environment, and
so, to justify their not liking to work on 16-bit machines, those machines
got labeled kludges or obsolete.

The only thing I could fault Intel for is possibly not going to a 32-bit
architecture sooner, but we were too busy building 80186's (5-6 million
sold so far).  Also, we were learning how to build an MMU (the hard
part) without having to debug 32-bit ALUs at the same time.  The design
decision that was made 9+ years ago was: do we build a slow 32-bit machine
or a fast 16-bitter?  Intel decided on the fast 16-bit, Motorola went for
the slow 32-bit.  The rest is history.

> I don't mean to imply that Intel has never had good microprocessors. The
> 8080A was a fine machine. But from that point on, Intel seemed to lose
> its edge, first to Zilog and the Z80, and then to Motorola and the 68000.
>
> I hope that the '386 is a step in the right direction for Intel...

I hope so too, now that I am a stock holder.

don@gitpyr.gatech.EDU (Don Deal) (04/17/87)

In article <932@intsc.UUCP> tomk@intsc.UUCP (Tom Kohrs @fae) writes:
>What caused most of the frustration toward the 286 was DEC and Motorola 
>both went to a 32bit programming model at that time. Programmers quickly 
>jumped to arms to adhere to the old maxim of using all of the available 
>memory plus one byte. When these neat new programs (ie BSD 4.x) were 
>forced back down to the 16 bit architecture things got tricky.  Many 
>programmers decided that programming in a 32 bit environment required
>less effort and less need for structure than the 16 bit environment and
>so to justify their not liking to work on 16 bit machines they were labled
>as being kludges or obsolete.

  Oh, I get it.  It was lazy programmers who caused the negative comments
being made about the 286.  Thanks for clearing that up.  Next time I reload
a segment register to address more than 64k, I'll remember that I'm doing
structured programming.  Gee, come to think of it, all of the 8-bit
microprocessors I used could only address 64k.  Maybe I've been a structured
programmer all along!

>The only thing I could fault Intel for is possibly not going to a 32 bit
>architecture sooner, but we were too busy building 80186's (5-6 million
>sold so far).  Also we were learning about how to build an MMU (the hard
>part) without having to debug 32 bit ALU's at the same time.  The design 
>decision that was made 9+ years ago was do we build a slow 32bit machine
>or a fast 16 bitter. Intel decided on the fast 16bit, Motorola went for
>the slow 32 bit.  The rest is history.

  Oh, use your imagination.  If you don't have one, here are a few of my
favorite gripes:

   - using single-byte opcodes for infrequently used instructions 

     AAA, AAS, DAA, DAS, LAHF, SAHF - 1 byte

   - using longer code sequences for commonly used operations 

     PUSH'ing more than two registers.  Having a register mask for
     push and pop would have been nice.  Using an eight bit mask would
     have allowed the most commonly-used registers to be pushed in
     one instruction - a sixteen-bit mask would take care of all of
     them.

     Not being able to move immediate data into a segment register.  How
     often have you seen:

       MOV  AX,data
       MOV  ES,AX

     This makes >64k addressing just *so* much more painful.

   - Selling silicon with bad bugs in it.  The 286 had problems with the POPF
     instruction and with one or more instructions in protected mode.  So
     many broken parts were sold that it was essentially impossible for
     software developers to make use of these instructions (assuming they
     knew about the problems).  Hearing word of a recall program for the 386
     makes me wonder how much of an improvement the 386 really is.

   - 8080 compatibility.  In this day and age, it's no longer an advantage;
     it's a liability.
-- 
D.L. Deal, Office of Computing Services, Georgia Tech, Atlanta GA, 30332-0275
Phone: (404) 894-4660   ARPA: don@pyr.ocs.gatech.edu  BITNET: cc100dd@gitvm1
uucp: ...!{akgua,allegra,amd,hplabs,ihnp4,masscomp,ut-ngp}!gatech!gitpyr!don

doug@edge.UUCP (Doug Pardee) (04/20/87)

> The design 
> decision that was made 9+ years ago was do we build a slow 32bit machine
> or a fast 16 bitter. Intel decided on the fast 16bit, Motorola went for
> the slow 32 bit.

I seem to recall Intel did design a slow (*very* slow) 32-bitter, the
iAPX432.  I would suggest that it was not an Intel decision to concentrate
on 16-bit CPUs, but rather the scandalous failure of the '432 that led to
Intel being thought of as a "manufacturer of 16-bit CPUs".

-- Doug Pardee -- Edge Computer Corp. -- Scottsdale, Arizona

geoff@desint.UUCP (Geoff Kuenning) (04/21/87)

In article <932@intsc.UUCP> tomk@intsc.UUCP (Tom Kohrs @fae) writes:

> Not at all!  The 8086 architecture was a real nice way of extending the 
> address range back in the days when the dominate machine was the 8085 and
> Z-80.  Swapping segment registers to change address was so much nicer than
> doing bank selections or overlays.  The 286 architecture is considerably
> nicer than what was the dominate 16bit mini at the time, the PDP-11.  In
> 1980 the idea that you could get the cpu power of an 11/70 on a chip with
> a cleaner memory management model got a lot of people excited.  

What a blatant misrepresentation:

    (1) The PDP-11 is a 1960's architecture.  The 8086 was designed in the
	late 70's, *after* the publication of Gordon Bell's book where he
	states, "the most embarrassing thing about the PDP-11 was that, only
	two years after its introduction, we found it necessary to widen
	the processor address."  But did Intel learn from DEC's mistake?
    (2) The *only* place where the PDP-11 loses to the 8086 is in the 8086's
	clever use of instruction prefixes to provide bank switching (that's
	all segments are).  The PDP-11 doesn't have any registers with
	weird private characteristics.  By contrast, even on the 80386, if
	you want to shift by a variable shift count, it *has* to be in a
	particular register.  How flexible.
    (3) Although it is true that the 808x series was the dominant
	MICROPROCESSOR CHIP (not the dominant machine, by any stretch of
	the imagination), it is also true that there were *lots* of better
	architecture examples around.  The 6800 literally leaps to mind.
    (4) Tom also conveniently ignores what a loser the 8085 was.  Essentially
	no changes from the 8080.  Check out the 6809 by contrast, or compare
	the PDP-11/70 with the original 11, the 11/20.
    (5) The basic reason for the 8086's kludgey architecture was not because
	it was a better design than the 11/70.  It was because Intel saw
	binary compatibility with the 8080/8085 as a critical goal.  That's
	forgivable;  nobody at the time could have seen that MS/DOS was
	going to kill CP/M-86.  But contrast DEC's approach with the VAX.
	They got compatibility with no kludges, and their customers don't
	seem too unhappy.

> What caused most of the frustration toward the 286 was DEC and Motorola 
> both went to a 32bit programming model at that time. Programmers quickly 
> jumped to arms to adhere to the old maxim of using all of the available 
> memory plus one byte. When these neat new programs (ie BSD 4.x) were 
> forced back down to the 16 bit architecture things got tricky.  Many 
> programmers decided that programming in a 32 bit environment required
> less effort and less need for structure than the 16 bit environment and
> so to justify their not liking to work on 16 bit machines they were labled
> as being kludges or obsolete.

Again a blatant misrepresentation, if not an out-and-out lie.  Having just
finished mentioning the difficulty of doing overlays on the PDP-11, Tom
now tries to sell us on the idea that it's lazy programming that creates
such big code.  And he conveniently ignores the need of many programs
for large data spaces.  I did a PDP-11 application once that called for
"arbitrarily complex" databases (within the limitations of disk space)
in the spec.  We did it, but the 64K restriction was a bloody *pain*.
I guess it's just "lazy programming" that kept me from doing arbitrary
complexity within 64K...

> The only thing I could fault Intel for is possibly not going to a 32 bit
> architecture sooner, but we were too busy building 80186's (5-6 million
> sold so far).  Also we were learning about how to build an MMU (the hard
> part) without having to debug 32 bit ALU's at the same time.

Funny, I don't recall Intel introducing an MMU before the 80386 (certainly
the garbage in the 286 doesn't qualify as an MMU, not given what competitors
were doing).  Guess the learning curve was pretty steep.  And 32-bit ALU's
are not much of an excuse;  there was something of a surfeit of examples,
and in any case widening an ALU is not what most people consider a difficult
problem.  (On the other hand, to be fair, the '386 MMU is a big winner.)
As to number of parts sold, the IBM 650 was pretty successful in its day,
too, but I wouldn't defend it as a modern architecture.

> The design 
> decision that was made 9+ years ago was do we build a slow 32bit machine
> or a fast 16 bitter. Intel decided on the fast 16bit, Motorola went for
> the slow 32 bit.  The rest is history.

One of Intel's favorite misrepresentations.  It handily ignores the
fact that the 68k is just as fast in "small model" and that it is *much*
faster in "large model" or even in any application where you have to store
numbers bigger than 64K (accounting?  what's that?).  And it also completely
ignores the 6809, which was pretty successful in its own right.

Back to used cars, Tom.  At least then your customers won't see through you.
-- 
	Geoff Kuenning   geoff@ITcorp.com   {hplabs,ihnp4}!trwrb!desint!geoff

tim@ism780c.UUCP (Tim Smith) (04/21/87)

In article <932@intsc.UUCP> tomk@intsc.UUCP (Tom Kohrs @fae) writes:
<
< doing bank selections or overlays.  The 286 architecture is considerably
< nicer than what was the dominant 16-bit mini at the time, the PDP-11.  In
< 1980 the idea that you could get the cpu power of an 11/70 on a chip with
< a cleaner memory management model got a lot of people excited.  

So how come you guys put the index in the upper 13 bits of a selector and
the GDT/LDT bit and the privilege bits in the low 3 bits?  If you had put
the index in the low 13 bits, then the LDT/GDT bit, and then the rest,
one could have gotten a 29 bit linear address space by setting up
non-overlapping 64k segments.

This would have eliminated the major complaint that people who program
in higher level languages have against the 286.

The only reason I can think of is that it saves having to shift the
index to get the offset into the GDT/LDT, but my friends who design
CPUs and do VLSI, and all that rot, tell me that this is no problem.
-- 
Tim Smith			"Hojotoho! Hojotoho!
uucp: sdcrdcf!ism780c!tim	 Heiaha! Heiaha!
Delphi or GEnie: Mnementh	 Hojotoho! Heiaha!"
Compuserve: 72257,3706

kds@mipos3.UUCP (04/22/87)

In article <3441@gitpyr.gatech.EDU> nobody writes:
>   - 8080 compatibility.  In this day and age, it's no longer an advantage;
>     it's a liability.
>-- 
>D.L. Deal, Office of Computing Services, Georgia Tech, Atlanta GA, 30332-0275
>Phone: (404) 894-4660   ARPA: don@pyr.ocs.gatech.edu  BITNET: cc100dd@gitvm1
>uucp: ...!{akgua,allegra,amd,hplabs,ihnp4,masscomp,ut-ngp}!gatech!gitpyr!don

of course, if Mot stopped making all of those 680?0 processors compatible with
the 6800, I guess that you might have an argument.  What is that I hear you
say?  The 68000 isn't compatible with the 6800?  Gee, I guess Intel will
just have to call back all the 80?86s that run 8080 code, which is exactly
0 of them.  Blah, blah, blah...Now NEC seems to have picked up the ball by
making their V20 and V30 run 8080 code in addition to the 8086 code, if
that is what you really want...
-- 
The above views are personal.

...and they whisper and they chatter, but it really doesn't matter.

Ken Shoemaker, Microprocessor Design, Intel Corp., Santa Clara, California
uucp: ...{hplabs|decwrl|amdcad|qantel|pur-ee|scgvaxd|oliveb}!intelca!mipos3!kds
csnet/arpanet: kds@mipos3.intel.com

phil@amdcad.AMD.COM (Phil Ngai) (04/22/87)

In article <652@desint.UUCP> geoff@desint.UUCP (Geoff Kuenning) writes:
.numbers bigger than 64K (accounting?  what's that?).  And it also completely
.ignores the 6809, which was pretty successful in its own right.

How did sales for the 6809 compare with the 808X family? Anyone know?
-- 
Phil Ngai, {ucbvax,decwrl,allegra}!amdcad!phil or amdcad!phil@decwrl.dec.com

caf@omen.UUCP (Chuck Forsberg WA7KGX) (04/22/87)

In article <615@mipos3.UUCP> kds@mipos3.UUCP (Ken Shoemaker ~) writes:
:
:of course, if Mot stopped making all of those 680?0 processors compatible with
:the 6800, I guess that you might have an argument.  What is that I hear you
:say?  The 68000 isn't compatible with the 6800?  Gee, I guess Intel will
:just have to call back all the 80?86s that run 8080 code, which is exactly
:0 of them.  Blah, blah, blah...Now NEC seems to have picked up the ball by
:making their V20 and V30 run 8080 code in addition to the 8086 code, if
:that is what you really want...

Does this mean that Intel now accepts the V20 and V30 processors?????

While we're talking about COMPATIBILITY, when will that 386 chip
drop-in replacement for the 286 (described in PC-WEEK and elsewhere)
be available to rescue 286 boxes from obsolescence?   How fast a clock
rate will it accept?

hugo@gnome.cs.cmu.edu.UUCP (04/22/87)

In article <16294@amdcad.AMD.COM> phil@amdcad.UUCP (Phil Ngai) writes:
>
>How did sales for the 6809 compare with the 808X family? Anyone know?
>-- 

Well, I think the 6809 sold pretty well if only because Tandy used it in
their color computer line of little boxes.

So, I would say it was pretty successful, but not so big as the 808X,
unfortunately.

Pete
--
ARPA: hugo@cmu-cs-gandalf.arpa      BELL:412-681-7431
UUCP: ...!{ucbvax,ihnp4,cmucspt}!hugo@cmu-cs-gandalf.arpa
USPS: 5170 Beeler St., Pittsburgh PA 15213
QUOT: "What's that I smell? I smell home cooking.  It's only the river!"
			_ Talking Heads

toma@tekgvs.UUCP (04/23/87)

In article <652@desint.UUCP< geoff@desint.UUCP (Geoff Kuenning) writes:
<In article <932@intsc.UUCP< tomk@intsc.UUCP (Tom Kohrs @fae) writes:
<
<< Not at all!  The 8086 architecture was a real nice way of extending the 
<< address range back in the days when the dominant machine was the 8085 and
<< Z-80.  Swapping segment registers to change address was so much nicer than
<< doing bank selections or overlays.  The 286 architecture is considerably
<< nicer than what was the dominant 16bit mini at the time, the PDP-11.  In
<< 1980 the idea that you could get the cpu power of an 11/70 on a chip with
<< a cleaner memory management model got a lot of people excited.  
<
<What a blatant misrepresentation:
<
<    (1) The PDP-11 is a 1960's architecture.  The 8086 was designed in the
<	late 70's, *after* the publication of Gordon Bell's book where he
<	states, "the most embarrassing thing about the PDP-11 was that, only
<	two years after its introduction, we found it necessary to widen
<	the processor address."  But did Intel learn from DEC's mistake?

The PDP-11 came out in 1970 (which makes it a 60's architecture like the
PDP-10 and IBM 360), and was (and probably still is) the
dominant minicomputer in 1980.  The initial PDP-11 had a 64k address space.
The initial 80x86 had a 1meg address space.

<    (2) The *only* place where the PDP-11 loses to the 8086 is in the 8086's
<	clever use of instruction prefixes to provide bank switching (that's
<	all segments are).  The PDP-11 doesn't have any registers with
<	weird private characteristics.  By contrast, even on the 80386, if
<	you want to shift by a variable shift count, it *has* to be in a
<	particular register.  How flexible.

Integer divides and multiplies (on models that had them) had to use even/odd
register pairs.  MMU registers were memory mapped (which I would call "weird").
Early PDP-11s could not shift more than one bit per instruction.  The 80x86
instruction set has more serious register limitations than the variable
shift count (which is rarely used).  Even worse is its dedicated use of SI,
DI, and AX for the string instructions.

Segments can be used for memory protection/management and provide a convenient
way to pass data between processes on 80286 or 80386 processors.  The
unfortunate feature of segments is that they have no use in a UNIX environment.

The PDP-11 loses to the 8086 in price/performance.  It also suffers from
several non-compatible hardware floating point units and two types of hardware
integer multiply/divide units.

<    (3) Although it is true that the 808x series was the dominant
<	MICROPROCESSOR CHIP (not the dominant machine, by any stretch of
<	the imagination), it is also true that there were *lots* of better
<	architecture examples around.  The 6800 literally leaps to mind.

The 6800 was better for numeric calculation, but not as good for moving data
around.  But the 6502 and the 6809 were better yet!  Unfortunately look at
the sales figures.

<    (4) Tom also conveniently ignores what a loser the 8085 was.  Essentially
<	no changes from the 8080.  Check out the 6809 by contrast, or compare
<	the PDP-11/70 with the original 11, the 11/20.

Sure the instruction set was no better (go to a Z-80 for that), but the 8085
did reduce the system chip count -- it was mainly better from a hardware
point of view.


<    (5) The basic reason for the 8086's kludgey architecture was [...]
<	because Intel saw binary compatibility with the 8080/8085 as a 
<       critical goal.  That's
<	forgivable;  nobody at the time could have seen that MS/DOS was
<	going to kill CP/M-86.  But contrast DEC's approach with the VAX.
<	They got compatibility with no kludges, and their customers don't
<	seem too unhappy.

What binary compatibility??  It was sort of compatible from source, if you
ran it through a conversion program.  The result was sluggishly running
8086 programs (such as WordStar which on a good Z80 machine would run rings
around the 8086 version on an IBM-PC).  MS-DOS was a closer CP/M-80 clone
than CP/M-86 since only the former supported CP/M 80 style system calls
(that is what CALL 5 is for).  DEC's approach with the VAX matches that of
NEC's V20/30 chip, which does maintain compatibility and which I only wish
had come out five years sooner (and emulated the Z80 as well).

<
<< What caused most of the frustration toward the 286 was DEC and Motorola 
<< both went to a 32bit programming model at that time. Programmers quickly 
<< jumped to arms to adhere to the old maxim of using all of the available 
<< memory plus one byte. When these neat new programs (ie BSD 4.x) were 
<< forced back down to the 16 bit architecture things got tricky.  Many 
<< programmers decided that programming in a 32 bit environment required
<< less effort and less need for structure than the 16 bit environment and
<< so to justify their not liking to work on 16 bit machines they were labled
<< as being kludges or obsolete.
<
<[...]I guess it's just "lazy programming" that kept me from doing arbitrary
<complexity within 64K...

OK, two examples of lazy C programming are 1) the assumption that
sizeof(int)==sizeof(char *), and 2) using int instead of long for numbers that
need to be longer than 16 bits.


Tom Almy
Tektronix, Inc.

(If I had a choice, I would still be doing assembly language programming
on a PDP-11, with the best instruction set of all time, rather than a
680x0 or 80x86, both of which have more kludges than I would have time
to talk about.)

doug@edge.UUCP (Doug Pardee) (04/23/87)

Picking nits:

> 	The PDP-11 doesn't have any registers with
> 	weird private characteristics.

I'm no expert on the 11, but aren't registers 6 and 7 the Stack Pointer and
Program Counter?  Not that this voids the argument -- the '86s are much more
restrictive with their weird registers.

> 	... Intel saw
> 	binary compatibility with the 8080/8085 as a critical goal.  That's
> 	forgivable;  nobody at the time could have seen that MS/DOS was
> 	going to kill CP/M-86.

Actually, the 8080 compatibility was *more* crucial for MS/DOS than for
CP/M-86.  MS/DOS was designed from the outset to be 100% upward compatible
with good-ol' CP/M, so that CP/M programs could be mechanically translated
from 8080 code to 8086 code and they'd run.  (It worked, too.  Many of the
early PC programs were mechanically translated CP/M programs.)  For some
reason, DRI didn't consider compatibility to be important, and CP/M-86
wasn't upward compatible.

-- Doug Pardee -- Edge Computer Corp. -- Scottsdale, Arizona

geoff@desint.UUCP (04/24/87)

In article <615@mipos3.UUCP> kds@mipos3.UUCP (Ken Shoemaker ~) writes:

> of course, if Mot stopped making all of those 680?0 processors compatible with
> the 6800, I guess that you might have an argument.  What is that I hear you
> say?  The 68000 isn't compatible with the 6800?  Gee, I guess Intel will
> just have to call back all the 80?86s that run 8080 code, which is exactly
> 0 of them.

Without an 8080 manual handy, I can't really easily check to see if the
386 is binary-compatible with the 8080.  But it certainly is
assembly-compatible, and I can't help noting that there are so-called
"short forms" of instructions that look suspiciously like the opcodes
of the 8080 set.  By contrast, try writing "LDX" on a 68000.  Doesn't work.

Why is it that Intel employees are so confused about their own products?
-- 
	Geoff Kuenning   geoff@ITcorp.com   {hplabs,ihnp4}!trwrb!desint!geoff

ihm@nrcvax.UUCP (Ian H. Merritt) (04/24/87)

>In article <16294@amdcad.AMD.COM> phil@amdcad.UUCP (Phil Ngai) writes:
>>
>>How did sales for the 6809 compare with the 808X family? Anyone know?
>>-- 
>
>Well, I think the 6809 sold pretty well if only because Tandy used in in
>their color computer line of little boxes.
>
>So, I would say it was pretty successful, but not so big as the 808X,
>unfortunately.
>

Unfortunately indeed.  The 6809 was just a bit too late for its
market.  Had it come out 2 years earlier, MANY things would have been
different.  It was VASTly superior to the 8080/Z80 which dominated the
8-bit market at the time.  Architecturally, it is more consistent and
generally cleaner than the x86 garbage, but limited by its 64K address
space.  The x86 at least expanded that slightly, albeit by a
kludgy and almost unusable mechanism.  This made a difference though in
the choice of a CPU for the MS-DOG machines. (Woof!).

Pity things have taken the low road, but I perceive a trend toward a
somewhat improved future.  Maybe I am just dreaming.

Here's to a future without Intel (or at least their current
philosophy)...

Cheerz--
					<>IHM<>

chapman@fornax.uucp (John Chapman) (04/25/87)

>
[ comparison of 11's and 80x8y's ]
> 
> Integer divides and multiplies (on models that had them) had to use even/odd
> register pairs.  MMU registers were memory mapped (which I would call "weird").
> Early PDP-11s could not shift more than one bit per instruction.  The 80x86
> instruction set has more serious register limitations than the variable
> shift count (which is rarely used).  Even worse is its dedicated use of SI,
> DI, and AX for the string instructions.

This is not peculiar to Intel, as NSC 32xxx and DEC VAX machines do it
(probably a lot of others) as well. Each dedicates particular registers
for specific (count, dst., src.) functions in string instructions. It
is the only way to have single instruction string operations that are
interruptible and resumable (which you obviously want) other than
perhaps putting the internal (non user visible) microcode registers
on the stack *every* time an interrupt happens (*yuck*).

.
.
.
 
> Tom Almy
> Tektronix, Inc.
> 
> (If I had a choice, I would still be doing assembly language programming
> on a PDP-11, with the best instruction set of all time, rather than a
> 680x0 or 80x86, both of which have more kludges than I would have time
> to talk about.)
Yup.

john chapman

-- 
{watmath,seismo,uw-beaver}!ubc-vision!fornax!sfulccr!chapman
                   or  ...!ubc-vision!sfucmpt!chapman

almquisk@rpics3b.UUCP (04/26/87)

> What caused most of the frustration toward the 286 was DEC and Motorola 
> both went to a 32bit programming model at that time. Programmers quickly 
> jumped to arms to adhere to the old maxim of using all of the available 
> memory plus one byte. When these neat new programs (ie BSD 4.x) were 
> forced back down to the 16 bit architecture things got tricky.  Many 
> programmers decided that programming in a 32 bit environment required
> less effort and less need for structure than the 16 bit environment and
> so to justify their not liking to work on 16 bit machines they were labeled
> as being kludges or obsolete.

Did Intel simply fail to notice what was happening to the price of memory?
16 bit machines are fine if you can't afford more than 64K of memory anyway.
The VAX and the 68000 made it possible to take full advantage of the cheaper
memory, but these days DEC sells memory for the PDP-11 in 1 Mbyte chunks.

It is true that 32 bit environments can save on programming effort, which is
very important these days since programming costs tend to exceed hardware
costs.  But that is only half the story.  I own an old 68000 box, and the
editor I use simply reads files being edited into space obtained by malloc.
If I modified the editor to keep the files being edited on disk, the result
would be *slow* because the disk has an 85 millisecond average access time.  
In the early 70's it was necessary to either shell out the money for fast
disk drives or else live with slow editors, but today it is possible to buy
lots of cheap RAM to speed up editing--if your CPU was designed to support
it.

> The 286 architecture is considerably nicer than what was the dominant
> 16bit mini at the time, the PDP-11.

But of course when the 286 came out the dominant mini of the time was the
32 bit VAX.  From the point of view of a person accustomed to working with
larger computers, the only really interesting chip that Intel has come out
with is the 432.  The problem with the 432 is that capability based systems
are still not very well understood.  It might be possible to build a nice
system based upon the 432, but today the idea is to build UN*X boxes and
you don't need a 432 to do that.
					Kenneth Almquist

geoff@desint.UUCP (Geoff Kuenning) (04/28/87)

In article <678@edge.UUCP> doug@edge.UUCP (Doug Pardee) writes:

> Picking nits:
> 
> > 	The PDP-11 doesn't have any registers with
> > 	weird private characteristics.
> 
> I'm no expert on the 11, but aren't registers 6 and 7 the Program Counter and
> Stack Pointer?  Not that this voids the argument -- the '86s are much more
> restrictive with their weird registers.

Well, technically that's true.  The PDP-11 PC has the "special characteristic"
that it autoincrements after an opcode fetch (not, however, after an
operand-address or immediate-operand fetch -- see below).  The SP has two
special characteristics:  the JSR/RTS instructions use it as an implied
register, and the interrupt/RTI operations do the same.

As to the PC, DEC has a patent on "PC as a general register".  By making
the PC just another general register, you lose one register, but gain
a whole bunch in instruction-encoding simplicity.  For example, an immediate
operand is handled by just encoding the operand as (PC)+ -- the operand
is plucked from where the PC points, and the PC increments over it to
the next instruction stream element, nice as you please.  I predict that
when DEC's patent expires, you will see this feature in a lot of other
computers.

However, it is worth noting that *every* computer has "special registers"
of various sorts.  Somebody at Tek mentioned MMU registers, which I consider
a red herring.  There's also the PC and the PSW.  My point was that Intel's
GENERAL registers have special characteristics, and thus aren't really
general.
-- 
	Geoff Kuenning   geoff@ITcorp.com   {hplabs,ihnp4}!trwrb!desint!geoff

ihm@nrcvax.UUCP (Ian H. Merritt) (04/28/87)

>>
>[ comparison of 11's and 80x8y's ]
>> 
>> Integer divides and multiplies (on models that had them) had to use even/odd
>> register pairs.  MMU registers were memory mapped (which I would call "weird").
>> Early PDP-11s could not shift more than one bit per instruction.  The 80x86
>> instruction set has more serious register limitations than the variable
>> shift count (which is rarely used).  Even worse is its dedicated use of SI,
>> DI, and AX for the string instructions.
>
>This is not peculiar to Intel, as NSC 32xxx and DEC VAX machines do it
>(probably a lot of others) as well. Each dedicates particular registers
>for specific (count, dst., src.) functions in string instructions. It
>is the only way to have single instruction string operations that are
	^^^^
>interruptible and resumable (which you obviously want) other than
>perhaps putting the internal (non user visible) microcode registers
>on the stack *every* time an interrupt happens (*yuck*).

The ONLY way?  Really?  First off, it is not clearly desirable to have
single instruction dedicated string operations, particularly if the
processor can execute a short loop just as fast (or faster), the
latter providing far greater flexibility.  If for some reason, you
must have a single instruction, though, it is perfectly reasonable to
have the instruction SPECIFY the source, destination, and count
registers (or whatever operands the instruction requires), thereby
allowing ANY GENERAL PURPOSE REGISTER (not something found on Intel
processors) to be used for any operand.

On another note, wrt the PDP-11 multiply/divide stuff, I was
particularly disturbed when they decided to have a mixed-endian model.
Basically, the PDP-11 is a little-endian processor where the low-order
byte of a word is in the lower numbered address, and bits are numbered
as powers of two.  When they had to store 32 bit numbers, for
multiplies and divides, they decided to deviate from this basically
consistent philosophy, storing the high order word in the lower
numbered address; the low word in the higher, but within these words,
the bytes are stored the other way.  Terrific.  The reasons for this
have oft been discussed, and I do not wish to spark another such
conversation, but it sure was confusing.


Cheerz--
							<>IHM<>

chapman@fornax.uucp (John Chapman) (05/02/87)

.
.
<mild flames ahead>
> >> shift count (which is rarely used).  Even worse is its dedicated use of SI,
> >> DI, and AX for the string instructions.
> >
> >This is not peculiar to Intel as NSC 32xxx and Dec VAX machines do it 
> >(probably a lot of others) as well. Each dedicates particular registers
> >for specific (count, dst., src.) functions in string instructions. It
> >is the only way to have single instruction string operations that are
> 	^^^^
> >interruptible and resumable (which you obviously want) other than
> >perhaps putting the internal (non user visible) microcode registers
> >on the stack *every* time an interrupt happens (*yuck*).
> 
> The ONLY way?  Really?  First off, it is not clearly desirable to have

Yes, the only way to have single instruction, interruptible, and
resumable string instructions without putting extra microcode internal
registers on the stack at interrupts.  Yes, it is obviously desirable
for string instructions to be resumable.

If you have to stack extra internal state then:

 1. either you only stack the stuff when a string instruction is
    interrupted, so your stack state on interrupt is dependent on
    the instruction being interrupted - this is undesirable, or

 2. you always stack that information whether it's needed or not,
    introducing a general overhead to interrupts to accommodate
    the occasional string instruction.

> single instruction dedicated string operations, particularly if the
> processor can execute a short loop just as fast (or faster), the
> latter providing far greater flexibility.  If for some reason, you

1. how would you code a string instruction like FFS (find first set bit)
   to fit in something like the 68010's single instruction looping
   feature/cache?

2. with good code caching and instruction pipelining you *might*
   get the same performance with a multi-instruction equivalent
   to the simpler string instructions; cases where single instructions
   execute slower than multi-instruction equivalents are more
   likely to be indicative of other problems, e.g. poor micro-coding
   of the single instruction.

> must have a single instruction, though, it is perfectly reasonable to
> have the instruction SPECIFY the source, destination, and count
> registers (or whatever operands the instruction requires), thereby
> allowing ANY GENERAL PURPOSE REGISTER (not something found on Intel
> processors) to be used for any operand.

But it *isn't* perfectly reasonable.  The guys who design these chips
aren't stupid you know - they do this stuff for a reason.  A good
pipelined cpu will have simultaneous opcode and operand decode.  The
architecture is optimized around a central issue like this; a two
address machine has its silicon designed to handle one or two operand
decodes efficiently.  So along comes a string instruction with 3 or 4
operands - now what do you do? If you want full operand generality
then the chip has to have real estate dedicated to handle these (few)
exceptional instructions with extra operands.

Why is it necessary to have full generality for these instruction
operands?  You know in advance what you will be using a calculation
result for so do it in the register it needs to be in. You have all
the other registers to do what you want with so why is it a problem
that (for example) the vax uses low numbered registers for string
operands? What exactly does it prevent you from doing?

Also, part of the reason I posted my original response was that you
were flaming at Intel for dedicating certain registers for certain
functions. I have two problems with this:
 1. as originally pointed out, this is hardly unique to Intel, so
    if you are going to flame about it you'd better flame everybody
    else who does it too - unless of course you are biased.
 2. most people present this as "well you have to use SI as string
    source register" (implying that that is the only use for SI
    or DI, or BX or BP), rather than "well you can do general arithmetic
    etc. on SI and if you want to do a string instruction you use
    SI as the source operand".
 
.
.
. 
> 
> Cheerz--
> 							<>IHM<>

-- 
{watmath,seismo,uw-beaver}!ubc-vision!fornax!sfulccr!chapman
                   or  ...!ubc-vision!sfucmpt!chapman

dillon@CORY.BERKELEY.EDU (Matt Dillon) (05/03/87)

>1. how would you code a string instruction like FFS (find first set bit)
>   to fit in something like the 68010's single instruction looping
>   feature/cache?

	moveq.l	#31,D0
	move.l 	ea,D1
loop	asl.l	#1,D1
	dbcs	d0,loop

    Notes: You could just as easily specify 7, and use byte operations to 
    find the first set bit in a byte.  The routine above checks bits MSB
    first.  You can, of course, use lsr instead to check bits LSB first.
    (remember to take 32 - the count afterward for longword operations)
    You can add an outer loop to check any size item by using
    (A0)+ for the EA on the MOVE (and setting up A0) when loading the 
    initial data into D1, followed by, say, a BEQ if you also want to check for
    an end of string.  Remember that the outer loop would only execute
    once every 32 cached loops (97% of the instructions are cached) assuming
    you use longword operations.

    And, of course, it would all fit into a 68020's cache.  The DBcc
    instruction is quite versatile.  Even though the 68000/68010's DBcc
    (not sure about 68020) only supports word decrements on the data 
    register, it is quite simple to support 32 bit counts with the addition
    of an outer loop (another two instructions... a subi.l, and a branch).
    Maximum efficiency would give 65536 inner loops for every 1 outer loop,
    or a .0015 PERCENT loss over a DBcc which used the full 32 bits of the
    data register.

    P.S. I didn't test this code... but it's almost a children's exercise
    anyway.
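For comparison, a portable C rendering of the same scan (my sketch, not Matt's code): it checks bits MSB first like the asl loop, returning the bit number of the highest set bit, or -1 for zero.

```c
/* Find-first-set, MSB first: returns 31..0 for the highest set
   bit of v, or -1 if v is zero -- the same job as the asl/dbcs
   loop above, minus the 68010 loop-mode speed trick. */
int ffs_msb(unsigned long v)
{
    int bit;
    for (bit = 31; bit >= 0; bit--)
        if ((v >> bit) & 1UL)
            return bit;
    return -1;
}
```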

					-Matt

kds@mipos3.UUCP (Ken Shoemaker ~) (05/04/87)

and, of course, the win of having special registers for certain operations
is that operations with those registers can be made faster than in normal
circumstances.  For example, in the 386 (and 286) there is special hardware
in the bus interface unit that is used in the string move instruction to
effect string transfers at the bus bandwidth.  Trying to do this in a general
instruction loop would be difficult in that it would require looking at a
couple of instructions and decoding them as a group.

Also, with respect to specifying operands, the decoding of the instruction
itself is an important part of the execution of the instruction.  In the
8086 there isn't a way to specify any more than two different operands, and
if you specify two operands, one of them is a register.  Thus, for a string
move instruction, you'd be at a loss to try to specify all the operands
you need explicitly unless you increase the complexity of the instruction
decode.
-- 
The above views are personal.

...and they whisper and they chatter, but it really doesn't matter.

Ken Shoemaker, Microprocessor Design, Intel Corp., Santa Clara, California
uucp: ...{hplabs|decwrl|amdcad|qantel|pur-ee|scgvaxd|oliveb}!intelca!mipos3!kds
csnet/arpanet: kds@mipos3.intel.com

mash@mips.UUCP (John Mashey) (05/05/87)

In article <274@fornax.uucp> chapman@fornax.uucp (John Chapman) writes:
> ...Discussion of dedicated registers for string ops...
>
>Why is it necessary to have full generality for these instruction
>operands?  You know in advance what you will be using a calculation
>result for so do it in the register it needs to be in. You have all
>the other registers to do what you want with so why is it a problem
>that (for example) the vax uses low numbered registers for string
>operands? What exactly does it prevent you from doing?

In general, it's an OK tradeoff.  What it does make harder is the use
of more serious optimizing compilers.  In general, it is much easier
for ones that do good global optimization to compute results into
whatever registers are convenient.  Special cases can be lived with,
but the more there are, the harder it gets.  Compiler writers have
always disliked things like:
	register pairs needed for certain operations
	special registers used by some instructions
	unnecessarily asymmetric register sets
Again: not a disaster, and maybe a correct tradeoff, given all of the
other assumptions already built in, but not what one would like.
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{decvax,ucbvax,ihnp4}!decwrl!mips!mash, DDD:  	408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

herndon@umn-cs.UUCP (05/05/87)

In article <365@winchester.UUCP>, mash@mips.UUCP (John Mashey) writes:
> In article <274@fornax.uucp> chapman@fornax.uucp (John Chapman) writes:
> > ...Discussion of dedicated registers for string ops...
> >
> >Why is it necessary to have full generality for these instruction
> >operands?  You know in advance what you will be using a calculation
> >result for so do it in the register it needs to be in. You have all
> >the other registers to do what you want with so why is it a problem
> >that (for example) the vax uses low numbered registers for string
> >operands? What exactly does it prevent you from doing?
> 
> In general, it's an OK tradeoff.  What it does make harder is the use
> of more serious optimizing compilers.
>     . . . . . . . . . . . . . . . . . . . .  Compiler writers have
> always disliked things like:
> 	register pairs needed for certain operations
> 	special registers used by some instructions
> 	unnecessarily asymmetric register sets
> Again: not a disaster, and maybe a correct tradeoff, given all of the
> other assumptions already built in, but not what one would like.

  I'll second John on this.  Having used an early Intermetrics
optimizing compiler for the 8086, I truly respect those people
who had to write the optimizer for it.  In spite of very good
register allocation/instruction optimizations, there were so many
"gotcha's" in the processor that some of them inevitably got
through.  Among them were (from memory):
  1) AX was the only register usable with I/O instructions, and
     many instructions insisted on having one operand in AX,
     e.g., XLAT.
  2) BX was the only register usable as a stack offset
  3) CX always held the count for shift and string operations.
  4) DX got high-order results from the multiply, one of the
     multiply ops had to be in AX, and the low-order result
     was in AX.  (Sometimes this clobbered something.)
  5) Some of the instructions did not permit segment over-ride
     prefixes (still executed, but the instruction didn't work as
     anticipated.)
  6) Not all instructions set the condition codes as might be
     expected.
  While any one of these can easily (maybe) be anticipated, the
effect of so many is to overwhelm the poor optimizer writer.
EVERY SINGLE REGISTER on the 80X86 series has one or more special 
uses.  I readily admit that only one of the problems we found
was not documented in the processor manuals (the POPF instruction
had problems), but having soooooooooo many exceptions made the
advertising concept of "8086 general registers" an oxymoron.

-- 
Robert Herndon				Dept. of Computer Science,
...!ihnp4!umn-cs!herndon		Univ. of Minnesota,
herndon@umn-cs.ARPA			136 Lind Hall, 207 Church St. SE
herndon.umn-cs@csnet-relay.ARPA		Minneapolis, MN  55455

johnw@astroatc.UUCP (John F. Wardale) (05/08/87)

In article <639@mipos3.UUCP> kds@mipos3.UUCP (Ken Shoemaker ~) writes:
	... about specialized string instructions ...
>  For example, in the 386 (and 286) there is special hardware
>in the bus interface unit that is used in the string move instruction to
>effect string transfers at the bus bandwidth.  Trying to do this in a general
>instruction loop would be difficult in that it would require looking at a
>couple of instructions and decoding them as a group.

No, Ken, you're looking at this the wrong way.   You are ABSOLUTELY
right that this will be memory (bus) limited.  However, if you can
issue instructions fast enough (by having fewer, simpler
instructions) then a loop can also saturate the memory bus.

(This assumes an I-cache.  If you have to fetch each instruction
on a single memory bus, this will obviously lose!)
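To make the point concrete, here is a hedged sketch (function name
and word size are my own choices) of the kind of loop I mean:

```c
#include <stddef.h>
#include <stdint.h>

/* A two-operation copy loop.  Once the loop body is resident in an
   I-cache, essentially every bus cycle is a data transfer -- which
   is all a microcoded string-move instruction achieves either.
   The count and pointers stay in registers. */
void word_copy(uint32_t *dst, const uint32_t *src, size_t nwords)
{
    while (nwords--)
        *dst++ = *src++;   /* load, store; decrement/branch off the bus */
}
```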

Average string length, data caches, MMU's etc. will complicate the
analysis.  Note that a page-fault in the middle of a single instr.
means restarting it, but in the loop case, you just re-issue the
instruction.  This simplification may also help you decrease your
processor's cycle time!

(In case you haven't guessed ... I like RISC)


			John W

- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - 
Name:	John F. Wardale
UUCP:	... {seismo | harvard | ihnp4} !uwvax!astroatc!johnw
arpa:   astroatc!johnw@rsch.wisc.edu
snail:	5800 Cottage Gr. Rd. ;;; Madison WI 53716
audio:	608-221-9001 eXt 110

To err is human, to really foul up world news requires the net!

chapman@fornax.UUCP (05/08/87)

> >1. how would you code a string instruction like FFS (find first set bit)
> >   to fit in something like the 68010's single instruction looping
> >   feature/cache?
> 
> 	moveq.l	#31,D0
> 	move.l 	ea,D1
> loop	asl.l	#1,D1
> 	dbcs 	loop
> 
>     Notes: You could just as easily specify 7, and use byte operations to 
>     find the first set bit in a byte.  The routine above checks bits MSB
>     first.  You can, of course, use lsr instead to check bits LSB first.
>     (remember to take 32 - the count afterward for longword operations)
>     You can add an outer loop to check any size item by using
>     (A0)+ for the EA on the MOVE (and setting up A0) when loading the 
>     initial data into D1, followed by, say, a BEQ if you also want to check for
>     an end of string.  Remember that the outer loop would only execute
>     once every 32 cached loops (97% of the instructions are cached) assuming
>     you use longword operations.
> 
.
. 68020 comments
. 
>     P.S. I didn't test this code... but it's almost a children's exercise
>     anyway.
> 
> 					-Matt

Yes, well, the problem with children is they often make mistakes.

1. The code you have produced does not meet the original "challenge"
   I made to the original poster who was wondering why specific
   registers and why a single instruction.  Your code does not all
   fit in the 68010's cache.  Lest you think this is picky, I will point
   out that I chose the simplest of the string instructions as an example;
   others would take even more code and use more registers.

2. Duplicating the function has now used up a general purpose register.
    One of the advantages of the "all in one" type instruction is that
   internal temporaries can be used.

3.  You have not implemented FFS; you have implemented a much simpler version.
    FFS is (operand order may be wrong):

    FFS <base addr>,<offset>,<length>,<destination of count>

    <base addr> is a byte address of the bit string origin
    <offset> is a bit offset into the bit string at which the search
             begins; if the source is not a register this may be more than 32
             bits
    <length> is the length of the field to be checked in bits
    <destination of count> is where the bit position of the first set bit
                           is placed

-- 
{watmath,seismo,uw-beaver}!ubc-vision!fornax!sfulccr!chapman
                   or  ...!ubc-vision!sfucmpt!chapman

allen@ccicpg.UUCP (05/12/87)

>>1. how would you code a string instruction like FFS (find first set bit)
>>   to fit in something like the 68010's single instruction looping
>>   feature/cache?
> 	moveq.l	#31,D0
> 	move.l 	ea,D1
> loop	asl.l	#1,D1
> 	dbcs 	loop
> 



A minor point.  According to the "M68000 16/32-Bit Microprocessor
Programmer's Reference Manual", Fourth Edition, page 218, Table G-1:
	asl.l	#1,D1
is not listed as a loopable instruction.

allen