[comp.arch] The 360 was a design landmark

johnw@astroatc.UUCP (John F. Wardale) (01/01/70)

In article <1193@k.gp.cs.cmu.edu> lindsay@k.gp.cs.cmu.edu (Donald Lindsay) writes:
>you want. In fact, the 360 is something of a RISC machine by later
>standards.
Whoooooo!!!!!   The 360/370 has some instructions that almost
define the limits of "CISC".

On your side...see below

>As for 12-bit offsets: ...increased code density.
>...  The VAX carried this same idea further.

>To reiterate, it (the 360) changed the world, for the better. Be kind.

So... what did the Vax get for all its architectural contortions?
Dare I start a 370-vs-Vax debate?

The 360 designers saw fit (or did they just get lucky... I don't
think so!) to design *PIPELINING-CAPABILITY* into the 360.
12-bit offsets, EBCDIC, and IBM marketing aside, THE *MAJOR*
reason the ancient 360/370 stuff is still alive, while DEC's VAXen
are falling by the wayside (despite DEC's best efforts), is that
360's *CAN* be pipelined (tho not necessarily real easily) and
VAXen can't!  

The 1st byte of each 370 instruction tells you the length of the
instruction!  On a VAX, you have to gather *ALL* the operands
(0, 1, 2, or 3!) before you can find the start of the next
instruction!
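
In C terms, the 370 rule looks like this (a minimal sketch, not a
complete decoder: the top two bits of the first, opcode, byte alone
determine the instruction length):

	/* S/360-370: instruction length from the opcode byte alone */
	int s370_insn_len(unsigned char opcode)
	{
		switch (opcode >> 6) {
		case 0:  return 2;	/* RR format */
		case 1:
		case 2:  return 4;	/* RX, RS, SI formats */
		default: return 6;	/* SS format */
		}
	}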

Top 370 designs top out at 20 MIPS (or is it more now?)
Top Vax design is maybe 8 MIPS.

The point is that despite moderate-size mistakes, the 360 is a
relatively OK design, and was phenomenally excellent for the early
1960's!!!!!!!!

(And I'm definitely a non-IBM type person!)

-- 
					John Wardale
... {seismo | harvard | ihnp4} ! {uwvax | cs.wisc.edu} ! astroatc!johnw

To err is human, to really foul up world news requires the net!

lamaster@pioneer.arpa (Hugh LaMaster) (01/01/70)

In article <18088@amdcad.AMD.COM> tim@amdcad.UUCP (Tim Olson) writes:

>If the VAX instruction-set was designed for "maximum code density", they
>certainly did a poor job.  Many processors (including some "RISCs" --
>IBM ROMP and CRISP) can routinely beat it in code density.

Well, it was designed for high code density at the time. Two points:
1)  Maybe people have learned something in 10 years, and
2)  Be careful to compare oranges and oranges.  I think the code density
produced by the VMS compilers is quite good, and compares favorably with
code produced on other 32 bit architectures with good compilers.  It is
a different test to compare the code produced by pcc, for example.  What
comparisons were you referring to?




  Hugh LaMaster, m/s 233-9,  UUCP {seismo,topaz,lll-crg,ucbvax}!
  NASA Ames Research Center                ames!pioneer!lamaster
  Moffett Field, CA 94035    ARPA lamaster@ames-pioneer.arpa
  Phone:  (415)694-6117      ARPA lamaster@pioneer.arc.nasa.gov


                 "IBM will have it soon"


(Disclaimer: "All opinions solely the author's responsibility")

msf@amelia (Michael S. Fischbein) (01/01/70)

In article <26623@sun.uucp> petolino@sun.UUCP (Joe Petolino) writes:
>One final word about the ASCII vs EBCDIC debate.  You can enter ANY of the
>128 ASCII codes from a standard ASCII keyboard.  I don't know of any
>EBCDIC keyboard that can make a similar claim.

Whoa!  There are 128 7-bit numbers, and 256 8-bit numbers.  If you have a
keyboard that generates 128 different 7-bit codes, adding one key (usually
called `Meta') will generate all 256 8-bit codes.  That has to be all of
EBCDIC; there just isn't any more.  Not all terminals have Meta keys; but
ask anyone who uses EMACS if there aren't work-arounds if you don't.
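
(In C terms, the Meta claim is a single OR; a trivial sketch, assuming
the terminal really passes all 8 bits through:

	/* 7-bit ASCII code in, 8-bit code out: Meta sets the top bit */
	unsigned char meta(unsigned char ascii7)
	{
		return (unsigned char)(ascii7 | 0x80);
	}

which covers all 256 values - and that's all EBCDIC can be.)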

		mike
Michael Fischbein                 msf@prandtl.nas.nasa.gov
                                  ...!seismo!decuac!csmunix!icase!msf
These are my opinions and not necessarily official views of any
organization.

lindsay@k.gp.cs.cmu.edu (Donald Lindsay) (08/24/87)

The discussion to date is typified by the uninformed statement:
>Yes, I did mean 12-bit offsets, and they can be a nuisance and not an
>insuperable problem, and may (or may not) have made sense given the
>design constraints of the day. 

The 360 quite simply made everything that came before it obsolete. It had an
unthinkably big address space (I was there: I didn't think it). It had
a genuine interrupt structure - look at the CDC 6600, whose PPUs couldn't
get the big guy's attention. It had byte addressability. It had the most
orthogonal instruction set to date. It had general registers, without
silly special-casing all over the place. It didn't have silly holes in its
instruction repertoire (unlike a machine I won't name, that didn't have
subtract, or two's complement, so you did A-B as
	load B, one's complement, increment, add A
I kid you not). It had two sizes of floating point. It was the first family:
before that, switching CPUs *always* meant switching instruction sets, and
switching disks, and even printers, and cables. The 360 family offered
printers (etc.) that attached to all (well, sometimes most) of the family.
A breathtaking concept!
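
(That A-B dance is just two's complement built by hand.  In C terms:

	/* subtract on a machine with no subtract instruction:
	   one's-complement B, increment, add A */
	int subtract(int a, int b)
	{
		return a + ~b + 1;	/* ~b + 1 == -b */
	}

except that the machine in question made you spell out every step.)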

I could go on.

It was not unflawed, but by comparison, the competition had leprosy. The 
mistakes are well known, and have not been mentioned by this group.

One mistake was in the floating format. It was field-updated, free.

Another mistake was the floating roundoff (nibble, not bit). Gene Amdahl
hoped to reduce hardware complexity, but mostly made it hard to determine
the arithmetic error of various computations.

Another mistake was in not introducing limit registers, or some other 
primitive variant of virtual memory. (Univac had limit registers.) They
thought of it at the time, and decided that they were already stretching the
company to the breaking point. (They were designing all the family members,
and all the other components, at once.) Well, hindsight is so 20-20.

Another mistake was in not having a conditional branch instruction that was
PC-relative.  I don't happen to know if competitors had this at that time.
If not, then IBM should have retrofitted it when the idea did come up.
The reason: bad code density vs. more recent machines. This single change
would have made 15%-25% difference to certain popular applications.

Another mistake was that when you loaded an address, the top 8 bits weren't
cleared. I don't know if this was a feature or an oversight (at the time).
When the architecture was about 16 years old, it became a serious problem.
That's a long time.
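
A sketch of the trap in C: in 24-bit mode the hardware ignored the top
byte of an address, so programs stashed flags there - and code that
depended on this broke when 31-bit addressing arrived.  (Illustrative,
not taken from any particular program.)

	/* 24-bit mode: only the low 24 bits of a register form the address */
	unsigned ea24(unsigned reg)
	{
		return reg & 0x00FFFFFF;	/* top 8 bits "free" for flags... */
	}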

ASCII was a bad choice because all of their customers had commitments to
other codes. (By commitment, I mean card readers and sorters: terminals hadn't
been invented.) The fact that ASCII was mentioned in the Principles of
Operation manual showed their progressiveness. (THEY owned the card
equipment market.)

As for stacks: you can simulate them just fine. I refer doubters to the
RISC literature, which proposes simple machines, where you simulate what
you want. In fact, the 360 is something of a RISC machine by later
standards.

As for 12-bit offsets: this was an innovation that increased code density.
The idea was that you didn't have to deal in full-size addresses: when
fewer bits would do, you could use 12!  You don't have to use them (unlike
the 8086's 64K segments). The VAX carried this same idea further.
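
A sketch of the RX-format address computation in C (as on the real
machine, register number 0 means "no register", not r0's contents):

	/* EA = (index reg) + (base reg) + 12-bit displacement */
	unsigned rx_ea(const unsigned r[16], int x2, int b2, unsigned d2)
	{
		unsigned ea = d2 & 0xFFF;	/* displacement: 0..4095 */
		if (x2) ea += r[x2];
		if (b2) ea += r[b2];
		return ea;
	}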

To reiterate, it changed the world, for the better. Be kind.
-- 
	Don		lindsay@k.gp.cs.cmu.edu    CMU Computer Science

guy%gorodish@Sun.COM (Guy Harris) (08/25/87)

> The 360 designers saw fit (or did they just get lucky... I don't
> think so!) to design *PIPELINING-CAPABILITY* into the 360.

I dunno about that; when did the first pipelined machines come out?

> 12-bit offsets, EBCDIC, and IBM marketing aside, THE *MAJOR*
> reason the ancient 360/370 stuff is still alive, while DEC's VAXen
> are falling by the wayside (despite DEC's best efforts)

VAXes falling by the wayside?  I dunno about that, either.  DEC, VAXes, and VMS
seem to be doing quite well for themselves.

> is that 360's *CAN* be pipelined (tho not necessarily real easily) and
> VAXen can't!  

VAXes can't be pipelined?  Gee, I suspect some of the 8600's designers would be
surprised to hear that.

> Top 370 designs top out at 20 MIPS (or is it more now?)
> Top Vax design is maybe 8 MIPS.

So?  By itself, that may merely indicate that DEC hasn't pushed raw hardware
technology as hard as IBM has.
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com

cik@l.cc.purdue.edu (Herman Rubin) (08/25/87)

A big flaw in the 360-style architecture, which is present in all of its
successors of which I am aware, is that there is no communication between
the integer and floating registers.  I know of no other machine on which
it is as difficult and time-consuming to convert between integer and
floating point.  I was also told 22 years ago by an IBM research physicist
that this problem had been pointed out to the design engineers _before_ the
design was fixed; and if one looks at the design at that time, there were
lots of unassigned instructions in the floating-point part of the code.
-- 
Herman Rubin, Dept. of Statistics, Purdue Univ., West Lafayette IN47907
Phone: (317)494-6054
hrubin@l.cc.purdue.edu or pur-ee!stat-l!cik or hrubin@purccvm.bitnet

daveb@geac.UUCP (Brown) (08/25/87)

In article <418@astroatc.UUCP> johnw@astroatc.UUCP (John F. Wardale) writes:
>The 360 designers saw fit (or did they just get lucky... I don't
>think so!) to design *PIPELINING-CAPABILITY* into the 360... 
[other discussion of VAX vs 360] 
>(And I'm definitely a non-IBM type person!)

One of the other things they (and the rest of the BUNCH) did with
their machines was ensure that there was a definite "radial" appearance
to the components, instead of stringing them out on a bus.  It is
probably unfair to compare a Vax (Virtual Address eXtended '11) with
a mainframe for this reason.
  Now if DEC chose to mount the Vax order code and some fast memory in
a re-tooled '10 cabinet, we might be able to compare these particular
apples and oranges.

 --dave
-- 
 David Collier-Brown.                 {mnetor|yetti|utgpu}!geac!daveb
 Geac Computers International Inc.,   |  Computer Science loses its
 350 Steelcase Road,Markham, Ontario, |  memory (if not its mind)
 CANADA, L3R 1B3 (416) 475-0525 x3279 |  every 6 months.

johnw@astroatc.UUCP (John F. Wardale) (08/25/87)

In article <26444@sun.uucp> guy%gorodish@Sun.COM (Guy Harris) writes:
>> VAXen are falling by the wayside (despite DEC's best efforts)
>
>VAXes falling by the wayside?  I dunno about that, either.  DEC, VAXes, and VMS
>seem to be doing quite well for themselves.
Holding, maybe... many "VAX labs" are buying into non-DEC stuff
(MIPS, Sun, Pyramid, Sequent, et al.)

>> is that 360's *CAN* be pipelined (tho not necessarily real easily) and
>> VAXen can't!  
>
>VAXes can't be pipelined?  Gee, I suspect some of the 8600's designers would be
>surprised to hear that.
The 8600 overlaps operand-decode with operand-fetch, and uses
multiple functional (execution) units, but **UNLIKE** IBM and any
other true pipeline design, can *NOT* have multiple instructions
in the decode phase simultaneously!  
Why, you ask?
Because the VAX (unlike most others) encodes its instructions so
tightly that you can't find the next one until you're almost done
with the current one!!  This is due to the operand-mode encoding,
the 200-some addressing modes, etc., etc.
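
A much-simplified C sketch of why the decode serializes.  The mode
numbers are real VAX; index mode, absolute, and the deferred cases are
omitted, so treat this as an illustration, not a complete decoder:

	/* you can't find the next opcode until every operand
	   specifier of the current instruction has been walked */
	int vax_insn_len(const unsigned char *p, int nops, int opsize)
	{
		int len = 1;			/* opcode byte */
		int i;
		for (i = 0; i < nops; i++) {
			unsigned spec = p[len++];	/* specifier byte */
			unsigned mode = spec >> 4;
			if (mode == 0x8 && (spec & 0xF) == 0xF)
				len += opsize;	/* (PC)+ is an immediate */
			else if (mode == 0xA || mode == 0xB)
				len += 1;	/* byte displacement */
			else if (mode == 0xC || mode == 0xD)
				len += 2;	/* word displacement */
			else if (mode == 0xE || mode == 0xF)
				len += 4;	/* longword displacement */
			/* modes 0-7 (literal, register, ...) add nothing */
		}
		return len;
	}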

>> Top 370 designs top out at 20 MIPS (or is it more now?)
>> Top Vax design is maybe 8 MIPS.
>
>So?  By itself, that may merely indicate that DEC hasn't pushed raw hardware
>technology as hard as IBM has.
While the 370 is not nice, at least with it, the first
instruction byte **ALWAYS** tells you where the next instruction
starts!  The VAX could be *MUCH* faster if it did this too, but
then it would lose its code density!

360:  designed for ease of implementation  (an idea common to RISC)
VAX:  designed for maximum code density    (a poor choice today!)
RISCs: designed for speed, through simplicity

This is why a couple M$$ will buy a 20+ MIPS IBM, or a sub-8-MIPS
VAX, or gobs (20-80) of 10-MIPS RISC workstations.  (Yes, this is
very much like comparing mopeds, pickup trucks, and 18-wheelers as
"cars", but it does show how design decisions affect speed!)

-- 
					John Wardale
... {seismo | harvard | ihnp4} ! {uwvax | cs.wisc.edu} ! astroatc!johnw

To err is human, to really foul up world news requires the net!

darryl@ism780c.UUCP (Darryl Richman) (08/26/87)

In article <570@l.cc.purdue.edu> cik@l.cc.purdue.edu (Herman Rubin) writes:
>A big flaw in the 360-style architecture, which is present in all of its
>successors of which I am aware, is that there is no communication between
>the integer and floating registers.  I know of no other machine on which
>it is as difficult and time-consuming to convert between integer and
>floating point.

Look at the 80*86 machines and their 80*87 coprocessors.  In most Unix
implementations, the coprocessor's rounding mode is set by default to
round-to-nearest.  When you want an integer from a floating-point value,
you must save the current mode, set it to truncate, store the integer
into memory, and reset the rounding mode.  You must then load the CPU
register with the integer.  Branch on fp condition is similarly hampered,
although there is an instruction that will directly load the ax register
from the coprocessor's status word.
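
In C with GNU-style inline assembly, that dance looks roughly like this
(a sketch, not tuned library code; the 0x0C00 mask sets the x87
rounding-control field to "truncate"):

	/* float -> int the hard way: save the control word, force
	   truncation, store the integer, restore the old mode */
	int ftoi_trunc(double x)
	{
		unsigned short cw, cw_trunc;
		int r;
		__asm__ __volatile__ ("fnstcw %0" : "=m"(cw));
		cw_trunc = cw | 0x0C00;		/* RC = 11b: truncate */
		__asm__ __volatile__ ("fldcw %0" : : "m"(cw_trunc));
		__asm__ __volatile__ ("fldl %1\n\tfistpl %0"	/* load, store */
				      : "=m"(r) : "m"(x));
		__asm__ __volatile__ ("fldcw %0" : : "m"(cw));	/* restore */
		return r;
	}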

But this stuff is only a problem if you're doing serious scientific stuff.
I was under the impression that most 370s did mundane payroll/bookkeeping
stuff.  For those tasks, BCD string performance would seem to be paramount.

bcase@apple.UUCP (Brian Case) (08/26/87)

In article <418@astroatc.UUCP> johnw@astroatc.UUCP (John F. Wardale) writes:
>THE *MAJOR*
>reason the ancient 360/370 stuff is still alive, while DEC's VAXen
>are falling by the wayside (despite DEC's best efforts), is that
>360's *CAN* be pipelined (tho not necessarily real easily) and
>VAXen can't!  

I beg your pardon, but your statement is quite a bit stronger than reality
will permit.  I, for one, believe that the high-end VAXs are quite
pipelined.

>The 1st byte of each 370 instruction tells you the length of the
>instruction!

You have pinpointed one of the VAX's problems.  But it does not
absolutely prevent pipelining.

>Top 370 designs top out at 20 MIPS (or is it more now?)
>Top Vax design is maybe 8 MIPS.

At what cost?  Those 20-MIPS machines are CONSIDERABLY more expensive and
use technology to which DEC doesn't have access (I suspect), even though
they might be able to get at it if they really wanted to do so (Fuji
is now selling the RAM chips on the open market, I think).  Compare
a board from an Amdahl machine to a board from an 8700.  The Amdahl uses
Fuji ECL with cooling towers 1-1/2 inches high (am I close on this dimension?).
There are probably other important differences.

>The point is that despite moderate-size mistakes, the 360 is a
>relatively OK design, and was phenomenally excellent for the early
>1960's!!!!!!!!

To me, the 360 (370 or whatever) and the VAX are equally OK for their
respective times, but definitely not OK for these times.

Please, let's not get hysterical, or, worse, mispellical.

guy%gorodish@Sun.COM (Guy Harris) (08/26/87)

> >seem to be doing quite well for themselves.
> Holding, maybe... many "VAX labs" are buying into non-DEC stuff
> (MIPS, Sun, Pyramid, Sequent, et al.)

Yes, but note that a lot of VAX customers are not "VAX labs"; I believe VAXes
are going about 50% to commercial and 50% to technical applications.

The question is "is there any market in which the VAX is still increasing its
market share?"  If the answer is "yes", then in that market they are neither
"losing ground" nor "holding".

I think, obviously, that it would be Truly Wonderful if all those VAXes out
there could be replaced by Suns, but I'm not going to assume that VAXes and VMS
aren't still strong competitors and aren't going to remain so, at least in the
short term.

> The 8600 overlaps operand-decode with operand-fetch, and uses
> multiple functional (execution) units, but **UNLIKE** IBM and any
> other true pipeline design, can *NOT* have multiple instructions
> in the decode phase simultaneously!

I.e., a machine that has, say, a fetch/decode unit and an execute unit, with
the fetch/decode unit being one instruction ahead of the execute unit, is not
pipelined?  In other words, your definition of "pipelined" is "has multiple
instructions in the decode phase simultaneously"?  Is this the only definition
of "pipelined" commonly used?

You may not be able to get *as much* parallelism from a pipelined
implementation of the VAX architecture as you can out of a pipelined
implementation of the 370 architecture, but this is very different from "you
can't pipeline VAXes".

> >So?  By itself, that may merely indicate that DEC hasn't pushed raw hardware
> >technology as hard as IBM has.
> While the 370 is not nice, at least with it, the first
> instruction byte **ALWAYS** tells you where the next instruction
> starts!  The VAX could be *MUCH* faster if it did this too, but
> then it would lose its code density!

OK, so if you implement a VAX using the same technology as a top-of-the-line
IBM mainframe, how fast would it be?  I can well believe it would not be as
fast as an equivalent 370, but would it top out at 8 MIPS or would it be, say,
more like 12 MIPS?  The fact that the top of the line VAX is only 8 MIPS, while
the top of the line 370 is 20 MIPS, does not *in and of itself* indicate that
this is due solely to architectural problems with the VAX.
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com

tim@amdcad.AMD.COM (Tim Olson) (08/26/87)

In article <422@astroatc.UUCP> johnw@astroatc.UUCP (John F. Wardale) writes:
+-----
|360:  designed for ease of implementation  (an idea common to RISC)
|VAX:  designed for maximum code density    (a poor choice today!)
|RISCs: designed for speed, through simplicity
+-----
If the VAX instruction-set was designed for "maximum code density", they
certainly did a poor job.  Many processors (including some "RISCs" --
IBM ROMP and CRISP) can routinely beat it in code density.

	-- Tim Olson
	Advanced Micro Devices
	(tim@amdcad.amd.com)

lamaster@pioneer.arpa (Hugh LaMaster) (08/26/87)

In article <26444@sun.uucp> guy%gorodish@Sun.COM (Guy Harris) writes:

(someone else writes:)

>> The 360 designers saw fit (or did they just get lucky... I don't
>> think so!) to design *PIPELINING-CAPABILITY* into the 360.

>
>I dunno about that; when did the first pipelined machines come out?

Well, the 360/91 was pipelined, and was a very competitive machine to the CDC
6600; the 360/370/195 was pipelined, and was a very competitive machine to the
CDC 7600 (mid 60's to early 70's for these landmarks).  If the trade press is
to be believed, the reason IBM got out of the big machine market during the
early seventies was because of the famous CDC/IBM lawsuit and the Justice
Dept. antitrust suit, and because it was seen as politic to leave some niche
markets untouched, not because IBM couldn't build faster machines.  IBM has
gotten back in in a modest way with the 3090/VF machines.

The 360/370 architecture is amenable to a modest amount of pipelining, and it
is certainly easier to implement pipelined versions than the VAX architecture,
for all the reasons mentioned by previous posters: simple instruction decode,
simple addressing modes (but not a load/store machine, which complicates some
things) which are known at decode time, register usage known upon decode,
etc., etc.   The 360 architecture is as much a RISC machine as several widely
marketed "RISC" machines, though still a CISC machine by comparison with MIPS,
say.  Had it been a load/store machine like CDC, it would have been easier to
pipeline.  By comparison, the VAX is much more difficult.

Face it: DEC tried for a long time before producing the 8600, falling further
and further behind the general marketplace in price/performance and
performance.  A lot of companies like MIPS, Sun, Sequent, etc., might not
exist if DEC hadn't fallen behind.

>
>> reason the ancient 360/370 stuff is still alive, while DEC's VAXen
>> are falling by the wayside (despite DEC's best efforts)
>
>VAXes falling by the wayside?  I dunno about that, either.  DEC, VAXes, and VMS
>seem to be doing quite well for themselves.

I think the person meant in performance, which is certainly true.  Not in
marketing :-)

>
>> is that 360's *CAN* be pipelined (tho not necessarily real easily) and
>> VAXen can't!  
>
>VAXes can't be pipelined?  Gee, I suspect some of the 8600's designers would be
>surprised to hear that.
>

Only very limited pipelining is possible with the VAX architecture.  It is
just about the worst in that respect of any major computer architecture.  In
fact, it seems to have been a reaction to the VAX which brought about the
revival of ultra-pure RISC machines (RISC concepts had been in use all along
on some other machines, e.g. CDC).  As was previously noted, a good compiler
can produce rather dense code for the VAX.  The 64,000 MIPS question is: how
important is that?  Was it ever that important?  If memory utilization was
that important even in 1977, why didn't the paging hardware support direct
LRU?  DEC's success with the VAX was due to the 32-bit virtual memory
environment - a first for minicomputers at the time - not performance or
price/performance.

I would second the statements of many previous posters that the 360
architecture has proven to be very versatile, and has certainly been
implemented over a wider range of hardware complexity and performance than any
other architecture.  Too bad about that MVS stuff...




  Hugh LaMaster, m/s 233-9,  UUCP {seismo,topaz,lll-crg,ucbvax}!
  NASA Ames Research Center                ames!pioneer!lamaster
  Moffett Field, CA 94035    ARPA lamaster@ames-pioneer.arpa
  Phone:  (415)694-6117      ARPA lamaster@pioneer.arc.nasa.gov


                 "IBM will have it soon"


(Disclaimer: "All opinions solely the author's responsibility")

guy%gorodish@Sun.COM (Guy Harris) (08/26/87)

> OK, so if you implement a VAX using the same technology as a top-of-the line
> IBM mainframe, how fast would it be?

For some further amplification: from the February 1987 issue of the Digital
Technical Journal, on the VAX 8800 family, the cycle time of that family is
45ns.  A single-processor 8550 or 8700 is claimed to have a "sustained
applications throughput" of 6.0 times an 11/780.

From Mike Taylor's article on the Amdahl 5890s, the cycle time of that family
is 15ns.  A single-processor 5890-190E is claimed to have a MIPS rate 33 times
that of an 11/780 for a typical UTS workload.

I have no idea how the workloads DEC and Amdahl used compare.  Assuming they
are comparable, and assuming that the performance difference between the two
is due solely to 1) the Amdahl having a clock rate 3x that of the VAX (15ns
vs. 45ns cycle time) and 2) the 370 architecture permitting you to build a
faster box, the performance increase due to the 370 architecture is about
1.8x (33/6.0 = 5.5x overall, divided by the 3x clock ratio).  Not shabby,
but not 20/8 = 2.5x either.
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com

petolino%joe@Sun.COM (Joe Petolino) (08/27/87)

>The 8600 overlaps operand-decode with operand-fetch, and uses
>multiple functional (execution) units, but **UNLIKE** IBM and any
>other true pipeline design, can *NOT* have multiple instructions
>in the decode phase simultaneously!

This is certainly a novel criterion for calling a design 'pipelined'!
All of the CPU designs I know of (this includes machines by IBM, Amdahl,
MIPS, and Sun) have at most one instruction in each pipeline stage at any one
time.  This is almost by definition of the word pipeline - each instruction
flows from one stage to the next so that it can execute in parallel with
the instructions which are in the OTHER stages of the pipe.  Maybe the 
above poster is thinking of an instruction buffer which can hold several
already-fetched instructions waiting to go into the pipeline.  Maybe he's
thinking of some other form of parallelism altogether.

Anyway, so much for quibbling about names.  Here's a few cents worth of
my opinions on the 360 debate.

The 360 was certainly a landmark design for its time.  But times have
changed, and the 360 hasn't changed much (except maybe for the worse).
There's a very good reason for this - a huge amount of non-portable software
which runs only on that architecture.  This is the reason that the 360/370
is still with us: enough captive customers with enough money to make new
implementations profitable.  I don't think it's any inherent superiority of
the architecture that accounts for the high performance of the current
top-of-the-line incarnations - it's just that no one else has enough dollars
worth of customer base to justify the huge design effort that one of these
beasts requires.

I spent seven years designing caches for 370-compatibles, so I can give some
memory-related reasons why this architecture is difficult to implement:

* The architecture does not acknowledge the existence of caches.  There are
  no restrictions on storing into instruction words, no restrictions on
  virtual address mappings, no separation of code and data pages.  All these
  things conspire to make cache consistency a true headache.

* The normal instruction format specifies an operand address as the sum
  of two registers plus an offset.  This requires that three things be
  added together in the critical operand cache addressing path.

* Operand fetches must work on any alignment.  In addition to requiring
  shift networks in the data paths (not a big deal), this requires
  that the hardware be able to concatenate bytes from two different
  cache lines into a single operand (see the sketch after this list).
  Either of these two cache accesses may miss the cache or get an exception.

* There is no concept of an Address Space Identifier.  Instead, most 
  implementations use the address of the root of the translation tables,
  plus some control bits, to identify the Virtual Space that a virtual
  address belongs to.  This makes for some very long Tag words in TLBs
  and/or caches.

* Memory protection is based on 'keys' which are attached to physical, not
  virtual, pages.  Since most cache implementations are virtually addressed,
  finding and updating cached copies of these keys requires some
  sophisticated state machines which search through all entries of all
  caches and/or TLBs in the system.  The architecture requires that this
  be done by hardware.

* Several different translation table formats.  Virtual-to-physical
  translations are done in hardware, and the data paths needed to accommodate
  umpteen different operating systems' table formats are really messy.
  The older of these formats translates a 24-bit VA to a 24-bit PA.
  In a stroke of genius a few years ago, some new formats were introduced
  which expanded this to 31 (not 32) bits.
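
A C sketch of the alignment point above (the line size is an assumption
for illustration; the architecture itself doesn't fix one):

	#define LINE_BYTES 64	/* illustrative cache line size */

	/* does a 4-byte operand at 'addr' straddle two cache lines?
	   if so, the hardware needs two lookups, and either one can
	   miss or take an exception */
	int straddles(unsigned long addr)
	{
		return (addr / LINE_BYTES) != ((addr + 3) / LINE_BYTES);
	}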
  
These are just a few of the things that I remember as being particularly
ill-suited to high-performance implementations.  Many of these are
characteristics of the 370, not the 360.  The last item is just a
special case of my biggest complaint about the 370: it's just too damn
complicated!  What started out as a reasonably clean and coherent 
architecture has been distorted by decades of added 'features' intended to
patch up the mismatch of old concepts to new technologies.

One final word about the ASCII vs EBCDIC debate.  You can enter ANY of the
128 ASCII codes from a standard ASCII keyboard.  I don't know of any
EBCDIC keyboard that can make a similar claim.  Part of the reason might
be that there is no agreed-upon standard for the graphic representation
of each character - seems to be more a matter of what's on the 'print chain'
at the time.  And part of the reason might be "We don't want just ANYONE
to enter THAT code!"

-Joe

corbin@encore.UUCP (08/27/87)

In article <2595@ames.arpa> lamaster@ames.UUCP (Hugh LaMaster) writes:
>      DEC's success with the VAX was due to the 32-bit virtual memory
>environment - a first for minicomputers at the time - not performance or
>price/performance.

I believe that Prime Computer came out with the first minicomputer with
32-bit virtual memory addressing in 1975.  It was called the P400 and
consisted of 2 boards about 16" x 16", much smaller (logic-wise) than the
VAX.
-- 

Stephen Corbin
{ihnp4, allegra, linus} ! encore ! corbin

tim@amdcad.AMD.COM (Tim Olson) (08/27/87)

In article <2596@ames.arpa>, lamaster@pioneer.arpa (Hugh LaMaster) writes:
+-----
| In article <18088@amdcad.AMD.COM> tim@amdcad.UUCP (Tim Olson) writes:
| 
| >If the VAX instruction-set was designed for "maximum code density", they
| >certainly did a poor job.  Many processors (including some "RISCs" --
| >IBM ROMP and CRISP) can routinely beat it in code density.
| 
| Well, it was designed for high code density at the time. Two points:
| 1)  Maybe people have learned something in 10 years, and
| 2)  Be careful to compare oranges and oranges.  I think the code density
| produced by the VMS compilers is quite good, and compares favorably with
| code produced on other 32 bit architectures with good compilers.  It is
| a different test to compare the code produced by pcc, for example.  What
| comparisons were you referring to?
+-----

Whoops -- I may have spoken too strongly here.  I was going on past
experience with our IBM RT-PC (ROMP processor) and our VAX 11/780,
comparing optimized object code.  Since the RT has a much better
optimizer, this would be an "unfair" comparison.  I just ran
(unoptimized) pcc on our internal assembler source, and got the
following text sizes (in bytes) for the object code files:

Module		VAX 11/780	RT-PC	       % change
condasm.o	1060		1384		30
eval.o		3988		4268		7
float.o		1716		1996		16
lex.o		6120		6972		13
macros.o	5036		5480		8
odump.o		2272		2440		7
opcodes.o	208		280		34
pass1.o		604		736		21
pass2.o		4720		7048		49
procstr.o	6132		6028		-1
pseudo.o	6304		6884		9
seg.o		3172		2152		-32
sipasm.o	1528		1320		-13
sym.o		1388		1496		7


So, overall, the VAX was more compact, although the RT was within 10% in
many cases, and was more compact in some.


	-- Tim Olson
	Advanced Micro Devices
	(tim@amdcad.amd.com)

mash@mips.UUCP (08/27/87)

In article <18093@amdcad.AMD.COM> tim@amdcad.AMD.COM (Tim Olson) writes:
>In article <2596@ames.arpa>, lamaster@pioneer.arpa (Hugh LaMaster) writes:
>+-----
>| In article <18088@amdcad.AMD.COM> tim@amdcad.UUCP (Tim Olson) writes:
>| 
>| >If the VAX instruction-set was designed for "maximum code density", they
>| >certainly did a poor job.  Many processors (including some "RISCs" --
>| >IBM ROMP and CRISP) can routinely beat it in code density.
>Whoops -- I may have spoken too strongly, here.

Note also that ROMP is a 16-register, multiple-instruction-size
architecture, whose design goals (low cost) forced them away from
caches, and thus towards a denser coding, and also towards lower
cost at the sacrifice of some speed.  The 32-bit-instruction RISCs
[IBM 801, HP Precision, MIPS, SPARC, etc] are usually less dense
than a VAX.
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{decvax,ucbvax,ihnp4}!decwrl!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

bcase@apple.UUCP (Brian Case) (08/27/87)

In article <18093@amdcad.AMD.COM> tim@amdcad.AMD.COM (Tim Olson) writes:
>In article <2596@ames.arpa>, lamaster@pioneer.arpa (Hugh LaMaster) writes:
>So, overall, the VAX was more compact, although the RT was within 10% in
>many cases, and was more compact in some.
>  [Plus other stuff about code density]

It must also be realized that *optimized* does not necessarily mean "less
code."  At certain points, it is possible to optimize for size at the
expense of speed, and vice-versa.  When choosing instruction encodings, this
is almost always the case (space/time trade-off).  The VAX simply *allows*
high code density; that is not to say that the "optimal" VAX encoding of
a given program is also the densest VAX encoding of the program.  However,
it is my belief that RISCy instruction sets allow the optimizer to operate,
most of the time, under the assumption that "if I can remove this instruction,
I'll be saving both time and space, and I know *exactly* how much space and
probably how much time."  Unfortunately, this assumption doesn't hold very
well when you start talking about scheduling (load/store, delayed branches,
etc.).

    bcase

bcase@apple.UUCP (Brian Case) (08/27/87)

In article <2595@ames.arpa> lamaster@ames.UUCP (Hugh LaMaster) writes:
>Only very limited pipelining is possible with the VAX architecture.  It is
>just about the worst in that respect of any major computer architecture.  In

It is bad, but it is possible to brute-force pattern-match the most common
instructions; this is expensive in hardware, but it can be done; I don't know
if this is being done in the 86/87 series machines.  I suspect Motorola will
have to resort to the same techniques DEC is using for faster 680x0 machines.
Depending upon the how much you are willing to pay, you get an implementation
where some things run at one-per-cycle, while other things take their usual,
sweet time.

>fact, it seems to have been a reaction to the VAX which brought about the
>revival of ultra-pure RISC machines (RISC concepts had been in use all along
>on some other machines, e.g. CDC).  As was previously noted, a good compiler

Ultra-pure RISC machines were originally motivated by a brilliant recognition
of the synergy between compiler considerations and hardware considerations.
At least this is my impression from reading the literature.

>can produce rather dense code for the VAX.  The 64,000 MIPS question is: how
>important is that?  Was it ever that important?  If memory utilization was
>that important even in 1977, why didn't the paging hardware support direct

Well, just read one of Wirth's most recent papers; at least at the time of
that writing, he seemed to think it was the *only* metric!

>LRU?  DEC's success with the VAX was due to the 32-bit virtual memory
>environment - a first for minicomputers at the time - not performance or
>price/performance.
>
>I would second the statements of many previous posters that the 360
>architecture has proven to be very versatile, and has certainly been
>implemented over a wider range of hardware complexity and performance than any
>other architecture.  Too bad about that MVS stuff...

True, true, true, but, at least from a purist point of view, and this is a
news group for computer architecture, not admirable marketing/engineering
trade-offs, it leaves much to be desired.  Throw enough resources at an
architecture (within reason), and you will get a fast implementation.  This
does not mean that the 360/370 is any more versatile than other stuff!  My
comments do not, however, diminish the well-taken point of the earlier-made
statement "May your favorite architecture be as successful in 25 years as
the 370 is now."

>			"IBM will have it soon"

Unless by "have" you mean "own," don't count on it.  Or perhaps, "No, they
already have it, but it'll never see the light of day."

weaver@prls.UUCP (Michael Gordon Weaver) (08/27/87)

In article <2595@ames.arpa> lamaster@ames.UUCP (Hugh LaMaster) writes:
>....  As was previously noted, a good compiler
>can produce rather dense code for the VAX.  The 64,000 MIPS question is: how
>important is that?  Was it ever that important?  If memory utilization was
>that important even in 1977, why didn't the paging hardware support direct
>LRU?  DEC's success with the VAX was due to the 32-bit virtual memory
>environment - a first for minicomputers at the time - not performance or
>price/performance.
>

I believe that the reason that code density was considered so important in
the design of the VAX has to do with instruction speed, not memory 
requirements (disk storage size is a third possible reason). At that time 
(from what I have read) there appeared to be some consensus that the 
bottleneck for (non-floating point) instruction throughput was fetching 
instructions. Today, many people think instruction decoding is a major
bottleneck.

The VAX 11/780 performs instruction fetch and decode in parallel with
arithmetic operations, so in this sense it is similar to RISC-II or
MIPS.  It has a cache which is probably fast enough not to limit
instruction speed, but its miss rate is higher than what we expect today,
due to its size.  The absolute size of code does not vary much (~30%)
from one 32-bit machine to another, but the typical instruction cache
for a given MIPS rate is perhaps eight times as large today as in 1977.
So the frequency of waiting for instructions to be fetched from main
memory should have decreased significantly.  I do not have any numbers
to make this plausible, but my feeling is that this is the best reason
for the emphasis on code density.



-- 
Michael Gordon Weaver                   Usenet: ...pyramid!prls!weaver
Signetics Microprocessor Division
811 East Arques Avenue
Sunnyvale, California USA 94088-3409            Phone: (408) 991-3450

tihor@acf4.UUCP (Stephen Tihor) (08/27/87)

The major problem with EBCDIC is that there is (1) no single definition
of what all the codes are, and (2) as a result, no standardizable mapping
between EBCDIC and 8-bit ASCII.  The other :problems: just reflect punch
cards versus bit flippers.

henry@utzoo.UUCP (Henry Spencer) (08/30/87)

> I was under the impression that most 370s did mundane payroll/bookkeeping
> stuff.  For those tasks, BCD string performance would seem to be paramount.

My recollection is that when you actually measure them, even COBOL programs
spend most of their time doing ordinary (non-BCD) instructions for overhead,
i/o control, addressing, etc.
-- 
"There's a lot more to do in space   |  Henry Spencer @ U of Toronto Zoology
than sending people to Mars." --Bova | {allegra,ihnp4,decvax,utai}!utzoo!henry

mash@mips.UUCP (08/30/87)

In article <8519@utzoo.UUCP> henry@utzoo.UUCP (Henry Spencer) writes:
>> I was under the impression that most 370s did mundane payroll/bookkeeping
>> stuff.  For those tasks, BCD string performance would seem to be paramount.
>
>My recollection is that when you actually measure them, even COBOL programs
>spend most of their time doing ordinary (non-BCD) instructions for overhead,
>i/o control, addressing, etc.

More specifically:
a) In some of the work that led to the 801, IBM found that the decimal
operations were not really used very much over large mixes.
b) HP found the same thing: they did include a few simple operations to help
decimal arithmetic, but that's it.  In particular, they did
extensive studies and found that their COBOL programs (very relevant for
some of their markets) spent a lot of time in the OS, record managers, etc.
c) On the other hand, DEC (Clark & Levy, "Measurement and Analysis of
Instruction Use in the VAX-11/780", ACM SIGARCH, April 1982) finds that
specific COBOL programs use the decimal operations heavily.  This may or
may not contradict what HP found, in that a given COBOL program may well
use the operations, but the overall use of a running system may not do
so as heavily.
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{decvax,ucbvax,ihnp4}!decwrl!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

ken@argus.UUCP (Kenneth Ng) (08/31/87)

In article <26623@sun.uucp>, petolino%joe@Sun.COM (Joe Petolino) writes:
> * The architecture does not acknowledge the existence of caches.  There are
>   no restrictions on storing into instruction words, no restrictions on
>   virtual address mappings, no separation of code and data pages.  All these
>   things conspire to make cache consistency a true headache.

Look up Discontiguous Shared Segments under VM.  This forces program
code into read-only status.  It also permits several machines to share
the same copy of programs, providing a major boost in performance.

> One final word about the ASCII vs EBCDIC debate.  You can enter ANY of the
> 128 ASCII codes from a standard ASCII keyboard. 

Provided you don't have a terminal server or something doing funny things
to certain characters like control-q, control-s, delete, control-c,
control-d, control-x, control-p, and null.


Kenneth Ng: Post office: NJIT - CCCC, Newark New Jersey  07102
uucp !ihnp4!allegra!bellcore!argus!ken *** NOT ken@bellcore.uucp ***
bitnet (preferred) ken@orion.bitnet