[comp.arch] Can old architectures run fast?

johnl@iecc.cambridge.ma.us (John R. Levine) (05/06/91)

In article <8324@uceng.UC.EDU> dmocsny@minerva.che.uc.edu (Daniel Mocsny) writes:
>Are we likely to see the fastest CPU in year X being able to run,
>without change, a binary program more than 5 years old? ...

Well, there's always the IBM 360.  You can still run 1965 vintage 360
binaries on IBM's latest 3090 mainframe.  The 360 architecture has stood the
test of time surprisingly well, better, I think, than the 360 extensions.  The
360 had simple instruction decoding, strict data alignment rules, and a
large and uniform register set which made it relatively easy to speed up.
(Yes, it also had things like edit-and-mark, which is a disaster in a paging
system; they weren't totally prescient.)

-- 
John R. Levine, IECC, POB 349, Cambridge MA 02238, +1 617 492 3869
johnl@iecc.cambridge.ma.us, {ima|spdcc|world}!iecc!johnl
Cheap oil is an oxymoron.

davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (05/06/91)

In article <1991May05.174756.9026@iecc.cambridge.ma.us> johnl@iecc.cambridge.ma.us (John R. Levine) writes:

| Well, there's always the IBM 360.  You can still run 1965 vintage 360
| binaries on IBM's latest 3090 mainframe.  

  And Honeywell. Things compiled under GECOS-II in 1963 or so seem to run
on the latest Honeywell DPS systems as well.
-- 
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
        "Most of the VAX instructions are in microcode,
         but halt and no-op are in hardware for efficiency"

haynes@felix.ucsc.edu (99700000) (05/07/91)

In article <1991May05.174756.9026@iecc.cambridge.ma.us> johnl@iecc.cambridge.ma.us (John R. Levine) writes:

| Well, there's always the IBM 360.  You can still run 1965 vintage 360
| binaries on IBM's latest 3090 mainframe.  

Hmmm. Wonder if you can still emulate 1401 machine code on a 3090?

dmocsny@minerva.che.uc.edu (Daniel Mocsny) (05/07/91)

In article <1991May05.174756.9026@iecc.cambridge.ma.us> johnl@iecc.cambridge.ma.us (John R. Levine) writes:
>In article <8324@uceng.UC.EDU> dmocsny@minerva.che.uc.edu (Daniel Mocsny) writes:
>>Are we likely to see the fastest CPU in year X being able to run,
>>without change, a binary program more than 5 years old? ...
>Well, there's always the IBM 360.  You can still run 1965 vintage 360
>binaries on IBM's latest 3090 mainframe.

That is truly impressive; in fact, it's rather astounding. But I see
I left cost out of my question. So let me try another wrinkle:

How does a 3090 stack up against modern workstations on the usual
measures of performance/price, such as SPECmarks/$? My guess would
be that the large backwards compatibility comes at a price.

Also, how much slower and/or more expensive is the 3090 as a result
of maintaining such backwards compatibility? (I realize that might be
hard to get a handle on.)



--
Dan Mocsny				
Internet: dmocsny@minerva.che.uc.edu

johnl@iecc.cambridge.ma.us (John R. Levine) (05/07/91)

In article <8346@uceng.UC.EDU> you write:
>How does a 3090 stack up against modern workstations on the usual
>measures of performance/price, such as SPECmarks/$? My guess would
>be that the large backwards compatibility comes at a price.

It's hard to compare, since the 3090 is a mainframe, not a workstation,
which means that it has I/O bandwidth orders of magnitude better than
anything you'd see on or next to a desk.  High end 3090 installations run
on-line systems which handle 1000 transactions/second (that's per second,
not per hour) and there's nothing anywhere near comparable in
workstation-land.  A bunch of micros sharing data over a network turns out
not to do the trick, because you end up with intolerable hot spots in the
data base.

>Also, how much slower and/or more expensive is the 3090 as a result
>of maintaining such backwards compatibility? (I realize that might be
>hard to get a handle on.)

No question, it's not cheap.  Some of the stuff they have to do is extremely
gross.  The worst example is an execute instruction which points to a
translate-and-test (TRT) instruction.  The TRT has two memory operands and
looks up each byte of the first using the second as the lookup table until
it finds a table entry that's non-zero.  This means that the length of the
first operand depends on its contents.  360 instructions are not
continuable, and since the execute, the TRT, and both operands can each
potentially span a page boundary, the CPU may need to touch as many as 8
pages.  To tell whether it needs the 8th page, it does a "trial execution"
of the instruction (one that stores no results) before actually doing the
instruction.  There's even more internal hair than that, since the 3090 has
lots of fault-tolerance hardware and takes microcode checkpoints several
places in a complex instruction.
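
For the curious, here is a rough C model of what TRT does.  This is a
sketch of the architected semantics, not IBM's microcode; note that how
far the scan gets, and hence which pages get touched, depends on the
data itself:

#include <stddef.h>

/* Scan up to len bytes of the first operand, using the second operand
 * as a 256-byte table of "function bytes", stopping at the first
 * argument byte whose function byte is nonzero. */
size_t trt_scan(const unsigned char *arg, size_t len,
                const unsigned char table[256])
{
    size_t i;
    for (i = 0; i < len; i++) {
        if (table[arg[i]] != 0)
            return i;   /* the real TRT also sets R1, R2, and the CC */
    }
    return len;         /* scanned everything, no nonzero entry (CC 0) */
}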

That's the worst; a more typical instruction is "add", which computes an
address by adding together one or two registers and a 12-bit offset in the
instruction, picks up the word at that address, and adds it to a target
register.  Other than the three-input adder for address generation, that's
pretty straightforward.
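
In C terms the address generation is roughly this (a sketch of the
architected rule, not of the hardware):

#include <stdint.h>

/* RX-format effective address: base register plus optional index
 * register plus a 12-bit displacement.  Specifying register 0 as base
 * or index means "none", not the contents of R0. */
uint32_t rx_address(const uint32_t gpr[16], int base, int index,
                    uint32_t disp12)
{
    uint32_t addr = disp12 & 0xFFF;      /* 12-bit displacement      */
    if (base  != 0) addr += gpr[base];   /* the hardware does this   */
    if (index != 0) addr += gpr[index];  /* with a three-input adder */
    return addr & 0x00FFFFFF;            /* 24-bit addressing on 360 */
}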

greg@organia.sce.carleton.ca (Greg Franks) (05/07/91)

In article <8346@uceng.UC.EDU> dmocsny@minerva.che.uc.edu (Daniel Mocsny) writes:

   In article <1991May05.174756.9026@iecc.cambridge.ma.us> johnl@iecc.cambridge.ma.us (John R. Levine) writes:
   >In article <8324@uceng.UC.EDU> dmocsny@minerva.che.uc.edu (Daniel Mocsny) writes:
   >>Are we likely to see the fastest CPU in year X being able to run,
   >>without change, a binary program more than 5 years old? ...
   >Well, there's always the IBM 360.  You can still run 1965 vintage 360
   >binaries on IBM's latest 3090 mainframe.

   That is truly impressive; in fact, it's rather astounding. But I see
   I left cost out of my question. So let me try another wrinkle:

   How does a 3090 stack up against modern workstations on the usual
   measures of performance/price, such as SPECmarks/$? My guess would
   be that the large backwards compatibility comes at a price.

   Also, how much slower and/or more expensive is the 3090 as a result
   of maintaining such backwards compatibility? (I realize that might be
   hard to get a handle on.)

However, people who purchase IBM 3090's are not interested in
SPECmarks/$$; they are more interested in MBytes/second of
transfer from the disk farm to the CPU and back.  Most workstations that
shine on SPECmarks/$$ fall flat on their faces when it comes to I/O systems.

(They also want to run five levels of emulation so that their ancient
accounting program doesn't have to be changed :-)
--
Greg Franks, (613) 788-5726               | "The reason that God was able to    
Systems Engineering, Carleton University, | create the world in seven days is   
Ottawa, Ontario, Canada  K1S 5B6.         | that he didn't have to worry about 
greg@sce.carleton.ca  ...!cunews!sce!greg | the installed base" -- Enzo Torresi  

ward@vlsi.waterloo.edu (Paul Ward) (05/07/91)

In article <8346@uceng.UC.EDU> dmocsny@minerva.che.uc.edu 
           (Daniel Mocsny) writes:

>That is truly impressive; in fact, it's rather astounding. But I see
>I left cost out of my question. So let me try another wrinkle:
>
>How does a 3090 stack up against modern workstations on the usual
>measures of performance/price, such as SPECmarks/$? My guess would
>be that the large backwards compatibility comes at a price.

A meaningless question - how can you possibly compare the price/performance of
a workstation (typically a single- or few-user machine) with an IBM mainframe
which can support 400+ users concurrently?  If anything, the major difference
between PCs, workstations, minis and mainframes is the I/O bandwidth, not
the processor performance.  What good is 500 MIPS and 50 MFLOPS if you are
waiting so long for an I/O operation to complete that the real performance is
~5 MIPS and 0.5 MFLOPS?  (The same applies to the memory subsystem - you have
to keep the processor fed, or it will stall.)
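
To make the arithmetic explicit, here is a trivial back-of-envelope in
C; the 99% stall figure is simply the one implied by the numbers above:

#include <stdio.h>

/* Effective throughput of a CPU stalled on I/O for some fraction of
 * wall-clock time. */
int main(void)
{
    double peak_mips = 500.0;
    double io_stall_fraction = 0.99;
    double effective = peak_mips * (1.0 - io_stall_fraction);
    printf("effective throughput: %.1f MIPS\n", effective);  /* 5.0 */
    return 0;
}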

>Also, how much slower and/or more expensive is the 3090 as a result
>of maintaining such backwards compatibility? (I realize that might be
>hard to get a handle on.)
>
>
>
>--
>Dan Mocsny				
>Internet: dmocsny@minerva.che.uc.edu

Paul Ward.
-- 
"One can certainly imagine the myriad of uses for a hand-held iguana maker."
								-  Hobbes.

sysmgr@KING.ENG.UMD.EDU (Doug Mohney) (05/07/91)

In article <1991May7.130302.22332@vlsi.waterloo.edu>, ward@vlsi.waterloo.edu (Paul Ward) writes:
>
>A meaningless question - how can you possibly compare price performance of
>a workstation (typically a single or a few user machine) with an IBM mainframe
>which can support 400+ users concurrently?  If anything, the major difference
>between PCs, workstations, minis and mainframes is the IO bandwidth, not
>the processor performance. 

And the service contracts :-)


     Signature envy: quality of some people to put 24+ lines in their .sigs
  -- >                  SYSMGR@CADLAB.ENG.UMD.EDU                        < --

c506634@umcvmb.missouri.edu (Eric Edwards) (05/09/91)

In article <1991May7.130302.22332@vlsi.waterloo.edu> ward@vlsi.waterloo.edu (Paul Ward) writes:
>  
> A meaningless question - how can you possibly compare the price/performance of
> a workstation (typically a single- or few-user machine) with an IBM mainframe
> which can support 400+ users concurrently?  If anything, the major difference
> between PCs, workstations, minis and mainframes is the I/O bandwidth, not
> the processor performance.  What good is 500 MIPS and 50 MFLOPS if you are
> waiting so long for an I/O operation to complete that the real performance is
> ~5 MIPS and 0.5 MFLOPS?  (The same applies to the memory subsystem - you have
 
It's not entirely meaningless.  Many jobs are CPU-bound, not I/O-bound.
Compatibility aside, you would have to be a fool to use an IBM mainframe
for that.  They don't even come close to being competitive with workstations on
CPU performance.  Can *all* the price/performance difference be attributed
to the presence or absence of a high-speed I/O system?
 
Also, is there anything to prohibit a RISC-based machine from having a
high-speed I/O subsystem?  Would adding this make the machine cost as much
as a 3090?
  
Eric Edwards:  c506634 @  "I say we take off and nuke the entire site
Inet: umcvmb.missouri.edu  from orbit.  It's the only way to be sure."
Bitnet: umcvmb.bitnet      -- Sigourney Weaver, _Aliens_

ward@vlsi.waterloo.edu (Paul Ward) (05/09/91)

In article <c506634.3284@umcvmb.missouri.edu> 
            c506634@umcvmb.missouri.edu (Eric Edwards) writes:
>In article <1991May7.130302.22332@vlsi.waterloo.edu> 
>            ward@vlsi.waterloo.edu (Paul Ward) writes:
>>  
>> A meaningless question - how can you possibly compare the price/performance
>> of a workstation (typically a single- or few-user machine) with an IBM
>> mainframe which can support 400+ users concurrently?  If anything, the
>> major difference between PCs, workstations, minis and mainframes is the
>> I/O bandwidth, not the processor performance.  What good is 500 MIPS
>> and 50 MFLOPS if you are waiting so long for an I/O operation to complete
>> that the real performance is ~5 MIPS and 0.5 MFLOPS?  (The same applies
>> to the memory subsystem - you have
> 
>It's not entirely meaningless.  Many jobs are CPU-bound, not I/O-bound.
>Compatibility aside, you would have to be a fool to use an IBM mainframe
>for that.  They don't even come close to being competitive with workstations on
>CPU performance.  Can *all* the price/performance difference be attributed
>to the presence or absence of a high-speed I/O system?

I agree that many jobs are CPU-bound - but take a closer look at them.  Take
simulation as an example - suppose you want to simulate 10,000,000 logic
gates in some design.  (BTW there is nothing on the market that can do this
at the moment.)  It looks like a classic CPU-bound problem.  However, it is
so large that no workstation memory system can handle it.  You need virtual
memory.  But again, it is so large that you will spend forever just swapping
pages between disk and memory.  What is required is a very large memory
(~100s of MB) and a very fast disk I/O subsystem.
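
As a back-of-envelope check in C (the 32 bytes of state per gate,
covering value, fanout list, and event-queue links, is purely an
assumed figure for illustration):

#include <stdio.h>

int main(void)
{
    double gates = 10.0e6;
    double bytes_per_gate = 32.0;                     /* assumption */
    double mbytes = gates * bytes_per_gate / (1024.0 * 1024.0);
    printf("~%.0f MB of simulator state\n", mbytes);  /* about 305 MB */
    return 0;
}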

>Also, is there anything to prohibit a RISC-based machine from having a
>high-speed I/O subsystem?  Would adding this make the machine cost as much
>as a 3090?

I don't know, but it is an interesting question.  Do you have $20,000,000 ?
We can try a little experiment.  :-)

Paul Ward
University of Waterloo


-- 
"One can certainly imagine the myriad of uses for a hand-held iguana maker."
								-  Hobbes.

johnl@iecc.cambridge.ma.us (John R. Levine) (05/09/91)

In article <c506634.3284@umcvmb.missouri.edu> you write:
>Can *all* the price/performance difference be attributed
>to the presence or absence of a high-speed I/O system?

No, a lot of it is due to high-performance shared memory and a lot of
reliability and serviceability hardware and microcode.  The 370/ESA I/O
system does some impressive stuff in the interest of performance.  For
example, each disk drive is typically attached to several controllers, each
of which is attached to several channels.  This means that there are
typically four or more different physical device addresses for each disk.
One of the improvements in the new I/O system is that the CPU just issues an
operation for the logical disk, and the channels find a path that isn't
already in use.  There is also a lot of buffering in disk controllers, as
much as 128MB (that's MB, not KB).
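
The idea, in rough C (an illustration of path selection, not IBM's
actual channel-subsystem algorithm):

/* A logical device is reachable over several channel/controller
 * paths; an I/O request starts on whichever path is currently free. */
#define PATHS_PER_DEVICE 4

struct device {
    int path_busy[PATHS_PER_DEVICE];   /* nonzero while path in use */
};

/* Returns the path chosen, or -1 if all are busy and the request
 * must queue. */
int start_io(struct device *dev)
{
    int p;
    for (p = 0; p < PATHS_PER_DEVICE; p++) {
        if (!dev->path_busy[p]) {
            dev->path_busy[p] = 1;     /* claim it, start the channel */
            return p;
        }
    }
    return -1;
}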

In fairness, there is also a lot of glop, particularly in the disks, to
support designs that made a lot more sense in 1964 than they do now.
Traditional IBM disks allow variable length hardware disk blocks, and each
block can have a key of up to 256 bytes.  You can have the disk controller
search down a track or cylinder looking for a particular key.  This made
perfect sense for ISAM on a 360/30, when the CPU stopped during disk I/O
anyway, but it's pretty awful now.  IBM has for 20 years had more reasonable
index schemes based on B-trees, and disks with fixed-size blocks addressed
by block number rather than cylinder, track, and record, but there is still
support for the old stuff.  One might reasonably expect a new design not to
have hardware keys on the disk.
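
For concreteness, the kind of key search the controller performs on a
track looks roughly like this in C (illustrative only, not controller
firmware; count and data areas are omitted):

#include <string.h>

struct ckd_record {
    unsigned char key_len;             /* 0 means the record has no key */
    unsigned char key[256];
};

/* The controller compares keys as the records rotate past the head;
 * no CPU cycles are involved. */
int search_key_equal(const struct ckd_record *track, int nrec,
                     const unsigned char *key, unsigned char key_len)
{
    int r;
    for (r = 0; r < nrec; r++) {
        if (track[r].key_len == key_len &&
            memcmp(track[r].key, key, key_len) == 0)
            return r;                  /* record number of the match */
    }
    return -1;                         /* not found on this track */
}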

>Also, is there anything to prohibit a RISC-based machine from having a
>high-speed I/O subsystem?

In most cases, no.  In some cases there might be problems with bus
contention, cache collisions, etc.  Almost all 3090 systems have more than
one CPU, and they all have many channels, which affects the design quite a
lot.  By the way, the 3090 channels each contain an 801 RISC micro to
control the I/O, so in that sense there is already a RISC with a fast I/O
system.

>  Would adding this make the machine cost as much as a 3090?

I expect it'd be close enough that the price difference wouldn't be
compelling.

-- 
Regards,
John Levine, johnl@iecc.cambridge.ma.us, {spdcc|ima|world}!iecc!johnl

cet1@cl.cam.ac.uk (C.E. Thompson) (05/10/91)

In article <1991May05.174756.9026@iecc.cambridge.ma.us> johnl@iecc.cambridge.ma.us (John R. Levine) writes:
> ... w.r.t. the IBM 360 architecture ...
>(Yes, it also had things like edit-and-mark which is a disaster in a paging
>system, they weren't totally prescient.)
>
EDMK was an implementation problem on early IBM 370s with virtual memory,
because the length of the area to be modified could only be determined by
trial execution, while address translation exceptions were required to nullify
all side-effects of the instruction. But this isn't a problem in modern
implementations (e.g. IBM 308x and 3090) which have general mechanisms for
rolling back the logical state of a CPU, including recent storage
modifications.
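
One generic way to picture such a mechanism (my sketch, not IBM's
design) is a store buffer that reaches real storage only when the
instruction completes:

#define MAX_STORES 64

struct pending_store {
    unsigned addr;
    unsigned char value;
};

struct checkpoint {
    struct pending_store buf[MAX_STORES];
    int n;
};

void store_byte(struct checkpoint *cp, unsigned addr, unsigned char v)
{
    cp->buf[cp->n].addr  = addr;   /* deferred: not yet visible */
    cp->buf[cp->n].value = v;
    cp->n++;
}

void commit(struct checkpoint *cp, unsigned char *memory)
{
    int i;
    for (i = 0; i < cp->n; i++)    /* instruction completed normally */
        memory[cp->buf[i].addr] = cp->buf[i].value;
    cp->n = 0;
}

void rollback(struct checkpoint *cp)
{
    cp->n = 0;    /* translation exception: nothing was ever stored */
}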

Chris Thompson
JANET:    cet1@uk.ac.cam.phx
Internet: cet1%phx.cam.ac.uk@nsfnet-relay.ac.uk

cet1@cl.cam.ac.uk (C.E. Thompson) (05/10/91)

In article <9105070005.AA24446@iecc.cambridge.ma.us> johnl@iecc.cambridge.ma.us (John R. Levine) writes:
> {in re IBM 360 architecture}
>
>No question, it's not cheap.  Some of the stuff they have to do is extremely
>gross.  The worst example is an execute instruction which points to a
>translate-and-test (TRT) instruction.  The TRT has two memory operands and
>looks up each byte of the first using the second as the lookup table until
>it finds a table entry that's non-zero.  This means that the length of the
>first operand depends on its contents.  360 instructions are not
>continuable, and since the execute, the TRT, and both operands can each
>potentially span a page boundary, the CPU may need to touch as many as 8
>pages.  To tell whether it needs the 8th page, it does a "trial execution"
>of the instruction (one that stores no results) before actually doing the
>instruction.

There is something seriously wrong with this example. TRT doesn't *modify*
storage, so rolling back the state of the CPU on a paging exception is 
almost trivial. You can't predict how early the TRT will stop, and so which
pages will be touched, but you don't *need* to. It is no worse in this   
respect than a CLC instruction.

A straight TR instruction is actually somewhat worse, because it does modify
storage, and one can't tell without trial execution whether the whole of 
the 256-byte translation table needs to be translatable. The notoriously
awful instruction is EDMK, as pointed out in another posting. Anyway, all
these problems are finessed by the general rollback mechanisms of IBM 308x  
and 3090 series machines.
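
For comparison with the earlier TRT sketch, here is TR in the same
rough C terms (illustrative only):

/* Unlike TRT, TR modifies the first operand in place, and which bytes
 * of the 256-byte table it references depends on the argument data,
 * so without trial execution one cannot know in advance whether every
 * needed table byte is addressable. */
void tr_translate(unsigned char *arg, unsigned len,
                  const unsigned char table[256])
{
    unsigned i;
    for (i = 0; i < len; i++)
        arg[i] = table[arg[i]];    /* in-place store into operand 1 */
}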

>              There's even more internal hair than that, since the 3090 has
>lots of fault-tolerance hardware and takes microcode checkpoints several
>places in a complex instruction.

Even with a non-interruptible instruction? (Obviously this happens for
interruptible instructions like MVCL and CLCL.) Do you know this for a fact
(about 3090s, specifically)? It rather surprises me.

Chris Thompson
JANET:    cet1@uk.ac.cam.phx
Internet: cet1%phx.cam.ac.uk@nsfnet-relay.ac.uk

herrickd@iccgcc.decnet.ab.com (05/14/91)

In article <9105091145.AA04421@iecc.cambridge.ma.us>, johnl@iecc.cambridge.ma.us (John R. Levine) writes:
> Traditional IBM disks allow variable length hardware disk blocks, and each
> block can have a key of up to 256 bytes.  You can have the disk controller
> search down a track or cylinder looking for a particular key.  This made
> perfect sense for ISAM on a 360/30, when the CPU stopped during disk I/O
> anyway, but it's pretty awful now.  IBM has for 20 years had more reasonable

A minor nit:  My 30 had a DASD controller that took channel programs
from the memory of the 30 and went off and did them.  If the 30 wanted
to wait, it could, but nothing compelled it to.

Having the DASD controller do the key lookup while the general-purpose
computer goes on about its business makes good sense.  Finding the
correct record on a track is a mechanical (as opposed to electronic)
process.  The 30 could continue doing things at electronic speeds
while the DASD controller watched the disk rotate.
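
For those who never met one, a channel program is just a chain of CCWs
that the controller interprets on its own.  The classic search loop
looks roughly like this; the command codes are quoted from memory and
the CCW layout is simplified, so treat the details as illustrative:

#include <stdint.h>

/* SEARCH KEY EQUAL compares the key of the record under the head; if
 * it fails, command chaining falls through to the TIC, which loops
 * back to the search; when it succeeds, the channel skips the next
 * CCW (the TIC) and READ DATA transfers the record.  A real S/360 CCW
 * packs the command and a 24-bit data address into one doubleword. */
struct ccw {
    uint8_t  cmd;      /* channel command code  */
    uint32_t addr;     /* data address (24-bit) */
    uint8_t  flags;    /* chaining flags        */
    uint16_t count;    /* byte count            */
};

#define CC 0x40        /* command-chaining flag */

struct ccw channel_program[] = {
    { 0x29, 0x1000, CC, 8  },  /* SEARCH KEY EQUAL: key at 0x1000   */
    { 0x08, 0x0000, 0,  0  },  /* TIC: address would point back at
                                  the SEARCH CCW (placeholder here) */
    { 0x06, 0x2000, 0,  80 },  /* READ DATA into buffer at 0x2000   */
};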

DASD - Direct Access Storage Device.  We had three 2311 drives on that
system.  They looked like small washing machines with removable disk
packs under transparent covers.  The packs had ten recording surfaces
on six discs.  I believe the capacity was about two and a half million
bytes per pack.

dan herrick
herrickd@iccgcc.decnet.ab.com

haw30@duts.ccc.amdahl.com (Henry A Worth) (05/15/91)

In article <1991May9.144406.20558@vlsi.waterloo.edu> ward@vlsi.waterloo.edu (Paul Ward) writes:
>In article <c506634.3284@umcvmb.missouri.edu> 
>            c506634@umcvmb.missouri.edu (Eric Edwards) writes:
>
>>Also, is there anything to prohibit a RISC-based machine from having a
>>high-speed I/O subsystem?  Would adding this make the machine cost as much
>>as a 3090?

    The HIPPI interface with RAID (Redundant Arrays of Inexpensive Disks)
storage is one possible solution.

>
>I don't know, but it is an interesting question.  Do you have $20,000,000 ?
>We can try a little experiment.  :-)
>

    So, you want to build a mainframe-class RISC system. Well, to compete
with recently announced high-end CISC products from IBM, Amdahl, et al.,
you're going to need at least:

    4-8 CPUs with 50+ honest MIPS - none of those inflated marketing
    RISC MIPS.

    ~1GB of fast static RAM - can't let those CPUs wait on paging,
    or have to twiddle their thumbs for too long after cache misses.

    A couple of hundred SCSI controllers - for that 10 million dollar 
    disk farm in your garage.

    The capability for a like number of FDDI controllers - to keep up
    with the latest net-news.

    Several additional processors to help manage the I/O, encryption/
    decryption, ...

    Perhaps a few GB of dynamic RAM for disk caching, SSD and such.

    The busses and logic to make all this work...

    Features to ensure high availability...

    Oh, and don't forget a 100kW power source and cooling system.
         

   Now, if you're a bit shy of the hundreds of millions of dollars it would
take to develop such a system -- several vendors (including Amdahl)
have announced that they are working on "RISC-based" mainframes (I believe
a few low-end systems have already been announced or are even available from a
couple of the Japanese mainframe producers) -- just wait until they become
commodity items, and buy off the shelf. :-)


--
Henry Worth  --  haw30@duts.ccc.amdahl.com
No, I don't speak for Amdahl -- I'm not even sure I speak for myself.