johnw@astroatc.UUCP (John F. Wardale) (09/03/87)
A while ago I made the following claims about pipelining in regard to 360s and VAXen, which generated some warm replies. I should have answered them sooner, but the best laid plans of mice and men....

>From: petolino%joe@Sun.COM (Joe Petolino)
me> The 8600 overlaps operand-decode with operand-fetch, and uses
me> multiple functional (execution) units, but **UNLIKE** IBM and any
me> other true pipe-line design, can *NOT* have multiple instructions
me> in the decode phase simultaneously!
>
> This is certainly a novel criterion for calling a design 'pipelined'!
> All of the CPU designs I know of (this includes machines by IBM, Amdahl,
> MIPS, and Sun) have at most one instruction in each pipeline stage at any one

Very true, but a phase is not a stage [see below]

> second-guessing deleted...also see below

>From: bcase@apple.UUCP (Brian Case)
> In article .... [I wrote]
me> THE *MAJOR*
me> reason the ancient 360/370 stuff is still alive, while DEC's vaxen
me> are falling by the wayside (despite DEC's best efforts) is that
me> 360's *CAN* be pipelined (tho not necessarily real easily) and
me> VAXen can't!
>
> I beg your pardon, but your statement is quite a bit stronger than reality
> will permit. I, for one, believe that the high-end VAXs are quite pipelined.

Mmmm...Not really.

me> The 1st byte of each 370 instruction tells the length of the instruction!
>
> You have pin-pointed one of the VAX's problems. This does not prevent,
> absolutely, pipelining.

Ok, but it ties the designer's hands and one foot behind his back!!!

----------------------------
First I will admit that my phrasing WAS not great... (That's why I'm trying to clarify.)

By "decode phase" I'm referring to anything *BEFORE* the instruction is issued. By *MY* definition (which could be totally off the wall) a machine that can only work on cracking [one instruction at a time] (opcode-decode, operand-decode (VM?), operand-fetch, hazard-checking, etc.) is NOT pipelined.
[If you declare one "cracking" and two being "executed" as "pipelined" that's OK, but I'm talking about *REAL* assembly-line pipelining.]
----------------------------

>From: guy%gorodish@Sun.COM (Guy Harris) asks:
> OK, so if you implement a VAX using the same technology as a top-of-the line
> IBM mainframe, how fast would it be?
> [ top of the line speeds ] do not *in and of itself* indicate that
> this is due solely to architectural problems with the VAX.

Theoretically true, but in this case, I think it is. I believe the encoding of VAX instructions prevents one from making it go fast, while still being affordable. (It's a point-of-diminishing-returns question.) Comments... Anyone think it'd be (economically) worth building a VAX 3X or 10X the current top-vax? Or has it [as I feel it has] reached the limit for current technology?

Have 360-type machine speeds been improving with, or faster than, technology speeds? (Sorry if that's too vague.) What is the rate (or rates) of top-360 speed increase, and how does this track technology speeds??

[ASCII plot: vertical axis "seed", horizontal axis "time". The upper line is the technology limit, the lower line is the 360 limit; each line rises through "/", "=/" and "==/" segments.]

Question: is the 360 in the /, the =/ or the ==/ part of its line? (I.e., I don't know where dates are on the time line.)

CRT grafix suck.... let's all buy Mac-IIs (semi :-)

--
John Wardale
... {seismo | harvard | ihnp4} ! {uwvax | cs.wisc.edu} ! astroatc!johnw

To err is human, to really foul up world news requires the net!
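The point about the first byte of a 370 instruction can be illustrated with a short sketch. In the System/370 encoding, the two high-order bits of the opcode byte select a length of 2, 4, or 6 bytes, so instruction boundaries can be found without looking at any operand. The function names below are illustrative only and do not come from any post in this thread; the code models the length rule, not any particular hardware.

```python
def s370_instruction_length(first_byte: int) -> int:
    """Length in bytes of a System/370 instruction, from its opcode byte."""
    top_two_bits = (first_byte >> 6) & 0b11
    # 00 -> 2 bytes, 01 and 10 -> 4 bytes, 11 -> 6 bytes
    return {0b00: 2, 0b01: 4, 0b10: 4, 0b11: 6}[top_two_bits]


def s370_boundaries(code: bytes) -> list:
    """Offsets at which each instruction in `code` starts."""
    offsets = []
    pc = 0
    while pc < len(code):
        offsets.append(pc)
        # Only the first byte is inspected; operands are never parsed.
        pc += s370_instruction_length(code[pc])
    return offsets


# LR (0x18, 2 bytes), L (0x58, 4 bytes), MVC (0xD2, 6 bytes)
program = bytes.fromhex("1812" "5830C010" "D207A000B000")
print(s370_boundaries(program))  # [0, 2, 6]
```

On a VAX, by contrast, the length of an instruction is known only after each operand specifier has been examined in turn, which is the difficulty the quoted exchange is about.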
johnw@astroatc.UUCP (John F. Wardale) (09/03/87)
Correction... The vertical axis is "speed", not "seed"... Sorry

--
John Wardale
... {seismo | harvard | ihnp4} ! {uwvax | cs.wisc.edu} ! astroatc!johnw

To err is human, to really foul up world news requires the net!
rw@beatnix.UUCP (Russell Williams) (09/03/87)
In article <430@astroatc.UUCP> johnw@astroatc.UUCP (John F. Wardale) writes:
>I believe the encoding of VAX instructions prevents one from
>making it go fast, while still being affordable. (It's a
>point-of-diminishing returns question.) Comments...Anyone think it'd be
>(economically) worth building a VAX 3X or 10X the current top-vax?
>Or has it [as I feel it has] reached the limit for current technology.

DEC doesn't think so -- they're building one, or so I hear. What the cost will be is open to question, but I understand their margins on existing machines are quite high and going up all the time. Maybe it won't be so high on the next one, but when you sell that many it doesn't matter as much.

People in DEC's or IBM's position don't have to have the most cost-effective architecture by a long shot. Once you're firmly established, the disadvantages of your architecture become less important because you have the advantages of:

1. More R&D$ to optimize your implementations -- how many companies can afford to invent TCMs?

2. Volume manufacturing -- you can buy in big quantities or mfr. subsystems and/or chips in-house, and you can buy expensive mfg. equipment to drive down costs.

3. The "nobody ever got fired for buying IBM (DEC)" effect. In most market segments, you must have a substantial price/performance advantage over the leader to get customers, and even then there are many who will stick with the industry leaders at almost any price. The best example of this I can think of is Amdahl. Even though their machines are compatible, highly reliable, etc., IBM still sells many times more machines with inferior price/performance (of course I realize there are other reasons too, but that's a major one).

I won't deny that the VAX architecture is a handicap in building fast machines, but DEC has the market lead and resources to pursue any of several options over the next few years. I don't know if they can build complex VAXen to compete with more easily scalable architectures indefinitely, but they can afford to eat some loss in margin until they figure out what to do.

Russell Williams
..{ucbvax!sun,lll-lcc!lll-tis,altos86}!elxsi!rw
bjj@psueclb.BITNET (09/12/87)
This is in response to the claim that a VAX is not readily pipelined, or at least that there are limits to the pipelining. The same problem didn't seem to stop the designers of the Harris HCX7.

From: johnw@astroatc.UUCP (John F. Wardale)
>>> The 1st byte of each 370 instruction tells the length of the instruction!
>> You have pin-pointed one of the VAX's problems. This does not prevent,
>> absolutely, pipelining.
> Ok, but it ties the designers hands and one foot behind his back!!!

The Harris HCX7 is roughly similar to the VAX 8700. You have to look real close at the instruction set to see the differences between the VAX and HCX instruction sets. Like the VAX, the HCX opcode does not indicate the instruction length; rather, each operand encodes its own length.

The HCX7 seems to have solved the instruction decode problem with a separate instruction cache. The cache contains DECODED instructions stored as fixed-length 73-bit words.

Assuming a reasonable hit ratio in the 4K-word instruction cache, the extra complexity of instruction decode won't affect speed.

The HCX7 has a 100 ns clock cycle. Most instructions execute in 1 cycle. Harris describes a 3-level pipeline which simultaneously processes:

1) Instruction Fetch
2) Address Calculation
3) Instruction Execution

> I believe the encoding of VAX instructions prevents one from
> making it go fast, while still being affordable. (It's a
> point-of-diminishing returns question.) Comments...Anyone think it'd be
> (economically) worth building a VAX 3X or 10X the current top-vax?
> Or has it [as I feel it has] reached the limit for current technology.

I shouldn't think a cache of decoded instructions would be unaffordable; you have to cache them somewhere anyway.

Some of us have long been amazed at what others will pay to buy IBM. When DEC starts charging for a VAX CPU what IBM charges for a 3090, then it will be easier to compare affordability.
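The idea of a decoded-instruction cache can be sketched roughly as follows. This is a toy model only: the byte encoding, the `Decoded` record, and the `DecodedICache` class are all invented for illustration and are not the HCX7 (or VAX) formats. It shows the one property the post relies on: the variable-length decode runs only on a miss, and later fetches of the same address return an already-decoded, fixed-shape record.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decoded:
    opcode: int
    operands: tuple
    length: int  # bytes occupied in the variable-length stream


def decode(code: bytes, pc: int) -> Decoded:
    """Toy decoder: low 2 bits of the opcode give the operand count,
    and each operand specifier determines its own length."""
    opcode = code[pc]
    cursor = pc + 1
    operands = []
    for _ in range(opcode & 0b11):
        spec = code[cursor]
        if spec >> 4 == 0:            # register operand, 1 byte
            operands.append(("reg", spec & 0xF))
            cursor += 1
        else:                         # specifier byte + 4-byte literal
            value = int.from_bytes(code[cursor + 1:cursor + 5], "little")
            operands.append(("imm", value))
            cursor += 5
    return Decoded(opcode, tuple(operands), cursor - pc)


class DecodedICache:
    """Cache of decoded instructions, indexed by instruction address."""

    def __init__(self):
        self.lines = {}
        self.hits = 0
        self.misses = 0

    def fetch(self, code: bytes, pc: int) -> Decoded:
        entry = self.lines.get(pc)
        if entry is None:             # miss: pay the decode cost once
            self.misses += 1
            entry = decode(code, pc)
            self.lines[pc] = entry
        else:                         # hit: no decode work at all
            self.hits += 1
        return entry


def run_loop(code: bytes, iterations: int) -> DecodedICache:
    cache = DecodedICache()
    for _ in range(iterations):
        pc = 0
        while pc < len(code):
            pc += cache.fetch(code, pc).length
    return cache


# Three instructions of 7, 2 and 1 bytes, executed 10 times.
loop_body = bytes([0x12, 0x01, 0x10, 0x2A, 0, 0, 0, 0x21, 0x02, 0x30])
cache = run_loop(loop_body, 10)
print(cache.misses, cache.hits)  # 3 27
```

The sketch ignores capacity, replacement, and refill time, so it says nothing about how costly a miss is.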
guy@sun.uucp (Guy Harris) (09/13/87)
> This is in response to the claim that a VAX is not readily pipelined,
> or at least that there are limits to the pipelining. The same problem
> didn't seem to stop the designers of the Harris HCX7.

Credit where credit is due department: the HCX-7 either is an OEM'ed CCI Power 6/32, or is derived from it; the decoded instruction cache, etc. came from the 6/32.

(The 6/32 instruction set is, indeed, similar to the VAX's, although the addressing modes are different. The 6/32 is big-endian (for compatibility with CCI's smaller 68K-family machines); some 6/32 instructions that are just like the VAX ones have the VAX instructions' opcodes, except that the nibbles are swapped....)

--
Guy Harris
{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
guy@sun.com (or guy@sun.arpa)
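For readers unfamiliar with the term, a nibble swap exchanges the high and low four bits of a byte. A minimal sketch follows; the function name is made up here, and the example byte 0xD0 (the VAX MOVL opcode) is chosen only for illustration. The post does not say which 6/32 instructions follow this pattern.

```python
def swap_nibbles(opcode: int) -> int:
    """Exchange the high and low 4-bit halves of a one-byte opcode."""
    return ((opcode & 0x0F) << 4) | ((opcode & 0xF0) >> 4)


print(hex(swap_nibbles(0xD0)))  # 0xd
```

Applying the swap twice returns the original byte, so the mapping is its own inverse.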
ehj@mordor.s1.gov (Eric H Jensen) (09/14/87)
In article <818@PSUECLB> bjj@psueclb.BITNET writes:
>The HCX7 seems to have solved the instruction decode problem with
>a separate instruction cache. The cache contains DECODED instructions
>stored as fixed length 73 bit words.
>
>Assuming a reasonable hit ratio in the 4K word instruction cache,
>the extra complexity of instruction decode won't affect speed.

This last statement is not strictly true. Refill time can be adversely affected in the following ways:

1) In most cases there is at least one additional pipe stage to do the decode. If there isn't, it would seem to me that there are other problems/limitations with the arch/design.

2) Games that can be played with loading > 1 32-bit (pick your favorite RISC) instruction simultaneously into an icache line are 2-4(*) times as 'expensive' to apply to a pre-decoded icache: 2-4x the amount of decode logic, 2-4x the number of fast RAMs, 2-4x the number of muxes (multi-set icache), etc. etc. 64 bits (2*32) is a lot more attractive than 146 bits (2*73).

3) From discussions with other designers it appears evident that program loops cannot be relied on to hide non-aggressive icache design. I would consider the above approach to be complex but not aggressive. The complexity is applied to overcome instruction set limitations and uses up much of the design space that could be used for an aggressive icache design.

(*) Simultaneously loading 2-4 32-bit instructions seems within reason with the ECL technologies I design with.
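The width comparison in point 2 is simple arithmetic, sketched below for the 2-4 instruction range mentioned in the footnote. The function name is illustrative, and reading "4K word" as 4096 entries is an assumption, not something stated in the thread.

```python
def fill_width_bits(instructions_per_fill: int, bits_per_instruction: int) -> int:
    """Bits that must be written into the icache in one simultaneous fill."""
    return instructions_per_fill * bits_per_instruction


for k in (2, 3, 4):
    raw = fill_width_bits(k, 32)       # undecoded 32-bit instructions
    decoded = fill_width_bits(k, 73)   # pre-decoded 73-bit words
    print(k, raw, decoded)
# 2 64 146
# 3 96 219
# 4 128 292

# Total storage of the decoded cache, assuming "4K words" means 4096 entries.
total_bits = 4096 * 73
print(total_bits)  # 299008
```

At every fill width the pre-decoded path is 73/32, or roughly 2.3 times, as wide as the raw path.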