baum@apple.UUCP (Allen J. Baum) (12/04/87)
--------
>However, you can buy 2 machines that are clearly 801-descendents:
>HP Precision and MIPS R2000, which, as far as I can tell, are the
>closest ones on the market to the 801. Whenever it comes out, the
>78000 has a lot of similarities also.

The only way that the HP Precision architecture can be considered an
801 descendent is in spirit.  Although there were former IBM'ers on the
project, they weren't allowed to talk about it at all, and they didn't.
To this day, I haven't talked with anyone who would tell me any details
of the 801 architecture.  Some details did come out in the papers at the
ASPLOS conference, but they did not influence any of the design
decisions on the Precision.
--
{decwrl,hplabs,ihnp4}!nsc!apple!baum  (408)973-3385
mash@mips.UUCP (John Mashey) (12/04/87)
In article <6892@apple.UUCP> baum@apple.UUCP (Allen Baum) writes:
>--------
>>However, you can buy 2 machines that are clearly 801-descendents:
>>HP Precision and MIPS R2000, which, as far as I can tell, are the
>>closest ones on the market to the 801. Whenever it comes out, the
>>78000 has a lot of similarities also.
>The only way that the HP Precision architecture can be considered an
>801 descendent is in spirit.  Although there were former IBM'ers on
>the project, they weren't allowed to talk about it at all, and they
>didn't.  To this day, I haven't talked with anyone who would tell me
>any details of the 801 architecture.  Some details did come out in the
>papers at the ASPLOS conference, but they did not influence any of the
>design decisions on the Precision.

Sorry, I meant absolutely no hint that proprietary info got moved, and
I did mean in spirit, especially the methodology of starting with
serious optimizing compiler technology and doing substantial analysis.
It is extremely interesting that there was no influence from the March
1982 Radin paper at ASPLOS; I would have thought that it would have
been analyzed thoroughly, at least for confirmation of direction, but
I wasn't there, so I believe you.

Certainly, it is interesting that both the 801 and Spectrum used
optimizing compiler-driven design, 32 registers, 32-bit instructions,
separate I & D caches, and no register windows.  (MIPS can't be
included as independent: we certainly had access to the published 801
documents before we started.)

Either the similarities arise from the limited choices once you've
made some of those decisions, or, starting from some of the same
assumptions and proceeding with related methodologies, you get to
somewhat similar designs.  [There are of course all sorts of little
differences amongst the 3 machines mentioned, but if you take the
universe of RISC machines, they look more similar than most.]
--
-john mashey    DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP:   {ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:    408-991-0253 or 408-720-1700, x253
USPS:   MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
baum@apple.UUCP (12/04/87)
-------- []
>In article <1047@winchester.UUCP> mash@winchester.UUCP (John Mashey) writes:
>Sorry, I meant absolutely no hint that proprietary info got moved,
>and I did mean in spirit, especially the methodology of starting with
>serious optimizing compiler technology and doing substantial analysis.
>It is extremely interesting that there was no influence from the
>March 1982 Radin paper at ASPLOS; I would have thought that it would have
>been analyzed thoroughly at least for confirmation of direction,
>but I wasn't there, so I believe you.
>
>Certainly, it is interesting that both 801 and Spectrum used
>optimizing compiler-driven design,
>32 registers,
>32-bit instructions,
>separate I&D caches,
>and no windows.
>(MIPS can't be included as independent: we certainly had access to the
>published 801 documents before we started.)
>
>Either the similarities arise from the limited choices once you've
>made some of those decisions, or, starting from some of the same assumptions,
>and proceeding with related methodologies, you get to somewhat similar
>designs. [There are of course all sorts of little differences amongst
>the 3 machines mentioned, but if you take the universe of RISC machines,
>they look more similar than most.]

I think you hit it right on the head.  We did indeed use the ASPLOS
papers for confirmation of our intuition & measurements, but they only
confirmed, so nothing got changed.  We did see some differences, but we
liked our approach (for better or worse), so no differences there
either.

We did have a chance to consider register windows, since the RISC I
papers had been published by then, but decided that for future
implementations the large register file and its decoding would be a
bottleneck (impact the critical path), and that smart register
allocation would be sufficient.
--
{decwrl,hplabs,ihnp4}!nsc!apple!baum  (408)973-3385
baum@apple.UUCP (12/04/87)
-------- []
>In article <1047@winchester.UUCP> mash@winchester.UUCP (John Mashey) writes:
>It is extremely interesting that there was no influence from the
>March 1982 Radin paper at ASPLOS; I would have thought that it would have
>been analyzed thoroughly at least for confirmation of direction,

A small addendum: while the 801 may not have played an enormous role,
System/370 certainly did!  Many of the design decisions in Spectrum
were in reaction to problems we saw (& measured & were told about) in
the 370 architecture, from both IBM and Amdahl.
--
{decwrl,hplabs,ihnp4}!nsc!apple!baum  (408)973-3385
aglew@ccvaxa.UUCP (12/12/87)
..> IBM 360, BCD, and COBOL support

I wouldn't go so far as putting packed decimal into a modern machine,
but unpacked decimal (ASCII) might be another thing... except that it
can be composed almost as well out of masks and binary arithmetic.

As for COBOL support, well... I think we are about to pass the point
where a scientific computer will do better at COBOL support than a
business computer.  Because, what's a business computer?  Well, it has
BCD - see above.  It has good I/O - but scientific computers
increasingly have good I/O, since they do graphics.  It handles
strings well - but most strings are short, or fixed length, and you
can move a lot of characters through a 64-bit register, and do a lot
of string operations 8 characters at a time, instead of one by one.

Andy "Krazy" Glew. Gould CSD-Urbana.  1101 E. University, Urbana, IL 61801
aglew@mycroft.gould.com    ihnp4!uiucdcs!ccvaxa!aglew    aglew@gswd-vms.arpa

My opinions are my own, and are not the opinions of my employer, or any
other organisation. I indicate my company only so that the reader may
account for any possible bias I may have towards our products.
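Glew's "8 characters at a time" point can be sketched in C.  The helper
names and constants below are illustrative, not from any machine
discussed in this thread; the word-at-a-time zero-byte test is the
well-known carry-borrow trick, shown here only to make the idea
concrete.

```c
#include <stdint.h>
#include <string.h>

/* Sketch of wide-register string handling: with a 64-bit register,
 * string operations can examine 8 characters per step instead of one.
 * Helper names are made up for this example. */

/* True iff any of the 8 bytes in w is zero.  A byte b is zero exactly
 * when (b - 1) borrows into bit 7 while b's own bit 7 is clear; the
 * masks apply that test to all 8 bytes in parallel. */
static int word_has_zero_byte(uint64_t w)
{
    return ((w - 0x0101010101010101ULL) & ~w & 0x8080808080808080ULL) != 0;
}

/* strlen-style scan, 8 bytes per iteration.  Assumes the buffer is
 * readable a little past the terminator (alignment and end-of-page
 * handling omitted for brevity). */
static size_t wide_strlen(const char *s)
{
    size_t i = 0;
    for (;;) {
        uint64_t w;
        memcpy(&w, s + i, sizeof w);     /* one 8-byte load */
        if (word_has_zero_byte(w)) {
            while (s[i] != '\0')         /* pin down the exact byte */
                i++;
            return i;
        }
        i += 8;
    }
}
```

The same masking style handles case conversion, digit checks, and
similar per-character work in 8-byte gulps, which is the sense in which
a "scientific" machine with wide registers competes on string-heavy
business code.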
atbowler@orchid.waterloo.edu (Alan T. Bowler [SDG]) (12/17/87)
In article <28200075@ccvaxa> aglew@ccvaxa.UUCP writes:
>
>As for COBOL support, well... I think we are about to
>pass the point where a scientific computer will do better
>at COBOL support than a business computer.

"About to pass the point"?  The places that used to run service
bureaus with CDC 6600's knew this years ago.  Once you get over the
fixation that your problem is so different that the hardware designer
has to tailor an instruction just for you, you realize that what you
want is something that does some basic functions fast, and lets the
programmer construct the other stuff.  The design problem is to choose
the right basic operations.  The DG Nova, PDP-8, and the CDC 6600 all
gave very impressive performances on commercial applications even
though many people claimed they were not "designed" for this type of
application.

Basically this should be the "RISC" argument, but that term seems to
have been co-opted for a very narrow range of hardware design
strategies.  In particular I have seen statements that RISC must mean:

- 1 microcode cycle per clock cycle.  Why assume a synchronous
  (clocked) hardware implementation?  There have been a number of
  successful machines with asynchronous CPU's.
- 1 microinstruction per instruction.  Why assume microcode at all?
  Microcode is certainly a valuable hardware design technique, but
  again it is not mandatory.
- register windows with "general purpose" registers.  A neat idea, but
  again, special purpose register architectures have done impressive
  things in the past.

I've often wondered if most of the performance gains quoted for the
"RISC" machines can be attributed to the fact that someone decided the
best thing to do with the registers was to use them for passing the
first few arguments, and whether similar gains can be made on other
machines by making the compiler pass the first few arguments in
registers, instead of expecting that the callee preserves its
registers.  I'm not saying any of these are bad ideas.  They clearly
aren't.
It just seems that a lot of discussion is going on with the assumption
that all computers are implemented with a particular methodology, or
must have a certain architectural feature.

Those pushing the simple and fast approach must also be aware of why
machines acquire specialized fancy instructions, such as packed
decimal.  Given an existing implementation of an architecture, there
will always be some commercially important applications that the
machine is "poor" at (as defined by some customer with money).  The
engineer can go back to the drawing board and re-engineer the whole
machine to make it faster, but often adding some opcodes and some
hardware will do the job.  (I am using the term "extra hardware"
loosely; this could mean some more gates on the CPU chip.)  Sometimes
no extra hardware is needed, just an addition to the microcode (if
microcode is used in the implementation).

Of course, when the next total re-engineering does occur, tradeoffs
will be made, and the new datapath layout will mean that some of the
instructions can't be implemented with the previous technique.  So
they may be done with a long slow microcode sequence, and it may well
be that on the new machine the instruction sequence that was used
before the feature was added is faster at doing the job wanted by the
application than using the feature.  The reason the feature was added
was valid: it made the machine significantly faster.  The reason for
maintaining it is valid: it preserves object code compatibility.
esf00@amdahl.amdahl.com (Elliott S. Frank) (12/19/87)
In article <12181@orchid.waterloo.edu> atbowler@orchid.waterloo.edu
(Alan T. Bowler [SDG]) writes:
>
> Once you get over
>the fixation that your problem is so different that the hardware
>designer has to tailor an instruction just for you, you realize
>that what you want is something that does some basic functions
>fast, and let the programmer construct the other stuff.  The
>design problem is to choose the right basic operations.
>

Amen.  The Amdahl 580 (a 370-compatible [CISC] machine designed ca.
1978-79) was designed with contemporary 'UNIX machine' features --
separate I and D caches, etc.  It turned out it ran 360/370 COBOL
programs like the proverbial 'bat out of hell'.
--
Elliott Frank      ...!{hplabs,ames,sun}!amdahl!esf00     (408) 746-6384
               or  ....!{bnrmtv,drivax,hoptoad}!amdahl!esf00

[the above opinions are strictly mine, if anyone's.]
[the above signature may or may not be repeated, depending upon some
inscrutable property of the mailer-of-the-week.]
pds@quintus.UUCP (Peter Schachte) (12/23/87)
In article <12181@orchid.waterloo.edu>, atbowler@orchid.waterloo.edu
(Alan T. Bowler [SDG]) writes:
> .... Once you get over
> the fixation that your problem is so different that the hardware
> designer has to tailor an instruction just for you, you realize
> that what you want is something that does some basic functions
> fast, and let the programmer construct the other stuff.  The
> design problem is to choose the right basic operations.

That's the issue, alright.  For symbolic languages, tagged pointer
operations are very important.  Tagged dispatch and tagged pointer
following are done quite a lot, and cutting the number of machine
instructions needed to do these things can make quite a difference in
performance.

Take the example of following a tagged pointer.  If the tag is kept in
the high few bits of a 32-bit address, one must AND the pointer with a
32-bit constant.  That's quite a lot of overhead, when a simple
addressing mode that ignored the high, say, 4 bits of the address
would do the trick perfectly.  I know this runs contrary to the RISC
ideal, but on a CISC this is no more arcane than some of the other
addressing modes.

Similarly for tagged dispatch.  Imagine an instruction to take the
top, say, 2 bits of a register, shift them right a whole bunch of
places, add them to a given address, and jump to the address stored
there.  Sure, this could be done with a shift or rotate, an AND, an
add, and an indirect jump.  But wouldn't you rather do one instruction
than 4?

These are some of the operations that would make Lisp and Prolog run
faster.  I'm sure each language, and each class of languages, has its
own favorite chip features.  The important questions are: how much
will a given feature speed up a given task?  How much will it cost (in
terms of $, speed of other operations, etc.)?  And how important is
that task?  I imagine BCD probably wasn't worth it.  Perhaps the
features I've just asked for aren't worth it either.
Maybe it would be better, on average, to have a scaled,
post-incremented, memory indirect addressing mode (0.5 :-), 0.5 :-().
Or a 48-bit one's complement multiply instruction, or whatever.  The
point machine designers should take into account is that, more and
more, people are buying general-purpose hardware rather than the more
expensive specialized hardware.  Therefore, they should design their
machines taking symbolic languages, CAD, and other specialized tasks
into account.
--
-Peter Schachte
pds@quintus.uucp
...!sun!quintus!pds
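Schachte's two operations can be spelled out in C.  The tag layout,
tag names, and handlers below are invented for the example; the point
is that untagging is one AND against a 32-bit constant, and dispatch is
the shift/index/indirect-jump sequence that a single hardware
instruction could collapse.

```c
#include <stdint.h>

/* Illustrative layout, not from any real machine: a 2-bit tag in the
 * top bits of a 32-bit cell, the remaining 30 bits holding the value
 * or address. */
enum { TAG_INT = 0, TAG_CONS = 1, TAG_SYMBOL = 2, TAG_OTHER = 3 };

#define TAG_SHIFT 30
#define PTR_MASK  0x3FFFFFFFu            /* the "AND with a constant" step */

static unsigned tag_of(uint32_t cell) { return cell >> TAG_SHIFT; }
static uint32_t untag(uint32_t cell)  { return cell & PTR_MASK; }
static uint32_t make_cell(unsigned tag, uint32_t val)
{
    return ((uint32_t)tag << TAG_SHIFT) | (val & PTR_MASK);
}

/* Tagged dispatch: extract the top 2 bits, use them to index a table
 * of handlers, and jump indirect.  Done in software this is the
 * shift + AND + add + indirect-jump sequence; a dispatch instruction
 * would do it in one. */
typedef int (*handler_t)(uint32_t);

static int on_int(uint32_t v)    { return (int)untag(v); }
static int on_cons(uint32_t v)   { (void)v; return -1; }
static int on_symbol(uint32_t v) { (void)v; return -2; }
static int on_other(uint32_t v)  { (void)v; return -3; }

static handler_t dispatch_table[4] = { on_int, on_cons, on_symbol, on_other };

static int dispatch(uint32_t cell)
{
    return dispatch_table[tag_of(cell)](cell);
}
```

On a machine whose loads ignored the top 4 address bits, `untag` would
vanish entirely on the pointer-following path, which is the speedup
Schachte is asking for.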
aglew@ccvaxa.UUCP (12/25/87)
..> Peter Schachte ...!sun!quintus!pds, and others, responding and
..> amplifying my statement that scientific processors (general
..> purpose processors) can do special purpose work as fast as other
..> processors.

Tagged Operations:

As Peter points out, a natural way to support tags is to place them in
the high order bits, and have an architecture that ignores, say, the
top 4 bits.  I work on such an architecture, and we have a Common LISP
that takes advantage of it - except that it was a pain to port this
LISP to a new machine that ignored fewer of the top order bits (is
that correct, Brian, Scott?).  Of course, another way is to put the
tags in the low order bits, since tagged systems usually don't need
object addresses of byte granularity - the smallest object is usually
a word or two.

Lately, I have been thinking that the best thing to do is to
implicitly AND mask all addresses with a loadable ADDRESS_MASK value.
The AND masking can be done by dedicated gates away from the ALU, and
so shouldn't stretch your critical path (although it is close to the
critical memory address generation, I think it would fall in the slack
at the end of one pipeline stage).

The biggest advantage of ADDRESS_MASK would be that it would let you
support applications with different ideas of the shape of the address
on the same machine, and it would provide a way for you to increase
the address space while still letting old, broken programs that rely
on overflow of a 32-bit quantity work.  I.e., programs that rely on
addresses being 32 bits would have an address mask of 0x0FFFFFFFF;
programs that rely on addresses being 40 bits would use
0x0FFFFFFFFFF; and so on.  There are quite a few programs that rely on
24-bit and 16-bit addresses, even now.  Hardware would, of course,
limit the values that can be loaded into the ADDRESS_MASK - 32 bits
now, but tomorrow 40 bits, then 48 bits, and so on.
Using ADDRESS_MASK for tags is obvious: if you are using 2 bits of
high order tags on a 32-bit machine, set your mask to 0x3FFFFFFF; if
you are using 2 bits of low order tags, and you want to avoid a
misaligned address trap, set the mask to 0xFFFFFFFC.  This is not
unlike the LOAD-TAGGED instructions in SPARC and SPUR and SOAR; the
main difference is that the architecture does not force any decisions,
such as the size of the tag, onto the implementor of the tagged
language system.

Andy "Krazy" Glew. Gould CSD-Urbana.  1101 E. University, Urbana, IL 61801
aglew@mycroft.gould.com    ihnp4!uiucdcs!ccvaxa!aglew    aglew@gswd-vms.arpa

My opinions are my own, and are not the opinions of my employer, or any
other organisation. I indicate my company only so that the reader may
account for any possible bias I may have towards our products.
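Glew's ADDRESS_MASK proposal is easy to model in software.  The sketch
below is a minimal simulation of the idea (the function names are made
up, and real hardware would do the AND in dedicated gates, not code):
one loadable mask is applied to every generated address, covering both
the small-address-space case and the low-order-tag case.

```c
#include <stdint.h>

/* Software model of a loadable ADDRESS_MASK register: every address a
 * program generates is implicitly ANDed with it.  Names and the
 * 64-bit width are illustrative. */
static uint64_t address_mask = 0xFFFFFFFFull;   /* plain 32-bit machine */

static void load_address_mask(uint64_t m)
{
    /* Real hardware would restrict which values are loadable. */
    address_mask = m;
}

/* The implicit AND applied on every memory reference. */
static uint64_t effective_address(uint64_t a)
{
    return a & address_mask;
}
```

With a 24-bit mask (0xFFFFFF), an old program that wraps addresses at
16 MB keeps working on a larger machine; with 0xFFFFFFFC, two
low-order tag bits are stripped before the reference, so a tagged
pointer can be followed without a separate untagging instruction.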
hank@spook.UUCP (Hank Cohen) (01/05/88)
In article <19825@amdahl.amdahl.com> esf00@amdahl.amdahl.com (Elliott
S. Frank) writes:
>
>The Amdahl 580 (a 370-compatible [CISC] machine designed ca. 1978-79)
>was designed with contemporary 'UNIX machine' features -- separate
>I and D caches, etc. It turned out it ran 360/370 COBOL programs like
>the proverbial 'bat out of hell'.
>--

I always found the most interesting feature of the Amdahl 580 to be
the single-cycle decimal adder in the ALU.  It was (when I last knew
the details of such things) under 24 ns.  I believe the CPU designers
were willing to use a lot of gates to achieve that speed.  Amdahl does
a lot of simulation and analysis of their instruction mix and is
willing to design the machine to optimize certain benchmarks.  When
you run a lot of COBOL and PL/I, a fast decimal adder makes sense.
Perhaps Elliott could tell us how fast the 5890 decimal adder is.