haapanen@watdcsu.UUCP (Tom Haapanen [DCS]) (11/11/84)
< Nami nami nami nami ... >

The latest BYTE (should I be ashamed that I still subscribe? :-) has an
article by John Markoff on RISC (Reduced Instruction Set Computer) chips.
In particular, the article concentrated on the Berkeley RISC I and RISC II
designs.

Even though the instruction set looks horribly insufficient, I suppose it
could be lived with, especially with the 138 registers RISC I has.
According to the limited benchmarks in the article, the 1.5 MHz RISC I was
able to beat up on an 8 MHz MC68000, and the 5 MHz RISC II runs 'integer C
programs' faster than a 10 MHz NS32016 and a 12 MHz MC68000.  The article
does not mention how many registers the RISC II has, but it does say that
a 12 MHz RISC II has been fabricated.

What I'm wondering about, though, is whether it is feasible to build a
RISC chip in the VAX 11/780 class, i.e. on par with the MC68020 (note:
this is only to imply that the 68020 is in the same *class* as a 780, not
necessarily equal performance).  Apparently the RISC II contains 44,500
transistors, as opposed to the 68020's 200,000, so at least there is a lot
of room to cram more stuff in.  However, will this improve performance
significantly?

The article also vaguely refers to the Pyramid 90x having register
windows.  Does this mean the Pyramid is a RISC design, or does it just
have large register banks?  Are there any other RISC designs and/or chips
commercially available, or will there be in the near future?

Comments would be appreciated, especially from people involved in the
Berkeley RISC or SOAR projects, or the Stanford MIPS project.

Tom Haapanen
University of Waterloo
(519) 744-2468

	allegra \
	clyde    \
	          \
	decvax ---- watmath --- watdcsu --- haapanen
	ihnp4     /
	         /
	linus   /

The opinions herein are not those of my employers, of the University of
Waterloo, and probably not of anybody else either.
henry@utzoo.UUCP (Henry Spencer) (11/13/84)
> Even though the instruction set looks horribly insufficient, I suppose
> it could be lived with...

The whole point of a RISC machine is that you don't live with it; the
compiler does that for you.  So long as it runs his programs quickly, the
nature of the instruction set is not really the customer's concern.  The
simplicity makes life lots easier for the compiler and the chip designer.

> The article does not mention how many registers the RISC II has ...

I think RISC II has roughly twice the register count of RISC I.  Note that
your programs don't get access to all of them simultaneously, so this is
an implementation/performance issue mostly.

> What I'm wondering about, though, is whether it is feasible to build a
> RISC chip in the VAX 11/780 class...

Remember that the current RISC chips are using mediocre MOS processing and
easy-and-simple design rules.  Last I heard, running at the original
target clock speed (which probably hasn't been reached yet), the RISC
design was tentatively benchmarked (by simulation) as substantially faster
than a 780.  This was with, I think, 400-ns memory access times, and an
effective instruction cache was assumed.

> ... Apparently the RISC II contains
> 44,500 transistors, as opposed to the 68020's 200,000, so at least
> there is a lot of room to cram more stuff in.  However, will this
> improve performance significantly?

If the RISC people got 200k transistors to play with, probably tops on the
agenda would be an instruction cache.  Since the RISC designs execute a
lot of instructions relative to a conventional design, they benefit a lot
from faster instruction fetches.

> The article also vaguely refers to the Pyramid 90x having register
> windows.  Does this mean the Pyramid is a RISC design, or does it just
> have large register banks?

The Pyramid is *claimed* to be more-or-less a RISC design, although from
the sounds of the glossies, they've succumbed to the temptation/need (hard
to tell which) to add tailfins and "features".
Note that it's not a VLSI RISC, it's a RISC design implemented in ordinary
logic.

> Are there any other RISC designs and/or
> chips commercially available, or will there be in the near future?

Don't think there are any other commercial RISC designs just yet, although
there may be half a dozen startup companies about to prove me wrong.  As
far as I know, there are *no* commercial RISC chips at the moment.  The
idea has generated enough enthusiasm that all kinds of people are likely
to jump on the bandwagon in the near future.
--
Henry Spencer @ U of Toronto Zoology
{allegra,ihnp4,linus,decvax}!utzoo!henry
kiessig@idi.UUCP (Rick Kiessig) (11/14/84)
RISC machines may not be as good as they look at first glance.  The
Berkeley implementations, in particular, gain nearly all of their
performance by using register windows.  The simplicity of the instruction
set may in fact slow these chips down, although it certainly makes them
easier to implement.
--
Rick Kiessig
{decvax, ucbvax}!sun!idi!kiessig
{akgua, allegra, amd, burl, cbosgd, decwrl, dual, ihnp4}!idi!kiessig
Phone: 408-996-2399
henry@utzoo.UUCP (Henry Spencer) (11/16/84)
> RISC machines may not be as good as they look at first
> glance.  The Berkeley implementations, in particular, gain
> nearly all of their performance by using register windows.
> The simplicity of the instruction set may in fact slow these
> chips down, although it certainly makes them easier to
> implement.

Don't knock implementation ease.  No way could the Berkeley people have
built a chip with all those registers unless the control section was very
simple.

There is also the telling observation that many existing CISC machines are
notorious for being faster if you "hand code" the complex sequences rather
than relying on the all-singing-all-dancing fancy instructions.  For
example, C function calls are faster on an 11/70 than on a VAX, even
though it's one instruction on the VAX and about a dozen on the 70.

Another way to look at it is that the RISC machine is a machine with
unusually "clean" microcode that is executed directly out of main memory,
and generated directly by compilers.  Assuming that the cleanliness and
the memory fetches aren't a major problem, clearly it is faster to write
your own microcode for a specific job than to rely on the CPU designer's
ROM microcode.

"If the big performance win is register windows, then the rest of the CPU
should be made as simple as possible, i.e. a RISC."
--
Henry Spencer @ U of Toronto Zoology
{allegra,ihnp4,linus,decvax}!utzoo!henry
jlg@lanl.ARPA (11/20/84)
It is obviously possible to build a RISC machine that is in the same class
as a VAX.  But why would you want to, when some RISC-like machines have
been running for years MUCH FASTER than a VAX?  These are the CDC
machines, the CRAY machines, and the more recent vector processor machines
'from the east.'

For example, the CRAY machine is VERY RISC-like.  There are two data
addressing modes, corresponding to the VAX 'literal mode' and the VAX
'displacement mode'.  There are two branch addressing modes, corresponding
to the VAX 'literal mode' and the VAX 'register mode'.  No instructions
other than loads, stores, and branches address the memory.  All the other
instructions use 'register mode' for their operands, mostly three-address
code.  Contrary to the remarks of previous submitters, there is no
difficulty achieving very high speed floating point arithmetic on a
RISC-like machine.  In fact, the floating point units on the CRAY-1s
machine are just one clock slower than their integer counterparts.

There are several differences between the CRAY machines and the RISC
machines proposed by Patterson and others.  The most important are the
lack of orthogonality in the instruction set (although the CRAY-2 promises
to fix this deficiency to some extent) and the lack of a high speed
context switching mechanism.  This last point is offset somewhat by the
ability to 'block load' or 'block store' certain register sets
(unfortunately, the present compilers don't make particularly good use of
this feature).  Another major difference between the two types of machines
is the presence in the CRAY of several different functional units, each
with different timing characteristics.  This requires extra logic to
reserve registers until the operation is completed.

So far I have described only the scalar part of the CRAY machine, and for
good reason.  Even without vector operations, the CRAY is MUCH faster than
a VAX.
I suspect that a VLSI version of the CRAY scalar instruction set would be
able to outperform a VAX built with the same technology.  The advantages
of the reduced instruction set, combined with the simpler memory interface
(only two addressing modes with NO virtual memory support), would allow
the 'micro CRAY' to be clocked at much higher rates.  Of course, I doubt
that the CRAY architecture could be put on a single chip with today's
technology, but it could probably be done with a small set of chips for
each functional unit.

Programming a RISC machine is simple compared to a CISC machine - far from
being 'woefully inadequate', the RISC type of machine seems just right.
In a CISC machine there are usually about half a dozen different ways of
performing any given function, and the most obvious is usually NOT the
fastest, or even close.  On a RISC machine, the most obvious code sequence
is almost always the fastest - it may be the ONLY obvious code sequence.
After 17 years of assembly coding I came to the conclusion that the CRAY
instruction set was the easiest to use of any machine I have seen.  And
after two years of compiler maintenance on the CRAY I concluded that the
instruction set was the easiest to write a compiler for as well (the CRAY
compiler is such a poorly written thing that it would probably never have
even worked on another machine).  The only really difficult part is
scheduling vector operations, which became much easier on the new X/MP
machines.

A word needs to be said about the lack of addressing modes and virtual
memory.  At the speeds at which RISC machines will run (not the demo units
made from MOS but the real production chips that (I hope) will come out),
memory will be the slowest component of the system.  On the CRAY, only the
reciprocal approximate is slower than a memory fetch; all other operations
are at least twice as fast (integer add is 7 times as fast, logical
operations are 14 times as fast).
Staged memory is a help (several fetches or stores going simultaneously),
but all the other functional units are staged as well.  It makes sense to
limit memory traffic to just loads and stores so that other functional
units don't end up waiting for memory references.  It also makes sense to
limit the number of addressing modes so that memory traffic doesn't get
even slower due to the extra checking and circuitry in the memory
interface.

If memory traffic is slow, then traffic to the secondary storage (disk or
whatever) is REALLY SLOW.  The data transfer rate for the standard CRAY
drive (CDC DD-29) is 38.7x10^6 bits/sec, and the sector size is 512 words
(64 bits/word); less than a millisecond per word - or about 68,000 cpu
cycles!!  This doesn't even count seek time, latency, or scheduling the
traffic with the channel.  Obviously, the operating system would have to
suspend your task until the page had been loaded, and it is also clear
that no amount of 'lookahead' could significantly improve the performance
of the paging scheme.

The solution is not to page, but to provide a very large amount of central
memory.  With large central memory, there is always enough room for code
(it's small), but data may still need to be kept on secondary storage.
Fortunately, it's usually possible to write code which anticipates its
data needs and issues reads and writes (asynchronous of course) long in
advance of the use of that data.  Short of that, reads and writes don't do
much worse than paging would have done anyway.

I'm looking forward to the first commercial RISC chips (or chip sets).  I
expect that to be competitive they will have several functional units
(each staged), only one or two addressing modes, a large central memory
requirement, and no virtual addressing capability.  With this combination,
I think RISC could outrun any other small computer available.
jlg@lanl.ARPA (11/20/84)
In the preceding note I claimed that the CRAY disk transfer rate was less
than a millisecond per word.  Obviously it isn't - it's less than a
millisecond per *sector*, which is 512 words.  This 512-word block still
corresponds to about 68,000 cpu cycles though, a lot of time any way you
slice it!
bcase@uiucdcs.UUCP (11/22/84)
[bug lunch]

The Ridge-32 is a RISCy machine.  It does not have lots of registers, but
rather has a very simple instruction set.  It can do 8 MIPS if everything
is in registers.  The RISC I has 78 registers, the RISC II has 138.

The Pyramid machine is not a RISC machine even though it does have the
register windows.  It is microcoded and does pay a penalty, but in the
interest of getting the machine out fast, it was a big win for Pyramid to
use microcode.

There are some RISC designs in the pipelines of some companies: HP has a
project called SPECTRUM which is to be the basis of "all" future big HP
computers (I may have that phrased a little incorrectly).  It "should"
already be out, but you know big companies.  Inmos has been spouting off
about the Transputer, a single-chip RISC machine, but we haven't seen too
many of those either.  DEC is experimenting with RISC.  SUN is
experimenting with RISC.  Weitek (spelling?) (the makers of floating point
chip sets) is experimenting with RISC.  In short, anyone who has any sense
and does not fear the idea of being incompatible with past designs (at the
instruction set level) is toying with RISC designs.  There are some
start-ups whose sole goal in life is to exploit the benefits of the RISC
concepts (not all of which are to be found in papers, and especially not
in BYTE), and there are some companies trying to form with the same goal
in mind.

What are the main RISC concepts and why are they so important?  Main
concepts: an efficient storage hierarchy and minimal instruction
interpretation overhead.  The importance of an efficient storage hierarchy
should be obvious, but not so obvious is the fact that the register set of
the machine is really just the fastest part of the hierarchy.  The storage
hierarchy is MUCH more important than the number of instructions executed
to implement a given function (within reason, of course).
The importance of minimal instruction interpretation overhead has
far-reaching implications, including not wasting time, not wasting
silicon, making it easier for a compiler to decide which instruction, or
sequence of instructions, to use, making it easier to pipeline the
machine, etc.

Another important property of RISC machines (here machine means the
hardware together with the compiler) is that they tend (given good
compilers) to REUSE results and data rather than RECOMPUTE or REFETCH
results and data.  As an example, take some VAX instruction which computes
the address of some stack-local variable:

	OPCODE	4(sp),x,y

If this instruction is in a loop, then the computation of 4+sp will be
done on EACH iteration of the loop!  This is clearly wasted effort.  A
good RISC compiler will factor out this computation and place the address
in a register (thus, we begin to see the need for lots of registers).  Now
you say, "but the VAX compiler could do this also!"  Sure, but then what
becomes of the sp+offset addressing mode?  It is surely not nearly as
important now, and all the control needed to implement it is less
utilized, or perhaps little utilized (there will still be some other
places where register+offset will be useful), and if it were eliminated,
the whole machine might become faster by virtue of a shorter cycle time.
But now we need to get instructions to the CPU at shorter intervals, thus
again reinforcing the need for an efficient storage hierarchy.  If enough
iterations of this measuring/simplifying algorithm are carried out, a RISC
design is the result.

The issue is certainly more complicated than my discussion would lead one
to believe (one problem is handling interrupts and other pipeline
inconsistencies), but I think I have shed some light on the subject.
There has been considerable work done in this area, and even though much
more needs to be done, the ideas are mature enough that some are venturing
into the commercial market.
RISC machines promise to give us high-performance designs in less time and
space (and maybe for less money) than conventional machines.  Plus, by
using the RISC philosophies, machines can be designed for special purposes
in cases where the attempt wouldn't have been made before.