stern@inmet.UUCP (11/17/84)
In response to "Is it possible to build a RISC in the same class as a VAX?" I'd have to say the answer is yes and no.

If you want to talk about integer computations, procedure entry/exit, and data movement, by all means yes. In theory, it would be possible to design and implement a RISC with performance in the same class as a VAX -- meaning only that the RISC could shuffle data, access memory with a *few* addressing modes (direct and register indirect are probably the easiest), and zip through integer computation code faster than a VAX 780. RISC architecture is not suited for floating point operations or matrix multiplications, since the arithmetic steps produce a very tight bottleneck. In my somewhat young opinion, a RISC is ideal for LISP hacking -- most of what you are doing is searching, matching, and chasing pointers. In this application, a RISC beats a VAX hands-down.

The answer to the original question is also "no," due to the problem of i/o on a RISC. Attaching an ACIA (or similar character-oriented i/o device) to the RISC could be a big difficulty -- the cycle times of the two devices differ by one or two orders of magnitude. You can either halt the RISC while doing character i/o (boo hiss) or use the RISC as an attached processor: plug it into a spare UNIBUS slot and let a VAX share its memory. The i/o problem is then solved: whatever OS you have running on the VAX does the i/o, and the RISC does the nasty computation.

Perhaps a better question would be "Do you want to run a RISC with the power of a VAX stand-alone?" I think it might be better to plug eight or sixteen RISC boards into a VAX, rehack UNIX to download microcode for appropriate tasks, and thereby save VAX cycles for cycle-hungry tasks. Example: write a search routine to be downloaded onto a RISC. Put the relevant data in the common VAX/RISC memory and let the RISC rip. Mr. VAX goes and deals with somebody else's process, and the search job finishes *faster* than if the VAX had done it. This isn't just a marvel of parallel processing -- the RISC really would complete the task faster than a VAX could. They just can't perform *every* task faster than a VAX.

Hal Stern
Intermetrics, Inc
{ihnp4, harpo, esquire, ima}!inmet!stern
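Hal's attached-processor scheme boils down to a handshake through shared memory: the host writes a job description, flips a status word, and goes off to run other processes while the attached board does the search. A minimal C sketch of that handshake follows, assuming a single descriptor in memory that both sides can address. The struct, the status values, and the function names are hypothetical, and the two sides are called in sequence here rather than running on separate processors.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical job descriptor in memory shared by host (VAX) and
   attached processor (RISC). Names are illustrative, not a real board's. */
enum job_status { JOB_IDLE, JOB_READY, JOB_DONE };

struct search_job {
    volatile enum job_status status; /* handshake word, written by both sides */
    const int *table;                /* data the host placed in shared memory */
    size_t     count;
    int        key;
    long       result;               /* index of key, or -1 if absent */
};

/* Host side: fill in the descriptor, then publish it by setting READY last. */
void host_post_search(struct search_job *job, const int *table,
                      size_t count, int key)
{
    job->table  = table;
    job->count  = count;
    job->key    = key;
    job->result = -1;
    job->status = JOB_READY;
}

/* Attached-processor side: one pass of its service loop.
   Returns 1 if it ran a job, 0 if nothing was posted. */
int attached_service(struct search_job *job)
{
    if (job->status != JOB_READY)
        return 0;
    job->result = -1;
    for (size_t i = 0; i < job->count; i++) {
        if (job->table[i] == job->key) {
            job->result = (long)i;
            break;
        }
    }
    job->status = JOB_DONE;
    return 1;
}

/* Host side: non-blocking check, so the host can run someone else's
   process between polls. */
int host_job_done(const struct search_job *job)
{
    return job->status == JOB_DONE;
}
```

On real shared-memory hardware, volatile only keeps the compiler from caching the status word; whether the descriptor stores reach the other processor before the READY store depends on the bus, which this sketch does not model.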
henry@utzoo.UUCP (Henry Spencer) (11/20/84)
> ... RISC architecture is not suited for floating
> point operations, or matrix multiplications, since the arithmetic steps
> produce a very tight bottleneck. ...

Why? It is true that the *existing* RISC machines have no floating-point support, but then neither does the 68000. This does not imply that one cannot build an effective floating-point-crunching system around either.

> ...[also, a RISC can't be a VAX] due to the problem of
> i/o on a RISC. Attaching an ACIA (or similar character-oriented i/o device)
> to the RISC could be a big difficulty -- the cycle times of the two devices
> are different by one or two orders of magnitude. ...

Uh, Hal, the cycle times of a 780 and an ACIA are also a little bit different... So you solve the problem the same way as on the VAX: if you want to pump lots of data through the i/o system, the i/o system has to do some of the work for you. Actually, this is roughly what your suggested solution amounts to, except that a VAX cpu is an inordinately expensive -- and not all that speedy -- i/o processor.
--
Henry Spencer @ U of Toronto Zoology
{allegra,ihnp4,linus,decvax}!utzoo!henry
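One common way for "the i/o system to do some of the work" is a DMA-style controller: the CPU spends a handful of stores describing a whole buffer, and the controller moves the bytes at the device's own pace. The C sketch below assumes such a controller exists; the descriptor layout and function names are illustrative only, and controller_tick() simulates one device cycle in software.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical transfer descriptor handed to an I/O controller that
   fetches bytes from memory on its own (DMA-style). */
struct xfer {
    const unsigned char *buf;
    size_t len;
    size_t done;      /* bytes the controller has moved so far */
    int    complete;  /* set by the controller; would raise one interrupt */
};

/* Stand-in for the slow device's output line. */
static unsigned char line_out[256];
static size_t line_pos;

/* CPU side: a few stores to describe the transfer, then the CPU is free. */
void cpu_start_xfer(struct xfer *x, const unsigned char *buf, size_t len)
{
    x->buf = buf;
    x->len = len;
    x->done = 0;
    x->complete = (len == 0);
}

/* Controller side: called once per *device* cycle, which may span many
   CPU cycles. No CPU instructions run here. */
void controller_tick(struct xfer *x)
{
    if (x->complete)
        return;
    if (line_pos < sizeof line_out)
        line_out[line_pos++] = x->buf[x->done];
    x->done++;
    if (x->done == x->len)
        x->complete = 1;
}
```

The CPU's cost is the single cpu_start_xfer() call, whatever the buffer length; the per-byte work, and all the waiting on the slow device, belong to the controller.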
stern@inmet.UUCP (11/29/84)
[got the asbestos jumpsuit on?]

(1) I said that RISCs are not suited for floating point/matrix multiplication because: (a) attaching a floating point coprocessor to a RISC perverts the very idea of a reduced instruction set; and (b) the original question asked if a RISC could be produced in the same class as a VAX. No RISC can perform every function of a 780 faster than a 780, because there are tradeoffs made when you give up instruction set complexity for speed. Usually floating point stuff is the first to go -- except when you make a purely floating-point processor, in which case everything has been given up. You can't have your floating point cake and eat it too.

(2) Yes, an ACIA and a VAX have different cycle times. So do a washing machine and an ECL flip-flop. But if you build a fancy interface to the ECL flip-flops, the washing machine still won't be able to talk to them. Having a nice DZ-11 multiplexor on a VAX is great -- it provides intelligent communications processing, so the VAX doesn't burn cycles doing low-level ACIA handshaking. Now how could a RISC talk to a DZ? Letting them share a memory space might be the best thing, but gee, then you might need another processor to arbitrate memory accesses, and then it starts to look like plugging a RISC into a VAX again. My point is that if you want a RISC to do i/o *without another processor*, then you have to add instructions/wait states/more complexity to get something with a 50-nanosecond cycle time to talk to a device with a 1-microsecond cycle time. Comparing a RISC to a VAX in terms of performance, when the RISC has two or three half- or quarter-VAXen doing its i/o, is cheating. Hey, my Toyota can outrace the space shuttle when you strap two solid rocket boosters to it.

--Hal Stern
{ihnp4, harpo, esquire, ima}!inmet!stern
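The cost Hal describes can be put in numbers. If a normal bus access takes base_cycles processor cycles, a device that needs device_ns per access forces ceil(device_ns / cycle_ns) minus base_cycles extra wait cycles. A small C helper, assuming a single-cycle base access for the RISC (an assumption for illustration, not a figure from the thread):

```c
#include <assert.h>

/* Extra wait cycles a processor must insert so one bus access lasts at
   least as long as the device needs. base_cycles is the length of a
   normal access; all times are in nanoseconds. */
unsigned wait_states(unsigned cycle_ns, unsigned device_ns, unsigned base_cycles)
{
    assert(cycle_ns > 0);
    unsigned needed = (device_ns + cycle_ns - 1) / cycle_ns; /* ceiling */
    return needed > base_cycles ? needed - base_cycles : 0;
}
```

With Hal's figures, wait_states(50, 1000, 1) is 19, so each access to the 1-microsecond device holds the 50-nanosecond machine for 20 cycles. With a hypothetical 200-nanosecond cycle the same device costs wait_states(200, 1000, 1) = 4. The stall shrinks on a slower CPU but does not disappear, which is the point Henry made about the 780 and the ACIA.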
guy@rlgvax.UUCP (Guy Harris) (11/30/84)
> (1) I said that RISCs are not suited for floating point/matrix multiplication
> because: (a) attaching a floating point coprocessor to a RISC perverts
> the very idea of a reduced instruction set; and (b) the original question
> asked if a RISC could be produced in the same class as a VAX. No RISC
> can perform every function of a 780 faster than a 780 because there are
> tradeoffs made when you give up instruction set complexity for speed.
> Usually floating point stuff is the first to go -- except when you make
> a purely floating-point processor, in which case everything has been
> given up. You can't have your floating point cake and eat it too.

Anybody have any data on how a "strict RISC" (i.e., one cycle per instruction, which rules out multiply unless you have a parallel multiplier, and rules out divide unless somebody has a non-iterative divide algorithm) does on jobs requiring a lot of multiplies and divides (which it has to do with library routines)? If it can do them as well as, say, an 11/780, it might be able to do software floating point as well. Basically, think of RISC instructions as vertical microcode; if you can do it fast with vertical microcode, you can do it fast on a RISC, as long as you can avoid going to memory for the instructions every time.

Also, attaching a floating point coprocessor doesn't "pervert the very idea of a reduced instruction set". If you can't do FP fast on a RISC, there's no need to go to instruction set complexity except for the floating-point instructions. A RISC with an FP chip might still be faster than a (more expensive, in cost or chip real-estate or whatever) CISC. I believe a lot of the reason for a RISC is that in non-number-crunching applications a lot of the cycles go to "bureaucratic overhead" (moving things from memory/registers into other memory/registers, testing things, etc.) and that a RISC may be able to deal with that as well as or better than a CISC.
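Guy's question is what multiply and divide cost when they are library routines built from single-cycle instructions. The textbook approach is shift-and-add for multiply and restoring (shift-and-subtract) division. The C below is a sketch of those two standard algorithms, not code from any particular RISC's library; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Shift-and-add multiply: per multiplier bit, one test, an optional add,
   and two shifts, all single-cycle-style operations. Result is mod 2^32. */
uint32_t soft_mul(uint32_t a, uint32_t b)
{
    uint32_t product = 0;
    while (b != 0) {
        if (b & 1u)
            product += a;
        a <<= 1;
        b >>= 1;
    }
    return product;
}

/* Restoring division: one shift and one trial subtract per quotient bit.
   The partial remainder is kept in 64 bits so the shift cannot overflow
   when the divisor is above 2^31. */
uint32_t soft_divmod(uint32_t n, uint32_t d, uint32_t *rem)
{
    assert(d != 0);
    uint32_t q = 0;
    uint64_t r = 0;
    for (int i = 31; i >= 0; i--) {
        r = (r << 1) | ((n >> i) & 1u);   /* bring down the next bit */
        if (r >= d) {
            r -= d;
            q |= 1u << i;
        }
    }
    if (rem)
        *rem = (uint32_t)r;
    return q;
}
```

Each routine runs up to 32 iterations of several instructions each, and that per-operation cost is what would have to be weighed against a parallel multiplier or an FP chip.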
Number-crunching may have a different character (although I remember a study by Knuth of FORTRAN programs that found that a surprisingly large number of lines were exactly that kind of "bureaucratic overhead" - "j = j + 1" counts as overhead if the only purpose "j" serves is to point to an element in an array full of the numbers you're crunching); if it does, maybe a "different horses for different courses" approach would be more cost-effective.

> (2) Yes, an ACIA and a VAX have different cycle times. So do a washing
> machine and an ECL flip-flop. But if you build a fancy interface to
> the ECL flip-flops, the washing machine still won't be able to talk
> to them. Having a nice DZ-11 multiplexor on a VAX is great -- it provides
> intelligent communications processing, so the VAX doesn't burn cycles
> doing low-level ACIA handshaking. Now how could a RISC talk to a DZ?
> Letting them share a memory space might be the best thing, but gee,
> then you might need another processor to arbitrate memory accesses, and
> then it starts to look like plugging a RISC into a VAX again.

What? Why couldn't you connect a DZ (which, by the way, hardly provides "intelligent communications processing" in the conventional sense of the word) to a RISC? Are you worried about the DZ not being able to do a bus cycle in a reasonable amount of time, tying the machine up twiddling its thumbs while it waits for the DZ to respond so it can finish the "move" instruction that's stuffing the character into the DZ's output buffer register? Presumably you can build a bus interface chip that's as fast as the RISC, and if your bus is fast enough it won't tie the RISC up for a long time (if it isn't fast enough, it'll tie up any fast machine for too long, not just a RISC). Equating a DZ and a bus interface to a VAX-11 in complexity is bogus; the UBA on a VAX may be big, but then it does a hell of a lot more than just act as a bus interface, given its multiple data paths, and maps, and so forth and so on.
> Comparing a RISC to a VAX in terms of performance, when the
> RISC has two or three half- or quarter-VAXen doing its i/o, is cheating.

Who says it requires a half- or quarter-VAX to do the I/O for a RISC?

> Hey, my Toyota can outrace the space shuttle when you strap two solid
> rocket boosters to it.

Not a bad analogy; if your goal is met better by the Toyota+rocket boosters than by the space shuttle, why not go with it? If you're trying to break the land speed record, you build a land speed record car, not a jet plane with detachable wings and landing gear that can handle 700+ MPH.

Guy Harris
{seismo,ihnp4,allegra}!rlgvax!guy
phil@amdcad.UUCP (Phil Ngai) (12/01/84)
> --Hal Stern
> is that if you want a RISC to do i/o *without another processor* then
> you have to add instructions/wait states/more complexity to get something
> with a 50-nanosecond cycle time to talk to a device with a 1-microsecond
> cycle time.

Give me a break. Haven't you ever heard of wait states, interrupts, or DMA? Did you know that an 8 MHz 8086 can't talk to a 4 MHz 8530 without wait states? No, I didn't need half a VAX to interface the 8086 to the 8530. What is your problem anyway?
--
I'm not a programmer, I'm a hardware type.
Phil Ngai (408) 749-5790
UUCP: {ucbvax,decwrl,ihnp4,allegra}!amdcad!phil
ARPA: amdcad!phil@decwrl.ARPA
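Phil's mention of interrupts answers the stall problem by not waiting at all: the CPU drops characters into a ring buffer and returns, and the device's "ready" interrupt takes them out one at a time. The C sketch below assumes a transmit-ready interrupt and a single writable data register; the buffer size, the variable names, and the sent[] array standing in for that register are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TXBUF_SIZE 16  /* power of two so the index mask works */

/* Hypothetical driver state: the CPU queues characters and returns at
   once; the slow device pulls them out via its interrupt. Indices run
   freely and are masked on use. */
static unsigned char txbuf[TXBUF_SIZE];
static volatile size_t tx_head; /* next free slot, advanced by the CPU */
static volatile size_t tx_tail; /* next char to send, advanced by the ISR */

/* Stand-in for the device's transmit data register. */
static unsigned char sent[64];
static size_t nsent;

/* CPU side: returns 0 if the buffer is full instead of stalling. */
int tx_put(unsigned char c)
{
    if (tx_head - tx_tail == TXBUF_SIZE)
        return 0;
    txbuf[tx_head & (TXBUF_SIZE - 1)] = c;
    tx_head++;
    return 1;
}

/* Interrupt handler: invoked when the device can accept another
   character, i.e. once per character time, not once per CPU cycle. */
void tx_ready_isr(void)
{
    if (tx_tail == tx_head)
        return;                    /* nothing queued */
    unsigned char c = txbuf[tx_tail & (TXBUF_SIZE - 1)];
    tx_tail++;
    if (nsent < sizeof sent)
        sent[nsent++] = c;         /* "write" to the transmit register */
}
```

Only the handler touches the device, so a fast CPU pays a few instructions per character instead of idling through the device's whole cycle.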
padpowell@wateng.UUCP (PAD Powell) (12/02/84)
> (1) I said that RISCs are not suited for floating point/matrix multiplication
> because: (a) attaching a floating point coprocessor to a RISC perverts
> the very idea of a reduced instruction set; and (b) the original question

So what? The concept of a RISC also violated the CISC aesthetic ideals. If you want to argue that adding specialized hardware to perform an algorithm unsuited to von Neumann architectures is an unsuitable addition to a RISC architecture, present proof, either mathematical or physical (i.e., run an experiment), but please do not appeal to "good taste."

Patrick ("I'd put an FPA on my carburetor if it gave me better mileage") Powell