wbeebe@rtmvax.UUCP (Bill Beebe) (03/11/89)
In article <1552@vicom.COM> hal@vicom.COM (Hal Hardenbergh (236)) writes:
>Hal Hardenbergh  [incl std dsclmr]  hal@vicom.com
>Vicom Systems Inc  2520 Junction Ave  San Jose CA  (408) 432-8660
>anybody remember the newsletter DTACK Grounded ?

I remember DTACK, and your column in Programmer's Journal.  I also remember
your comments concerning RISC.  Do you still feel RISC is the equivalent of
modern snake oil?  ;-)
bcase@cup.portal.com (Brian bcase Case) (03/12/89)
>> So where did Intel get the extra 22% ?
>Intel has a 64-bit bus, MIPS 32 bits. It will take two clocks for MIPS to load
>an immediate value. Intel can do it in one clock. If 22% of the instruction
>mix involves immediate values, then we know where Intel 'got it.'

Unless I am missing something, Intel cannot "do it in one clock."  The
mechanism used by the i860 is essentially the same as on other RISCs:  load
the low 16 bits, then OR in (or add) the high 16 bits.  Even in dual
instruction mode, it takes two instructions (and usually, cycles).

>anybody remember the newsletter DTACK Grounded ?

Yes, I do.  This is the newsletter that printed the *WORST* (most inaccurate)
assessment of RISC that I have yet read.  However, I think it was a letter
from a reader, and so doesn't necessarily represent the newsletter's position.
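Spelled out in C rather than in i860 or MIPS assembly, the two-clock
immediate sequence mentioned above amounts to something like this (the
32-bit constant is made up for illustration):

    /* Hypothetical sketch:  building a 32-bit immediate in two steps,
     * the way a RISC does with a load of one half followed by an OR of
     * the other half.  Two operations, hence two clocks.
     */
    unsigned long build_immediate(void)
    {
        unsigned long r;

        r  = 0x5678UL;              /* step 1: load the low 16 bits   */
        r |= 0x1234UL << 16;        /* step 2: OR in the high 16 bits */
        return r;                   /* r == 0x12345678                */
    }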
hal@vicom.COM (Hal Hardenbergh) (03/14/89)
In article <15690@cup.portal.com> bcase@cup.portal.com (Brian bcase Case) writes:
>>anybody remember the newsletter DTACK Grounded ?
>Yes, I do.  This is the newsletter that printed the *WORST* (most inaccurate)
>assessment of RISC that I have yet read.  However, I think it was a letter
>from a reader, and so doesn't necessarily represent the newsletter's position.

That letter came in "over the transom," meaning it was not for attribution.
The writer worked for Big Blue at the time.

Uh, what did you think of Nick Tredennick's assessment of RISC in the Feb
issue of Microprocessor Report?  Was that the _second_ most inaccurate
assessment of RISC that you have yet read?
slackey@bbn.com (Stan Lackey) (03/14/89)
In article <1562@vicom.COM> hal@vicom.COM (Hal Hardenbergh (236)) writes:
>Uh, what did you think of Nick Tredennick's assessment of RISC in the Feb
>issue of Microprocessor Report?  Was that the _second_ most inaccurate
>assessment of RISC that you have yet read?

I found his article to be more an accurate assessment of microprocessor
trends in general.  I just KNEW his article would get mentioned here sooner
or later.  A lot of the stuff he said needed to be said eventually.

RISC is indeed a technology window, driven largely by the amount of stuff
you can fit in a chip.  Look at what is being added now that you can fit
more than a simple CPU core in a chip:

    Floating Point
    29000 null-terminated-string handling instructions
    Choice of endian-ness
    Caches; in fact with extremely complex, multi-mode capabilities
    Fully associative address translation caches
    Harvard architecture
    Multiprocessor capability

The trend in computer evolution is truly toward greater hardware complexity.
This has been demonstrated countless times.  The reversion back to too much
simplicity did happen in the late 70's, but here we are again, back on the
same curve.  There is a true need for complexity.  How many times when
reading this newsgroup do you see things like, "Yes but that chip doesn't
have <my favorite feature>" where the feature is anywhere from instruction
cache size to multiprocessor cache invalidation (see N-10 bashing)?  Hardly
a RISC headset!

Pure RISC religion is to keep things as simple as possible in order to make
the cycle time as fast as possible.  This can only go so far; real memories
must be used, the chip must be interfaced to real hardware, etc.  Clearly,
the way to get past this brick wall is to do more in parallel, either with
more powerful instructions (including VLIW/compiler technology) or with
multiprocessing, or both.

Companies must make money.  They will do this by making not tiny low-cost
RISC micros, but the most complex thing they can fit in a chip.  They need
this so they can get product differentiation and thus better margins.  The
million transistors will NOT be used entirely for large caches, but for more
instructions, addressing modes, faster floating point, elegant exception
handling, etc.  And, just watch, they will still find a way to call them
RISCs!

I predict that the next hardware features to come back will be
auto-increment addressing and the hardware handling of unaligned data.

I am not saying that RISC is bad, but it was an interesting exercise from
which we all learned a lot.  :-)

-Stan
bcase@cup.portal.com (Brian bcase Case) (03/15/89)
>Uh, what did you think of Nick Tredennick's assessment of RISC in the Feb
>issue of Microprocessor Report?  Was that the _second_ most inaccurate
>assessment of RISC that you have yet read?
[compared to the one in DTACK Grounded]

Well, I am a contributing editor to The Microprocessor Report.  I disagree
with nearly everything Nick says when he talks about RISC.  [E.g., arguing
that RISC architects are going to be forced to add complex instructions to
the simple sets because of pressure from compiler writers, OS guys, etc. is
ridiculous to me, but this is one of the things he says.]  However, Nick is
a bright guy and is tremendously entertaining when speaking to a crowd.  If
anyone can convince me to rethink my views, it is him.  He is entitled to
his views, and his article was printed as a "Viewpoint" in uP Report.

One thing I have relearned from pointing out the letter in DTACK Grounded:
People in glass houses shouldn't throw stones.  Sigh.
henry@utzoo.uucp (Henry Spencer) (03/17/89)
In article <37196@bbn.COM> slackey@BBN.COM (Stan Lackey) writes:
>The trend in computer evolution is truly toward greater hardware
>complexity.  This has been demonstrated countless times.  The
>reversion back to too much simplicity did happen in the late 70's, but
>here we are again, back on the same curve...

Except, this time the complexity added will be *useful* complexity, with
any luck.  No, we are not headed back towards CISC.

>... The million transistors will NOT be used entirely for
>large caches, but for more instructions, addressing modes, faster
>floating point, elegant exception handling, etc...

Faster floating point, okay.  More instructions and addressing modes?
*Why?*  They don't gain you anything, unless you start talking about VLIW
or other such significant departures.  Elegant exception handling?
Frankly, the relatively simple exception handling on many of the current
RISCs is much more elegant than all the garbage that showed up on the
CISC machines.

>I predict that the next hardware features to come back will be
>auto-increment addressing and the hardware handling of unaligned data.

Again, why?  Auto-increment addressing is useful only if instructions
are expensive, because it sneaks two instructions into one.  However,
the trend today is just the opposite:  the CPUs are outrunning the
main memory.  Since instructions can be cached fairly effectively,
they are getting cheaper and data is getting more expensive.  Doing
the increment by hand often costs you almost nothing, because it can
be hidden in the delay slot(s) of the memory access.  Autoincrement
showed up best in tight loops, exactly where effective caching can be
expected to largely eliminate memory accesses for instructions.  Why
bother with autoincrement?

As for hardware handling of unaligned data, this is purely a concession
to slovenly programmers.  Those of us who have lived with alignment
restrictions all our professional lives somehow don't find them a problem.
Mips has done this right:  the *compilers* will emit code for unaligned
accesses if you ask them to, which takes care of the bad programs, while
the *machine* requires alignment.  High performance has always required
alignment, even on machines whose hardware hid the alignment rules.
Again, why bother doing it in hardware?
-- 
Welcome to Mars!  Your      |     Henry Spencer at U of Toronto Zoology
passport and visa, comrade? | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
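For concreteness, "emitting code for unaligned accesses" boils down to
something like the following, written here as portable C; an actual MIPS
compiler would pick whatever instruction sequence is cheapest on its
target, so treat this only as an illustration of the software cost:

    /* Hypothetical sketch:  fetch a 32-bit big-endian value from an
     * address that may not be word aligned, using only aligned byte
     * loads.  This is the price paid in code instead of in hardware.
     */
    unsigned long load_unaligned32(const unsigned char *p)
    {
        return ((unsigned long)p[0] << 24) |
               ((unsigned long)p[1] << 16) |
               ((unsigned long)p[2] <<  8) |
                (unsigned long)p[3];
    }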
bcase@cup.portal.com (Brian bcase Case) (03/17/89)
>RISC is indeed a technology window, driven largely by the amount of
>stuff you can fit in a chip.  Look at what is being added now that you
>can fit more than a simple CPU core in a chip:
>
>    Floating Point

RISCs can have floating point; floating-point is not CISCy.

>    29000 null-terminated-string handling instructions

There's only one, and it is *QUITE* simple.  Even if it shouldn't be there,
this does not support what Nick says.

>    Choice of endian-ness

This adds about 2 gates to the design.

>    Caches; in fact with extremely complex, multi-mode capabilities

???  This might have some effect on the instruction set, but the effect
should not be to make the basic instructions go slow.

>    Fully associative address translation caches

This is architecturally neutral.

>    Harvard architecture

This is simply required for performance, regardless of the instruction set.

>    Multiprocessor capability

I don't see how adding multiprocessor capabilities makes a RISC into a CISC.

None of this stuff is inconsistent with RISC.  These are not CISCy things.
If you are going to complain, make your complaints valid.
aglew@mcdurb.Urbana.Gould.COM (03/17/89)
>Elegant exception handling?
>Frankly, the relatively simple exception handling on many of the current
>RISCs is much more elegant than all the garbage that showed up on the
>CISC machines.

Bravo!  Who needs vectored interrupts?  How often does your device know
better where to interrupt to than you do?

But (a bit more) seriously:  how can interrupt (not exception) handling be
made better/worse?  As an erstwhile systems programmer in a real-time OS, I
know that we often wished that interrupts could be treated exactly like
processes, going through the same priority or deadline driven scheduler.
Yet applying RISC principles to the hardware that would be needed to do
something like this, I often arrive at the conclusion that a simple single
entry point first level handler is all that is appropriate.  Everything
else seems to need sequencing.
tim@crackle.amd.com (Tim Olson) (03/18/89)
In article <1989Mar16.190043.23227@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
| In article <37196@bbn.COM> slackey@BBN.COM (Stan Lackey) writes:
| >I predict that the next hardware features to come back will be
| >auto-increment addressing and the hardware handling of unaligned data.
|
| Again, why?  Auto-increment addressing is useful only if instructions
| are expensive, because it sneaks two instructions into one.  However,
| the trend today is just the opposite:  the CPUs are outrunning the
| main memory.  Since instructions can be cached fairly effectively,
| they are getting cheaper and data is getting more expensive.  Doing
| the increment by hand often costs you almost nothing, because it can
| be hidden in the delay slot(s) of the memory access.  Autoincrement
| showed up best in tight loops, exactly where effective caching can be
| expected to largely eliminate memory accesses for instructions.  Why
| bother with autoincrement?

Also, auto-incrementing addressing modes imply:

	- Another adder (to increment the address register in parallel)
	- Another writeback port to the register file

Unless you wish to sequence the instruction over multiple cycles :-(
I'm certain that most people can find something better to do with these
resources than auto-increment.

| As for hardware handling of unaligned data, this is purely a concession
| to slovenly programmers.  Those of us who have lived with alignment
| restrictions all our professional lives somehow don't find them a problem.
| Mips has done this right:  the *compilers* will emit code for unaligned
| accesses if you ask them to, which takes care of the bad programs, while
| the *machine* requires alignment.  High performance has always required
| alignment, even on machines whose hardware hid the alignment rules.
| Again, why bother doing it in hardware?

The R2000/R3000 can also trap unaligned accesses and fix them up in a trap
handler.  This is what the Am29000 does, as well.  This is mainly a
backwards compatibility problem (FORTRAN equivalences, etc.); it is
infrequent in newer code, mainly appearing in things like packed data
structures in communication protocols.

-- 
	Tim Olson
	Advanced Micro Devices
	(tim@amd.com)
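For what it's worth, the data-merging part of such a fix-up is tiny; here
is a hedged C sketch (big-endian byte order, 32-bit longs, and invented
names assumed -- a real handler works on the saved trap state and must
decode the faulting instruction to find its destination register):

    /* Hypothetical sketch of the merge step in an unaligned-load trap
     * handler.  'shift' is never zero here, since an aligned access
     * would not have trapped in the first place.
     */
    unsigned long merge_unaligned_load(unsigned long fault_addr)
    {
        unsigned long *aligned = (unsigned long *)(fault_addr & ~3UL);
        unsigned int shift = (unsigned int)(fault_addr & 3UL) * 8;

        /* take the tail of the first aligned word and the head of the
         * second one */
        return (aligned[0] << shift) | (aligned[1] >> (32 - shift));
    }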
lamaster@ames.arc.nasa.gov (Hugh LaMaster) (03/18/89)
In article <24889@amdcad.AMD.COM> tim@amd.com (Tim Olson) writes:
>In article <1989Mar16.190043.23227@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>| In article <37196@bbn.COM> slackey@BBN.COM (Stan Lackey) writes:
>Also, auto-incrementing addressing modes imply:
>	- Another adder (to increment the address register in parallel)
>	- Another writeback port to the register file
>Unless you wish to sequence the instruction over multiple cycles :-(
>I'm certain that most people can find something better to do with these
>resources than auto-increment.

I neither agree nor disagree with this.  But, I think it should be noted
that auto-increment/decrement addressing modes can easily be generated by
compilers and are parallelizable in hardware, and are therefore potential
performance wins, although in practice it may not work out.  I am sure
people have simulated these questions to death, and examined the various
possibilities for code sequences.  You can increment on compare and branch
also (e.g. IBM BXLE).  Can you fill the delay slot in a branch if you have
already incremented, etc.?  Detailed simulations using a lot of different
kinds of source code are needed to answer questions like this.

Anyway, this is a different situation from the alignment problem below,
since the performance loss for doing unaligned data accesses is
significant, the hardware designers tell us.  Anyway, it is a separate
performance hit from the usual RISC/CISC issues.

>| As for hardware handling of unaligned data, this is purely a concession
                                                               **********

The reason that the VAX (and a few other) architectures are hard to
pipeline is that the operand specifiers require a separate decode, and that
a variable number of operands may come from memory, not because the machine
has autoincrement/decrement addressing modes.  But, really the issue is not
"complexity" (usually in the eye of the beholder anyway) but ease of
pipelining (a lot easier to measure).

The VAX (always the straw man in any RISC debate) achieves its design
goals of:

   "1) all instructions should have the 'natural' number of operands and
    2) all operands should have the same generality in specification."

(see Strecker's paper in Siewiorek, Bell, and Newell).  It just so happened
that these design goals, which produce a small number of very compact
instructions for a given piece of source code (and thus overcome the
problem of "most architectures" as stated in the paper), were the wrong
goals to pursue if another goal is PERFORMANCE.  OK, so they bet wrong on
the VAX... they bet that instruction compactness was very important.
Almost immediately, they began to be proved wrong.  On the other hand, ten
years, and billions of dollars of sales, went by before the noise got to be
too loud, so, ...

  Hugh LaMaster, m/s 233-9,  UUCP ames!lamaster
  NASA Ames Research Center  ARPA lamaster@ames.arc.nasa.gov
  Moffett Field, CA 94035    Phone:  (415)694-6117
csimmons@oracle.com (Charles Simmons) (03/18/89)
In article <1989Mar16.190043.23227@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>In article <37196@bbn.COM> slackey@BBN.COM (Stan Lackey) writes:
>>The trend in computer evolution is truly toward greater hardware
>>complexity.  This has been demonstrated countless times.  The
>>reversion back to too much simplicity did happen in the late 70's, but
>>here we are again, back on the same curve...
>
>Except, this time the complexity added will be *useful* complexity, with
>any luck.  No, we are not headed back towards CISC.

I'd have to agree with Henry completely.  One possible alternative way of
looking at this is that CISC technology, especially the VAX, was developed
during a period of time when a lot of programs were written in assembler.
If you look carefully at the MIPS architecture and the output of its C
compiler, you'll soon discover that on a MIPS machine there is absolutely
no reason to program in assembler.  Since one of the fundamental tenets of
RISC design is that all programs will be written in a high-level language,
we aren't going to see instruction sets that are real complicated, for the
simple fact that the compilers can't deal with the complexity.

-- Chuck
rpw3@amdcad.AMD.COM (Rob Warnock) (03/21/89)
In article <28200290@mcdurb> aglew@mcdurb.Urbana.Gould.COM writes:
+---------------
| Bravo!  Who needs vectored interrupts?
| How often does your device know better where to interrupt to than you do?
+---------------

When I first began designing with the Am29000, all my old habits felt
cramped at "only" 4 levels of external interrupt, which don't even read a
vector from the interrupting device.  But I quickly realized that since the
29k has a "count-leading-zeroes" (CLZ) instruction, all you need is a magic
external location you can read (can you spell 74F374?) which gives you one
bit per interrupting device, and an inclusive-OR to your single interrupt
line.  (Who needs 4 of them, anyway?)  So you load the bits, CLZ, add a
table base, and jump...  Given slow 8-bit I/O chips, that takes a lot less
time than a vector fetch.

+---------------
| But... how can interrupt (not exception) handling be made better/worse?
| As an erstwhile systems programmer in a real-time OS, I know that we often
| wished that interrupts could be treated exactly like processes,
| going through the same priority or deadline driven scheduler.
| Yet applying RISC principles to the hardware that would be needed to do
| something like this, I often arrive at the conclusion that a
| simple single entry point first level handler is all that is appropriate.
| Everything else seems to need sequencing.
+---------------

I agree.

[Tutorial alert.  Many of you know this already.  But it's worth saying once
or twice a decade, and I haven't heard it lately, so here goes...]

As has been done by many of us on a variety of machines, a useful interrupt
software "style" (good on many CISCs as well as RISCs) seems to be to split
interrupt handlers into a "first-level"/hardware-oriented/assembly-language
section and a "second-level"/software-oriented/C-language part, with the
following characteristics:

- You leave the "real" hardware interrupts always enabled (especially
  during 2nd-level handlers, system calls, etc.).

- When an interrupt occurs, all you do is clear the interrupting hardware,
  grab whatever really volatile data there might be, and queue up the
  2nd-level handler to run -- if it's really needed ("soft"-DMA can often
  just stash the data in a buffer and dismiss).  If there's already a
  2nd-level handler running at the same or higher *2nd-level* priority
  [see below], you just queue up a task block, and IRET.  The trick is
  that the *hardware* interrupt is disabled only for the brief moment when
  a 1st-level handler is running.

- The Unix "spl??()" [Set Priority Level] routines are modified to
  manipulate a *software* notion of priority, which is respected by the
  2nd-level routines and system-call level code (but not the hardware),
  and never turn off the hardware enables.  Necessary exclusion with
  1st-level handlers is done with *very* short interrupt disable periods,
  or none at all.  (Treating the 1st-level handlers like "DMA devices",
  you can usually find a way to eliminate the IOFFs.)

- The interface between the 1st- and 2nd-level sections is a little "task
  queue", sort of a light-weight "real-time scheduler".  You can have one,
  or any number of, interrupt task queues, not necessarily related at all
  to whatever hardware priorities you are stuck with.

- Once you start running a 2nd-level routine, you continue taking tasks
  off the 2nd-level queue(s) until they are empty, before restoring the
  CPU state and dismissing.  (Since hardware interrupts are still on, it
  is quite possible that more than one 2nd-level routine gets run per CPU
  state save.)

- If you *can* get by with just one 2nd-level priority, do so.  It avoids
  the extra state saving that comes with preempting multi-level priorities.
  (I know, sometimes you can't avoid it.  But sometimes you can.  On one
  system we just used the Unix "callout" queue, just setting a zero delay
  time if the task was for an interrupt.)

The advantages of this style are these:

1. Since hardware interrupts are never turned off for long, input data
   overruns are easy to avoid.  (...unlike some Unixes which turn off the
   world whenever they are searching the buffer cache!!!  No wonder so
   many people think Unix can't do 19200 baud input.  At the same time,
   you save some hardware cost, since the need for real DMA hardware is
   lessened.)

2. The 1st-level tasks can usually be done in a few assembly instructions
   without saving very much CPU state; the 2nd-level tasks need a full C
   context, reentrant and "interruptable" -- a lot more state.  Since
   interrupts are often "bursty", the two-level structure saves state
   *once* for several interrupts, a significant efficiency gain.  In fact,
   interrupt handling gets more efficient the higher the interrupt rate.

3. Most interrupts from "character" devices can be handled entirely in the
   1st-level handlers as "soft-DMA", or "pseudo-DMA", thus lessening
   further the number of full CPU state saves done.

4. Since hardware and software priorities now have nothing to do with each
   other, you can allocate priorities more rationally.  For example, you
   may have a multi-line serial card which has one interrupt level for all
   the transmitters and receivers on the card; also in the system is a
   disk.  In this case, the 1st-level serial-I/O handler will probably
   want to queue input (received) data to be processed at a *higher*
   2nd-level priority than the disk, but queue output (transmit done)
   interrupts at a *lower* priority than the disk.

Applying the above to a Version 7 Unix port to a 5.5 MHz 68000 (years ago),
we were able to take a system which could hardly do a single 2400-baud UUCP
and get it to cheerfully handle three simultaneous 9600-baud UUCPs!
...and with no change to the hardware:  interrupt-per-character SIO chips.

[Note:  When the 29000 takes an interrupt, volatile state (PC, PS) is
"frozen" in backup or shadow registers in the CPU, and execution continues
(with some slight restrictions).  An "IRET" restores the running process's
state from the shadow registers.  Instructions exist to read/write the
shadow registers if a full save/restore is to be done.  The very-light-weight
"freeze mode" interrupt matches very nicely with the above interrupt
software style.  You dedicate a few protected global registers to
freeze-mode processing, and *no* state has to be explicitly saved/restored
unless a 2nd-level handler needs to be started in a full "C" context.]

Rob Warnock
Systems Architecture Consultant

UUCP:	  {amdcad,fortune,sun}!redwood!rpw3
ATTmail:  !rpw3
DDD:	  (415)572-2607
USPS:	  627 26th Ave, San Mateo, CA 94403
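In rough C the scheme above looks something like the sketch below.  All the
names, the status-latch address, and the stub-handler table are invented
for illustration, and the real first-level code would be a handful of
freeze-mode assembly instructions rather than a C function:

    /* Hedged sketch of the two-level style:  a first-level dispatcher
     * driven by count-leading-zeroes on a one-bit-per-device status
     * latch, plus a drain loop for the second-level task queue.
     */
    #define NDEV 32

    extern volatile unsigned long *int_status_latch;  /* the "74F374"     */
    extern void (*dev_handler[NDEV])(void);           /* tiny 1st-level
                                                          stubs; each clears
                                                          its device and may
                                                          queue a task     */

    struct task {                      /* one queued 2nd-level work item  */
        struct task *next;
        void (*func)(void *);
        void *arg;
    };
    static struct task *task_head;     /* a single FIFO, for simplicity   */

    static int clz(unsigned long w)    /* stand-in for the 29k instruction */
    {
        int n = 0;
        while (!(w & 0x80000000UL)) {  /* caller guarantees w != 0        */
            w <<= 1;
            n++;
        }
        return n;
    }

    void first_level_interrupt(void)   /* hardware interrupts off only here */
    {
        unsigned long pending = *int_status_latch;

        while (pending != 0) {
            int dev = clz(pending);            /* device 0 sits in the MSB
                                                  and is served first      */
            (*dev_handler[dev])();             /* "load the bits, CLZ, add
                                                  a table base, and jump"  */
            pending &= ~(0x80000000UL >> dev);
        }
    }

    void drain_second_level(void)      /* runs with hardware interrupts ON */
    {
        struct task *t;

        while ((t = task_head) != 0) { /* real code must briefly exclude
                                          the 1st-level enqueue here       */
            task_head = t->next;
            (*t->func)(t->arg);
        }
    }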
w-colinp@microsoft.UUCP (Colin Plumb) (03/21/89)
tim@amd.com (Tim Olson) wrote:
> Also, auto-incrementing addressing modes imply:
>
>	- Another adder (to increment the address register in parallel)
>	- Another writeback port to the register file

Another adder?  Most RISC chips use base+offset addressing; all you need is
the ability to send the result back to the base register as well as to the
address bus.  This is almost always possible for stores, and may be possible
for loads, since the result of the load generally comes in significantly
later than the address goes out.  (The 29000 uses this cycle to store back
the result of the previous load, which had been waiting in a scoreboard
register, but other schemes may do something else.)

In my dream chip, I added postincrement by latching the address from the
ALU input bus, and was happy.
-- 
	-Colin (uunet!microsoft!w-colinp)

"Don't listen to me.  I never do." - The Doctor
tim@crackle.amd.com (Tim Olson) (03/22/89)
In article <12@microsoft.UUCP> w-colinp@microsoft.uucp (Colin Plumb) writes:
| tim@amd.com (Tim Olson) wrote:
| > Also, auto-incrementing addressing modes imply:
| >
| >	- Another adder (to increment the address register in parallel)
| >	- Another writeback port to the register file
|
| Another adder?  Most RISC chips use base+offset addressing; all you need is
| the ability to send the result back to the base register as well as to
| the address bus.

I must have had my architectural blinders on that day -- others have
pointed this out to me as well.  I was thinking about the other adder
requirement because the offset is typically used to supply a constant
offset from the current "frame pointer" [be it an actual register or an
adjustment from the stack pointer] for local array accesses.  However, this
need not be the case -- it can be folded into the base register at the top
of the loop (like the Am29000 does) and the offset field can be used as the
increment specifier.

-- 
	Tim Olson
	Advanced Micro Devices
	(tim@amd.com)
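In source terms the folding amounts to ordinary pointer strength reduction;
a small hypothetical before/after in C (function names invented, 4-byte
ints and n <= 64 assumed):

    /* Hypothetical illustration:  fold the constant frame offset into a
     * base pointer once, before the loop, so each access inside the loop
     * is just base + small increment.
     */
    void clear_with_index(int n)
    {
        int a[64];                   /* lives at frame pointer + constant */
        int i;

        for (i = 0; i < n; i++)
            a[i] = 0;                /* each store: fp + offset + 4*i     */
    }

    void clear_with_pointer(int n)
    {
        int a[64];
        int *p = a;                  /* offset folded into the base once  */
        int *end = a + n;

        while (p < end)
            *p++ = 0;                /* store, then bump the base by 4    */
    }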
dam@mtgzz.att.com (d.a.morano) (03/23/89)
In article <24929@amdcad.AMD.COM>, rpw3@amdcad.AMD.COM (Rob Warnock) writes:
> 
> [Tutorial alert.  Many of you know this already.  But it's worth saying once
> or twice a decade, and I haven't heard it lately, so here goes...]
> 
> As has been done by many of us on a variety of machines, a useful interrupt
> software "style" (good on many CISCs as well as RISCs) seems to be to split
> interrupt handlers into a "first-level"/hardware-oriented/assembly-language
> section and a "second-level"/software-oriented/C-language part, with the
> following characteristics:
> 
> [many characteristics of the above style deleted]

As you probably know, you have described in essence exactly what DEC did
for their RSX-11M and VMS operating systems 15 or so years ago.  DEC calls
their second-level handlers 'fork processes'.  These second-level fork
processes could execute partially in true hardware interrupt time and
partially in scheduled light-weight process time after the hardware
interrupt has been dismissed.  The amount of time spent in either mode is
programmed in the fork process by using dispatching and light-weight
scheduling primitives.

This style, as you have called it, does have the benefits that you have
cited.  This approach to interrupt handling better positions these OSes
for hardware-oriented, time-critical applications.  Of course, DEC would
code both portions of their handlers in assembly.

Dave Morano
AT&T Bell Laboratories