cliff@centaure.UUCP (Clifford Dibble) (05/05/89)
Unauthorized reprint from the Sept 1988 issue of "Electronic System Design"
magazine:

                    Single Instruction Set Computer

A recent trend in computer architecture, especially for microprocessor
implementations, is the Reduced Instruction Set Computer (RISC).  RISCs are
characterized by a small number of simple instructions that typically execute
in a single cycle.  By combining this concept with a large, high-speed
register file, RISC proponents have produced many machines that outperform
their complex (CISC) brethren.

The SISC extends the concept of RISC architecture to the fullest degree.
Basically, the SISC implements a single, yet extremely powerful, instruction.
The result is a flexible, low-cost processor that outperforms many designs
containing tens of thousands more transistors.  Since there is only a single
instruction, an order-of-magnitude reduction in processor complexity is
achieved.

The SISC operates with no instruction pipeline and no instruction cache.
These elements, which add cost and complexity to other processors, are
entirely unnecessary on the SISC: the "next" instruction is always the same
as the previous one.  There is no need to fetch an opcode, and no need to
decode one.  Every cycle is an execution cycle on the SISC.  And with no
opcodes to fetch, there is also no need for an instruction register or a
program counter, further simplifying the design.

The elegance of the SISC processor is embedded in its single multipurpose
instruction: INC A.  This instruction, the only one available on the SISC,
adds one to the contents of the accumulator and stores the result in the
accumulator.  The value of this approach becomes apparent when one considers
that both operands are implied by the instruction itself, as is the
destination.  Consequently, no memory cycle is required.  Ever.  This leads
to the surprising result that the SISC can operate with no memory at all, a
conclusion we verified experimentally.
The savings in memory management circuitry, RAM control, and memory devices
themselves are substantial.  This may be the second-biggest advantage the
SISC holds over other, more traditional designs.  By far the first is the
elimination of software.  Most new processors suffer in the marketplace
because of the initial lack of programming tools and utilities.  But the
SISC, with only one instruction, requires no software.  INC A, INC A, INC A.
That's all there is to it!

A traditionalist may question the value of a processor with no memory, no
software, and only one instruction.  But we have verified, at least
statistically, that the SISC can produce any result that any other computer
can produce.  And it usually does so faster.

In one test of the SISC's capabilities, an array of SISC processors was used
to drive a 1024x1024 raster graphics display.  Each SISC was wired to a
single pixel; the result held in each SISC's accumulator selected a pixel's
color and luminance parameters.  When the SISCs were fired up, the display
produced a dazzling array of images: a frowning Mona Lisa, a picture of what
Gorbachev is doing *right now*, and the complete set of blueprints for the
Stealth bomber (along with several decoys).  So in addition to possible
applications in the arts, the SISC may have national security applications.
Other, more mundane, applications include an odometer for automobiles and
tracking the national debt.

The current generation of SISC processors is fabricated of germanium PNP
transistors in TO-5 cans.  Samples are available now, with volume shipments
beginning April 1.
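[For the skeptical, the SISC's entire programming model fits in a few lines.
A playful sketch -- the class and method names are mine, not the article's --
assuming the accumulator really is the machine's only state:]

```python
class SISC:
    """The (satirical) Single Instruction Set Computer: no memory,
    no program counter, no opcodes -- just one accumulator."""

    def __init__(self):
        self.a = 0  # the accumulator: the entire machine state

    def step(self):
        """Execute the one and only instruction: INC A."""
        self.a += 1

cpu = SISC()
for _ in range(5):   # the whole instruction stream: INC A, INC A, ...
    cpu.step()
print(cpu.a)         # -> 5
```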
les@unicads.UUCP (Les Milash) (05/06/89)
In article <112@centaure.UUCP> cliff@centaure.UUCP (Clifford Dibble) writes:
>Single Instruction Set Computer
>The SISC extends the concept of RISC architecture to the fullest degree

In Dan Hillis's book about the Connection Machine he calls it "the ultimate
RISC" cause it has 1 (albeit very powerful) instruction.  and he's not
joking; it's sort of a 16-dimensional hypercube of soft-settable
pals-with-sram; each instruction tells the (1-bit) alus what to do and who
to do it to.

actually, y'all might enjoy reading this book--only takes an evening--even
tho it's an SIMD and we're all into MI?Ds apparently.  the machine is really
radical, but nevertheless one of the languages (C*) is amazingly normal
considering this is a massively parallel SIMD.

the guy always thinks in "the limit as N -> infinity"; it's that perspective
that kind of turned me off to shared memory machines or to busses (i mean
isn't it basically true that lim (N->oo) {sharing memory} = starvation?
(i'm now shuddering in fear and donning my asbestos panties cause i realize
that that's probably a very controversial thing to say (in fact i'd rather
y'all'd just call back and call me a Sh*thead in all caps rather than have
us do a big war about it)) but ain't that basically the truth?  shared
memory works as long as you don't try to share it (that's what snoopy
caches are hoping for?); message passing works as long as you don't need to
pass many; all these approaches we can milk for another order of magnitude
or two, but basically the problem is very difficult?  right?

the book re-inspires me that "there are other Very Odd architectures out
there waiting to be discovered; some of which are Very Useful".  After all,
this is the age of Very Unusual Architectured Computers, right?  sigh.  too
bad some naive nerd like myself can't just invent The Ultimate Computer and
get rich and famous.  our problems are difficult problems.
after reading the VLIW/Superscalar/Superpipelined article in ASPLOS III i
thought that there are so many tradeoffs, this'd never be easy.  y'all do
pretty well; what we have is amazingly fast.

(sorry to blab on for so long)

Les Milash
dean@mars.Berkeley.EDU (R. Drew Dean) (05/06/89)
While this subject started out as a joke, I point the net back to the
discussion a few months ago about a _real_ one-instruction CPU:
Subtract & Branch Negative.  The instruction looks like

    SUBN source1, source2, next

    source1 <- source1 - source2;
    if (source1 < 0) PC <- next else PC <- PC + 12;

Others have shown that this is Turing equivalent.  It would seem that
generating optimal code for this machine would be easy -- just generate the
shortest sequence of instructions possible.  All (1) instruction(s) take the
same amount of time (I'd hope you'd pipeline it (easy) to get it down to
1 CPI), so the code generator doesn't have to worry about much.

Of course, to really make this thing scream, it needs to run at about
300 MHz and have a _lot_ of 3 ns memory... :-)  You might want to try
microcoding a RISCy instruction set on it, but it would be memory-memory,
as the chip has no registers other than the PC.

I remember someone (sorry, I forget who) on the net saying that they had
started to write a Pascal compiler for this beast....

Drew Dean
Internet: dean@xcssun.berkeley.edu
UUCP: ...!ucbvax!xcssun!dean
FROM Disclaimers IMPORT StandardDisclaimer;
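[A minimal interpreter for the SUBN machine described above -- a sketch, not
anyone's actual implementation.  Assumptions mine: memory is word-addressed
(so the not-taken path advances the PC by 3 words rather than 12 bytes), and
a negative `next` halts the machine.  The add idiom via a scratch cell is
the standard subtract-and-branch trick, shown here to illustrate the
Turing-equivalence claim.]

```python
def run(mem, pc=0, steps=1000):
    """Execute SUBN instructions: mem[s1] -= mem[s2]; branch to `next`
    if the result went negative, else fall through to pc + 3."""
    for _ in range(steps):
        if pc < 0:                      # convention: negative PC halts
            break
        s1, s2, nxt = mem[pc], mem[pc + 1], mem[pc + 2]
        mem[s1] -= mem[s2]
        pc = nxt if mem[s1] < 0 else pc + 3
    return mem

# Example: add mem[A] into mem[B] using only SUBN and a scratch cell Z.
A, B, Z = 9, 10, 11
mem = [
    Z, A, 3,     # Z -= A   (Z becomes -A; `next` = fall-through anyway)
    B, Z, 6,     # B -= Z   (B becomes B + A)
    Z, A, -1,    # Z -= A   (forces a negative result since A > 0,
                 #           branching to the halt sentinel -1)
    7, 5, 0,     # data: A = 7, B = 5, Z = 0
]
run(mem)
print(mem[B])    # -> 12
```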
seibel@cgl.ucsf.edu (George Seibel) (05/06/89)
In article <422@unicads.UUCP> les@unicads.UUCP (Les Milash) writes:
>In article <112@centaure.UUCP> cliff@centaure.UUCP (Clifford Dibble) writes:
>>Single Instruction Set Computer
>>The SISC extends the concept of RISC architecture to the fullest degree
>
>In Dan Hillis's book about the connection machine he calls it "the ultimate
>RISC" cause it has 1 (albeit very powerful) instruction. and he's not joking;
>it's sort of a 16 dimentional hypercube of soft-settable pals-with-sram;
>each instruction tells the (1-bit) alus what to do and who to do it to.
>
>actually, y'all might enjoy reading this book--only take an evening--even
>tho it's an SIMD and we're all into MI?Ds apparently.
>the machine is really radical, but nevertheless one of the languages (C*)
>is amazingly normal considering this is a massively parallel SIMD.

Is it *really* radical?  I'm not so sure -- it's different from the way we
do things now, but I think we'll be seeing more and more of it.  By the way,
speaking of amazingly normal languages, the thing has a FORTRAN (!) compiler
now.  I've seen code for it; there are some mild extensions, but I saw
nothing weird.

>the guy always thinks in "the limit as N -> infinity"; it's that perspective
>that kind of turned me off to shared memory machines or to busses (i mean
>isn't it basically true that lim (N->oo) {sharing memory} = starvation?
>(i'm now shudder in fear and donning my asbestos panties cause i realize
>that that's probably a very controversial thing to say (in fact i'd rather
>y'all'd just call back and call me a Sh*thead in all caps rather than have us
>do a big war about it)) but ain't that basically the truth? shared memory
>works as long as you don't try to share it (that's what snoopy caches
>are hoping for?); message passing works as long as you don't need to pass
>many; all these approaches we can milk for another order of magnitude or
>two but basically the problem is very difficult? right?
Well, I think you've hit a certain nail on the head (the bandwidth problem),
but note that the CM is not a shared memory machine; that's the point of it.
Each processor has a small amount of memory local to it, sort of like you
stirred the cpus up with the main memory.  This way you don't have to push
either the cpu or memory technology, and you can still have a nice match
between memory bandwidth and cpu power.  It's going to be a lot easier to
keep thousands of 1 mips processors fed from thousands of parcels of slow
memory than it will be to keep four 125 mips ECL RISCs fed from a single
chunk of shared memory.  This has been mentioned in one way or another
approximately a billion times recently in this newsgroup.

>the book re-inspires me that "there are other Very Odd architectures out
>there waiting to be discovered; some of which are Very Useful". After all,
>this is the age of Very Unusual Architectured Computers, right?

Hmmm... (putting on my cynic hat) I'm not so sure.  Just try to sell one.
Seems like if anything the industry is getting more conservative.  Markets
are certainly driven by existing software to an almost unhealthy extent,
and this is an influence that works against unusual architectures.  This is
not to say that a really good idea can't make it; it had just better be
Really Good, and you'd better have deep pockets to ride out the long wait
until it catches on.  I happen to think the CM is a Good Idea, and I second
your suggestion that people check out Hillis's book.

George Seibel, UCSF
bradb@ai.toronto.edu (Brad Brown) (05/06/89)
In article <11579@cgl.ucsf.EDU> seibel@cgl.ucsf.edu (George Seibel) writes:
>In article <422@unicads.UUCP> les@unicads.UUCP (Les Milash) writes:
>>the book re-inspires me that "there are other Very Odd architectures out
>>there waiting to be discovered; some of which are Very Useful". After all,
>>this is the age of Very Unusual Architectured Computers, right?
>
>Hmmm... (putting on my cynic hat) I'm not so sure. Just try to sell one.
>Seems like if anything the industry is getting more conservative. Markets
>are certainly driven by existing software to an almost unhealthy extent,
>and this is an influence that works against unusual architectures. This
>is not to say that a really good idea can't make it; it had just better
>be Really Good, and you'd better have deep pockets to ride out the long
>wait until it catches on.

I think Les is right, but I can't see a good way of getting over the
marketing problem.  I just finished taking a course in advanced computer
architecture, and we looked at a lot of old machines that had all kinds of
really neat features that you just don't see any more.  My favorite was the
Burroughs 6600 (?) and its segmentation scheme.  You could do all kinds of
dynamic memory allocation basically for free, virtual memory was a side
effect of the system, and a lot of access bugs could be detected at runtime
with no performance penalty.

Unfortunately, the machine was heavily oriented towards languages that could
make use of these concepts, like Algol or PL/1 -- I guess Pascal and
Modula-II (probably Ada) could make use of it now.  Unfortunately, a machine
like this can't run C very well, 'cause C presupposes a flat, uniform memory
space.  And if you can't run C, you can't sell a new computer in today's
market.  Furthermore, machines like this need a lot more hardware,
increasing the cost.  With minimalist RISC machines lowering the cost of
performance (and being ideal for running C programs), it would be hard to
justify a machine like this.
I think the hardest thing to do is to build a machine that provides really
good support for programming and at the same time can run a lot of
different languages.  I see language-specific machines like the Symbolics
LISP machines ultimately failing in the market because they are too
specialized, and the non-specialized machines catch up to them in
price-performance too quickly...

(-:  Brad Brown  :-)
bradb@ai.toronto.edu
nelson@berlioz (Ted Nelson) (05/06/89)
I am fascinated by the entire concept of a single-instruction computer, and
I feel it is possible that this idea will make it to market as an extremely
low-cost general-purpose processor.  Of course, an entire generation of
software tools will have to be rethought; for one, self-modifying code will
become a much more powerful (necessary?) technique.

But the memory dependence is extremely high.  The van der Poel instruction
requires 3 operand fetches, 2 data reads, and one data write.  Assuming that
these cannot take place concurrently, that we have a system based on 100 ns
memory, and ignoring all other factors, each instruction takes 600 ns.  This
instruction rate is about equivalent to a 12 MHz 68000, but each instruction
is considerably less powerful.

First idea: Since the operand fetches are to adjacent words, we can fetch
them at the same time using triple-interleaved memory (this will require a
bit more logic than typical interleaving) and three separate buses on the
processor -- which is no problem since they are independent.  We could also
take care of the data reads in the same way by putting a (severe?)
restriction on the software (a la RISC, "let the compiler deal with it")
that the two operand addresses cannot be congruent modulo 3.  Using this
idea, we get each instruction's memory access time down to 300 ns -- twice
the throughput.

Second obvious idea: Pipeline the sucker.  I only have a basic understanding
of pipelines, but it seems to me that a straight three- or four-stage pipe
cannot work because of the memory conflict -- the fetch (F), read (R), and
write (W) stages cannot operate concurrently.  So let me propose two more
stages: Computation (C) {essentially the subtract} and Branch (B)
computation based on the condition code (the only condition code, Negative).
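[The timing arithmetic above can be checked in a few lines -- a sketch; the
100 ns figure and the access counts come from the post, the variable names
are mine:]

```python
T_MEM = 100  # ns per memory access (the post's assumption)

# Naive machine: 3 operand fetches + 2 data reads + 1 write, all serialized.
naive = (3 + 2 + 1) * T_MEM
print(naive)        # -> 600 ns per instruction

# Triple-interleaved banks: the 3 operand fetches overlap (adjacent words
# land in different banks), and the mod-3 software restriction lets the
# 2 data reads overlap too.  That leaves fetch + read + write in sequence.
interleaved = 3 * T_MEM
print(interleaved)  # -> 300 ns -> twice the throughput
```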
The stages operate FRCWB, and in operation look like this:

    F R C W B
      F R C W B
        F R C W B

As you can see, we still have a memory conflict between the Write of the
"current" instruction and the Read of a following instruction.  My first
reaction was to add another software restriction: the Write and the two
Reads would have to have addresses of different residues modulo 3.  But I
think that this is too severe and renders the machine unusable -- it is too
much for the compiler to handle.  Or is it?

Can anyone come up with a better pipelining scheme?  Or any way of improving
the performance?  Keep in mind that the market for this is as a very
low-cost processor, so the problem cannot be solved by using dual-port RAM.
Unless, of course, dual-port RAM drops considerably in price.

Or we could use National Semiconductor's new memory product: the 1 Megabit
Write-Only Memory (WOM).  This is extremely inexpensive, has an access time
of only 10 ns, and will be available in a dual-port version in only a few
months.  If you wish to order this great part, please contact me directly --
it is such a secret project that we haven't let Marketing in on it yet.

-- Ted.
"When comes The Revolution, things will be different!
 Not better.  Just different."
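[The staggered diagram above can be tabulated to show exactly which of the
memory-touching stages (F, R, W) are active in the same cycle -- a sketch
under the same one-instruction-per-cycle assumption, with the stage letters
as defined in the post:]

```python
STAGES = "FRCWB"          # Fetch, Read, Compute, Write, Branch
MEM_STAGES = {"F", "R", "W"}

def active(cycle, n_instrs):
    """Map each in-flight instruction to its stage at a given cycle,
    assuming instruction i enters stage F at cycle i."""
    out = {}
    for i in range(n_instrs):
        s = cycle - i
        if 0 <= s < len(STAGES):
            out[i] = STAGES[s]
    return out

for cycle in range(7):
    stages = active(cycle, 5)
    mem_users = [i for i, s in stages.items() if s in MEM_STAGES]
    clash = "  <- memory-port conflict" if len(mem_users) > 1 else ""
    print(cycle, stages, clash)
```

[From cycle 3 on, some instruction's W overlaps a later instruction's R and
an even later one's F, so a single-ported memory needs three banks -- or
the modulo-3 restriction -- every cycle, not just occasionally.]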
albaugh@dms.UUCP (Mike Albaugh) (05/09/89)
From article <13359@pasteur.Berkeley.EDU>, by dean@mars.Berkeley.EDU (R. Drew Dean):
> While this subject started out as a joke, I point the net back to the
> discussion a few months ago about a _real_ one instruction CPU:
> Subtract & Branch Negative. The instruction looks like
> SUBN source1, source2, next
> [detail elided; at least three such machines exist, mine is one-address]
> optimal code for this machine would be easy -- just generate the shortest
> sequence of instructions possible -- all (1) instruction(s) take the same
> amount of time, (I'd hope you'd pipeline it (easy) to get it down to 1 CPI),
                                              ^^^^
> so the code generator doesn't have to worry about much.

Not so fast, there... running "traditional" sorts of programs on such a
beast relies heavily on self-modifying code.  The problem of forwarding gets
"interesting".  Of course, I suppose the sort of scheduling that makes VLIWs
work could make sure there were no dependencies in the pipe...  (Hmmm,
maybe I should dust off those plans... :-)

> Of course, to really make this thing scream, it needs to run at about 300 MHz,
> and have a _lot_ of 3 ns memory...:-) You might want to try microcoding a

Yes, that turns out to be a key problem for a hobbyist, even at a mere
20 MHz.

> RISCy instruction set on it, but it would be memory-memory, as the chip
> has no registers other than the PC.

John Bown took the approach of "emulating" the PDP-11 instruction set via
macro-expansion.  This was helped by having a small amount of higher-speed
memory for the "registers".

> I remember someone (sorry, I forget who) on the net saying that they had
> started to write a Pascal compiler for this beast....

One poster had a working van der Poel "Zebra", with a Pascal-subset
cross-compiler.  I was the one who had "started" to write a small-C
compiler, intending native compilation.  No time, no money....
> Drew Dean
> Internet: dean@xcssun.berkeley.edu
> UUCP: ...!ucbvax!xcssun!dean
> FROM Disclaimers IMPORT StandardDisclaimer;

| Mike Albaugh (albaugh@dms.UUCP || {...decwrl!turtlevax!}weitek!dms!albaugh)
| Atari Games Corp (Arcade Games, no relation to the makers of the ST)
| 675 Sycamore Dr. Milpitas, CA 95035          voice: (408) 434-1709
| The opinions expressed are my own (Boy, are they ever)