cliff@centaure.UUCP (Clifford Dibble) (05/05/89)
Unauthorized reprint from the Sept 1988 issue of "Electronic System Design"
magazine:

                    Single Instruction Set Computer

A recent trend in computer architecture, especially for microprocessor
implementations, is the Reduced Instruction Set Computer (RISC).  RISCs are
characterized by a small number of simple instructions that typically execute
in a single cycle.  By combining this concept with a large, high-speed
register file, RISC proponents have produced many machines that outperform
their complex (CISC) brethren.

The SISC extends the concept of RISC architecture to the fullest degree.
Basically, the SISC implements a single, yet extremely powerful, instruction.
The result is a flexible, low-cost processor that outperforms many designs
containing tens of thousands more transistors.  Since there is only a single
instruction, an order-of-magnitude reduction in processor complexity is
achieved.

The SISC operates with no instruction pipeline and no instruction cache.
These elements, which add cost and complexity to other processors, are
entirely unnecessary on the SISC: the "next" instruction is always the same
as the previous one.  There is no need to fetch an opcode, and no need to
decode one.  Every cycle is an execution cycle on the SISC.  And with no
opcodes to fetch, there is also no need for an instruction register or a
program counter, further simplifying the design.

The elegance of the SISC processor is embedded in its single multipurpose
instruction: INC A.  This instruction, the only one available on the SISC,
adds one to the contents of the accumulator and stores the result in the
accumulator.  The value of this approach becomes apparent when one considers
that both operands are implied by the instruction itself, as is the
destination.  Consequently, no memory cycle is required.  Ever.  This leads
to the surprising result that the SISC can operate with no memory at all, a
conclusion we verified experimentally.
The savings in memory management circuitry, RAM control, and memory devices
themselves are substantial.  This may be the second-biggest advantage the
SISC holds over other, more traditional designs.  By far the first is the
elimination of software.  Most new processors suffer in the marketplace
because of the initial lack of programming tools and utilities.  But the
SISC, with only one instruction, requires no software.  INC A, INC A, INC A.
That's all there is to it!

A traditionalist may question the value of a processor with no memory, no
software, and only one instruction.  But we have verified, at least
statistically, that the SISC can produce any result that any other computer
can produce.  And it usually does so faster.

In one test of the SISC's capabilities, an array of SISC processors was used
to drive a 1024x1024 raster graphics display.  Each SISC was wired to a
single pixel; the result held in each SISC's accumulator selected a pixel's
color and luminance parameters.  When the SISCs were fired up, the display
produced a dazzling array of images: a frowning Mona Lisa, a picture of what
Gorbachev is doing *right now*, and the complete set of blueprints for the
Stealth bomber (along with several decoys).  So in addition to possible
applications in the arts, the SISC may have national security applications.
Other, more mundane, applications include an odometer for automobiles and
tracking the national debt.

The current generation of SISC processors is fabricated of germanium PNP
transistors in TO-5 cans.  Samples are available now, with volume shipments
beginning April 1.
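[For the skeptical, the SISC's entire programming model fits in a few lines.
A playful sketch -- the class and method names are mine, not the article's --
assuming the accumulator really is the machine's only state:]

```python
class SISC:
    """The (satirical) Single Instruction Set Computer: no memory,
    no program counter, no opcodes -- just one accumulator."""

    def __init__(self):
        self.a = 0  # the accumulator: the entire machine state

    def step(self):
        """Execute the one and only instruction: INC A."""
        self.a += 1

cpu = SISC()
for _ in range(5):   # the whole instruction stream: INC A, INC A, ...
    cpu.step()
print(cpu.a)         # -> 5
```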
les@unicads.UUCP (Les Milash) (05/06/89)
In article <112@centaure.UUCP> cliff@centaure.UUCP (Clifford Dibble) writes:
>Single Instruction Set Computer
>The SISC extends the concept of RISC architecture to the fullest degree

In Dan Hillis's book about the Connection Machine he calls it "the ultimate
RISC" cause it has 1 (albeit very powerful) instruction.  and he's not
joking; it's sort of a 16-dimensional hypercube of soft-settable
pals-with-sram; each instruction tells the (1-bit) alus what to do and who
to do it to.

actually, y'all might enjoy reading this book--only takes an evening--even
tho it's an SIMD and we're all into MI?Ds apparently.  the machine is really
radical, but nevertheless one of the languages (C*) is amazingly normal
considering this is a massively parallel SIMD.

the guy always thinks in "the limit as N -> infinity"; it's that perspective
that kind of turned me off to shared memory machines or to busses (i mean
isn't it basically true that lim (N->oo) {sharing memory} = starvation?
(i'm now shuddering in fear and donning my asbestos panties cause i realize
that that's probably a very controversial thing to say (in fact i'd rather
y'all'd just call back and call me a Sh*thead in all caps rather than have
us do a big war about it)) but ain't that basically the truth?  shared
memory works as long as you don't try to share it (that's what snoopy
caches are hoping for?); message passing works as long as you don't need to
pass many; all these approaches we can milk for another order of magnitude
or two, but basically the problem is very difficult?  right?

the book re-inspires me that "there are other Very Odd architectures out
there waiting to be discovered; some of which are Very Useful".  After all,
this is the age of Very Unusual Architectured Computers, right?  sigh.  too
bad some naive nerd like myself can't just invent The Ultimate Computer and
get rich and famous.  our problems are difficult problems.
after reading the VLIW/Superscalar/Superpipelined article in ASPLOS III i
thought that there are so many tradeoffs, this'd never be easy.  y'all do
pretty well; what we have is amazingly fast.

(sorry to blab on for so long)

Les Milash
dean@mars.Berkeley.EDU (R. Drew Dean) (05/06/89)
While this subject started out as a joke, I point the net back to the
discussion a few months ago about a _real_ one-instruction CPU:
Subtract & Branch Negative.  The instruction looks like

    SUBN source1, source2, next

    source1 <- source1 - source2;
    if (source1 < 0) PC <- next else PC <- PC + 12;

Others have shown that this is Turing equivalent.  It would seem that
generating optimal code for this machine would be easy -- just generate the
shortest sequence of instructions possible.  All (1) instruction(s) take the
same amount of time (I'd hope you'd pipeline it (easy) to get it down to
1 CPI), so the code generator doesn't have to worry about much.

Of course, to really make this thing scream, it needs to run at about
300 MHz and have a _lot_ of 3 ns memory... :-)  You might want to try
microcoding a RISCy instruction set on it, but it would be memory-memory,
as the chip has no registers other than the PC.

I remember someone (sorry, I forget who) on the net saying that they had
started to write a Pascal compiler for this beast....

Drew Dean
Internet: dean@xcssun.berkeley.edu
UUCP: ...!ucbvax!xcssun!dean
FROM Disclaimers IMPORT StandardDisclaimer;
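[A minimal interpreter for the SUBN machine described above -- a sketch, not
anyone's actual implementation.  Assumptions mine: memory is word-addressed
(so the not-taken path advances the PC by 3 words rather than 12 bytes), and
a negative `next` halts the machine.  The add idiom via a scratch cell is
the standard subtract-and-branch trick, shown here to illustrate the
Turing-equivalence claim.]

```python
def run(mem, pc=0, steps=1000):
    """Execute SUBN instructions: mem[s1] -= mem[s2]; branch to `next`
    if the result went negative, else fall through to pc + 3."""
    for _ in range(steps):
        if pc < 0:                      # convention: negative PC halts
            break
        s1, s2, nxt = mem[pc], mem[pc + 1], mem[pc + 2]
        mem[s1] -= mem[s2]
        pc = nxt if mem[s1] < 0 else pc + 3
    return mem

# Example: add mem[A] into mem[B] using only SUBN and a scratch cell Z.
A, B, Z = 9, 10, 11
mem = [
    Z, A, 3,     # Z -= A   (Z becomes -A; `next` = fall-through anyway)
    B, Z, 6,     # B -= Z   (B becomes B + A)
    Z, A, -1,    # Z -= A   (forces a negative result since A > 0,
                 #           branching to the halt sentinel -1)
    7, 5, 0,     # data: A = 7, B = 5, Z = 0
]
run(mem)
print(mem[B])    # -> 12
```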
seibel@cgl.ucsf.edu (George Seibel) (05/06/89)
In article <422@unicads.UUCP> les@unicads.UUCP (Les Milash) writes:
>In article <112@centaure.UUCP> cliff@centaure.UUCP (Clifford Dibble) writes:
>>Single Instruction Set Computer
>>The SISC extends the concept of RISC architecture to the fullest degree
>
>In Dan Hillis's book about the connection machine he calls it "the ultimate
>RISC" cause it has 1 (albeit very powerful) instruction. and he's not joking;
>it's sort of a 16 dimentional hypercube of soft-settable pals-with-sram;
>each instruction tells the (1-bit) alus what to do and who to do it to.
>
>actually, y'all might enjoy reading this book--only take an evening--even
>tho it's an SIMD and we're all into MI?Ds apparently.
>the machine is really radical, but nevertheless one of the languages (C*)
>is amazingly normal considering this is a massively parallel SIMD.

Is it *really* radical?  I'm not so sure -- it's different from the way we
do things now, but I think we'll be seeing more and more of it.  By the way,
speaking of amazingly normal languages, the thing has a FORTRAN (!) compiler
now.  I've seen code for it; there are some mild extensions, but I saw
nothing weird.

>the guy always thinks in "the limit as N -> infinity"; it's that perspective
>that kind of turned me off to shared memory machines or to busses (i mean
>isn't it basically true that lim (N->oo) {sharing memory} = starvation?
>(i'm now shudder in fear and donning my asbestos panties cause i realize
>that that's probably a very controversial thing to say (in fact i'd rather
>y'all'd just call back and call me a Sh*thead in all caps rather than have us
>do a big war about it)) but ain't that basically the truth? shared memory
>works as long as you don't try to share it (that's what snoopy caches
>are hoping for?); message passing works as long as you don't need to pass
>many; all these approaches we can milk for another order of magnitude or
>two but basically the problem is very difficult? right?
Well, I think you've hit a certain nail on the head (the bandwidth problem),
but note that the CM is not a shared memory machine; that's the point of it.
Each processor has a small amount of memory local to it, sort of like you
stirred the cpus up with the main memory.  This way you don't have to push
either the cpu or memory technology, and you can still have a nice match
between memory bandwidth and cpu power.  It's going to be a lot easier to
keep thousands of 1 mips processors fed from thousands of parcels of slow
memory than it will be to keep four 125 mips ECL RISCs fed from a single
chunk of shared memory.  This has been mentioned in one way or another
approximately a billion times recently in this newsgroup.

>the book re-inspires me that "there are other Very Odd architectures out
>there waiting to be discovered; some of which are Very Useful". After all,
>this is the age of Very Unusual Architectured Computers, right?

Hmmm... (putting on my cynic hat) I'm not so sure.  Just try to sell one.
Seems like if anything the industry is getting more conservative.  Markets
are certainly driven by existing software to an almost unhealthy extent,
and this is an influence that works against unusual architectures.  This is
not to say that a really good idea can't make it; it had just better be
Really Good, and you'd better have deep pockets to ride out the long wait
until it catches on.  I happen to think the CM is a Good Idea, and I second
your suggestion that people check out Hillis's book.

George Seibel, UCSF
bradb@ai.toronto.edu (Brad Brown) (05/06/89)
In article <11579@cgl.ucsf.EDU> seibel@cgl.ucsf.edu (George Seibel) writes:
>In article <422@unicads.UUCP> les@unicads.UUCP (Les Milash) writes:
>>the book re-inspires me that "there are other Very Odd architectures out
>>there waiting to be discovered; some of which are Very Useful". After all,
>>this is the age of Very Unusual Architectured Computers, right?
>
>Hmmm... (putting on my cynic hat) I'm not so sure. Just try to sell one.
>Seems like if anything the industry is getting more conservative. Markets
>are certainly driven by existing software to an almost unhealthy extent,
>and this is an influence that works against unusual architectures. This
>is not to say that a really good idea can't make it; it had just better
>be Really Good, and you'd better have deep pockets to ride out the long
>wait until it catches on.

I think Les is right, but I can't see a good way of getting over the
marketing problem.  I just finished taking a course in advanced computer
architecture, and we looked at a lot of old machines that had all kinds of
really neat features that you just don't see any more.  My favorite was the
Burroughs 6600 (?) and its segmentation scheme.  You could do all kinds of
dynamic memory allocation basically for free, virtual memory was a side
effect of the system, and a lot of access bugs could be detected at runtime
with no performance penalty.

Unfortunately, the machine was heavily oriented towards languages that could
make use of these concepts, like Algol or PL/1 -- I guess Pascal and
Modula-II (probably Ada) could make use of it now.  Unfortunately, a machine
like this can't run C very well, 'cause C presupposes a flat, uniform memory
space.  And if you can't run C, you can't sell a new computer in today's
market.  Furthermore, machines like this need a lot more hardware,
increasing the cost.  With minimalist RISC machines lowering the cost of
performance (and being ideal for running C programs), it would be hard to
justify a machine like this.
I think the hardest thing to do is to build a machine that provides really
good support for programming and at the same time can run a lot of
different languages.  I see language-specific machines like the Symbolics
LISP machines ultimately failing in the market because they are too
specialized, and the non-specialized machines catch up to them in
price-performance too quickly...

(-:  Brad Brown  :-)
bradb@ai.toronto.edu
nelson@berlioz (Ted Nelson) (05/06/89)
I am fascinated by the entire concept of a single-instruction computer, and
I feel it is possible that this idea will make it to market as an extremely
low-cost general-purpose processor.  Of course, an entire generation of
software tools will have to be rethought; for one, self-modifying code will
become a much more powerful (necessary?) technique.

But the memory dependence is extremely high.  The van der Poel instruction
requires 3 operand fetches, 2 data reads, and one data write.  Assuming that
these cannot take place concurrently, that we have a system based on 100 ns
memory, and ignoring all other factors, each instruction takes 600 ns.  This
instruction rate is about equivalent to a 12 MHz 68000, but each instruction
is considerably less powerful.

First idea: Since the operand fetches are to adjacent words, we can fetch
them at the same time using triple-interleaved memory (this will require a
bit more logic than typical interleaving) and three separate buses on the
processor -- which is no problem since they are independent.  We could also
take care of the data reads in the same way by putting a (severe?)
restriction on the software (a la RISC, "let the compiler deal with it")
that the two operand addresses cannot be congruent modulo 3.  Using this
idea, we get each instruction's memory access time down to 300 ns -- twice
the throughput.

Second obvious idea: Pipeline the sucker.  I only have a basic understanding
of pipelines, but it seems to me that a straight three- or four-stage pipe
cannot work because of the memory conflict -- the fetch (F), read (R), and
write (W) stages cannot operate concurrently.  So let me propose two more
stages: Computation (C) {essentially the subtract} and Branch (B)
computation based on the condition code (the only condition code, Negative).
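[The timing arithmetic above can be checked in a few lines -- a sketch; the
100 ns figure and the access counts come from the post, the variable names
are mine:]

```python
T_MEM = 100  # ns per memory access (the post's assumption)

# Naive machine: 3 operand fetches + 2 data reads + 1 write, all serialized.
naive = (3 + 2 + 1) * T_MEM
print(naive)        # -> 600 ns per instruction

# Triple-interleaved banks: the 3 operand fetches overlap (adjacent words
# land in different banks), and the mod-3 software restriction lets the
# 2 data reads overlap too.  That leaves fetch + read + write in sequence.
interleaved = 3 * T_MEM
print(interleaved)  # -> 300 ns -> twice the throughput
```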
The stages operate FRCWB, and in operation look like this:

    F R C W B
      F R C W B
        F R C W B

As you can see, we still have a memory conflict between the Write of the
"current" instruction and the Read of a following instruction.  My first
reaction was to add another software restriction: the Write and the two
Reads would have to have addresses of different residues modulo 3.  But I
think that this is too severe and renders the machine unusable -- it is too
much for the compiler to handle.  Or is it?

Can anyone come up with a better pipelining scheme?  Or any way of improving
the performance?  Keep in mind that the market for this is as a very
low-cost processor, so the problem cannot be solved by using dual-port RAM.
Unless, of course, dual-port RAM drops considerably in price.

Or we could use National Semiconductor's new memory product: the 1 Megabit
Write-Only Memory (WOM).  This is extremely inexpensive, has an access time
of only 10 ns, and will be available in a dual-port version in only a few
months.  If you wish to order this great part, please contact me directly --
it is such a secret project that we haven't let Marketing in on it yet.

-- Ted.
"When comes The Revolution, things will be different!
 Not better.  Just different."
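[The staggered diagram above can be tabulated to show exactly which of the
memory-touching stages (F, R, W) are active in the same cycle -- a sketch
under the same one-instruction-per-cycle assumption, with the stage letters
as defined in the post:]

```python
STAGES = "FRCWB"          # Fetch, Read, Compute, Write, Branch
MEM_STAGES = {"F", "R", "W"}

def active(cycle, n_instrs):
    """Map each in-flight instruction to its stage at a given cycle,
    assuming instruction i enters stage F at cycle i."""
    out = {}
    for i in range(n_instrs):
        s = cycle - i
        if 0 <= s < len(STAGES):
            out[i] = STAGES[s]
    return out

for cycle in range(7):
    stages = active(cycle, 5)
    mem_users = [i for i, s in stages.items() if s in MEM_STAGES]
    clash = "  <- memory-port conflict" if len(mem_users) > 1 else ""
    print(cycle, stages, clash)
```

[From cycle 3 on, some instruction's W overlaps a later instruction's R and
an even later one's F, so a single-ported memory needs three banks -- or
the modulo-3 restriction -- every cycle, not just occasionally.]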
albaugh@dms.UUCP (Mike Albaugh) (05/09/89)
From article <13359@pasteur.Berkeley.EDU>, by dean@mars.Berkeley.EDU (R. Drew Dean):
> While this subject started out as a joke, I point the net back to the
> discussion a few months ago about a _real_ one instruction CPU:
> Subtract & Branch Negative. The instruction looks like
> SUBN source1, source2, next
> [detail elided; at least three such machines exist, mine is one-address]
> optimal code for this machine would be easy -- just generate the shortest
> sequence of instructions possible -- all (1) instruction(s) take the same
> amount of time, (I'd hope you'd pipeline it (easy) to get it down to 1 CPI),
                                              ^^^^
> so the code generator doesn't have to worry about much.

Not so fast, there... running "traditional" sorts of programs on such a
beast relies heavily on self-modifying code.  The problem of forwarding gets
"interesting".  Of course, I suppose the sort of scheduling that makes VLIWs
work could make sure there were no dependencies in the pipe...  (Hmmm,
maybe I should dust off those plans... :-)

> Of course, to really make this thing scream, it needs to run at about 300 MHz,
> and have a _lot_ of 3 ns memory...:-) You might want to try microcoding a

Yes, that turns out to be a key problem for a hobbyist, even at a mere
20 MHz.

> RISCy instruction set on it, but it would be memory-memory, as the chip
> has no registers other than the PC.

John Bown took the approach of "emulating" the PDP-11 instruction set via
macro-expansion.  This was helped by having a small amount of higher-speed
memory for the "registers".

> I remember someone (sorry, I forget who) on the net saying that they had
> started to write a Pascal compiler for this beast....

One poster had a working van der Poel "Zebra", with a Pascal-subset
cross-compiler.  I was the one who had "started" to write a small-C
compiler, intending native compilation.  No time, no money....
> Drew Dean
> Internet: dean@xcssun.berkeley.edu
> UUCP: ...!ucbvax!xcssun!dean
> FROM Disclaimers IMPORT StandardDisclaimer;

| Mike Albaugh (albaugh@dms.UUCP || {...decwrl!turtlevax!}weitek!dms!albaugh)
| Atari Games Corp (Arcade Games, no relation to the makers of the ST)
| 675 Sycamore Dr. Milpitas, CA 95035          voice: (408) 434-1709
| The opinions expressed are my own (Boy, are they ever)