[comp.arch] OISC was Re: ZISC computers

albaugh@dms.UUCP (Mike Albaugh) (11/18/88)

	I wanted to point out that a machine with no opcode has not
zero, but one instruction(s). (2**0 = 1)

	As several people have noted, the idea of using addresses to
"cheat" (refer to function units) is well represented in the literature.
A recent article in ACM Computer Architecture News even has the audacity
to call this approach "The Ultimate RISC". I might be persuaded that it
is the "ultimate _useful_ RISC", but machines such as Van der Poel's
(article <3562@hubcap.UUCP>, by mark@hubcap.UUCP (Mark Smotherman) )
are arguably much more RISCy.

	A few years ago (Before I was personally on UseNet), I had a
friend email (Maybe post?) an earlier version of the following, In
which I present my own "ultimate" RISC. Since mine also has one (or two,
but not several) "magic" location, it may not be truly "ultimate". I
feel that it is closer than the various two and three address variations.
I originally intended to actually wire-wrap the first version, and may
yet build the current version, but kids and a house have changed my
priorities. I do have a C simulator, and some useful macros.

By the way, the GRI-??? mentioned in <20743@apple.Apple.COM> by 
baum@apple.com was manufactured by George Risk Industries, so it may
have been ahead of its time in nomenclature too :-)

And now, a re-run:

Random thoughts on the Ultimate Reduced Instruction Set Computer

	One possible implementation is a machine which has the instruction:
Reverse Subtract and skip on Borrow. That is, every instruction has no opcode
but does have an address. The accumulator is subtracted from that address and
the result left both there and in the accumulator. If the subtraction caused
a borrow (i.e. no carry out of the ALU), the subsequent instruction is skipped.
The only handwaving required is for the PC to live in a fixed location
(say 0, ala PDP-5). A very naive control unit which does one instruction per
6 clock cycles can be implemented, a slightly less naive one can do it in
5 (sometimes 4), and shadowing the PC can get it to 3 (sometimes 2).

	Some macros for the machine, to implement more common instructions are:

	macro CLR p1
	p1			; p1,acc <- p1-acc (may skip)
	p1			; p1,acc <- 0 (cannot skip)
	p1			; p1,acc <- 0 (cannot skip)
	ENDM
;	The third instruction is not redundant, since we have no way of knowing
; whether the first will skip.

	macro LOAD p1 p2
	CLR p2			; We need to provide a sacrificial location
	p1			; acc,p1 = p1-0 (cannot skip)
	ENDM

	macro STORE p1 z	; z must be a pre-zeroed location
	z			; acc,z = -acc (will skip unless acc=0)
	z			; acc,z = 0 (a nop in this case)
	CLR p1
	z			; acc = z (-acc from entry, cannot skip)
	p1			; acc,p1 = entry acc (will skip unless 0)
	z			; another NOP to fill the skip safely.
	ENDM

;	The necessity of providing a sacrificial location for LOAD and
; a guaranteed zero (which is destroyed) for STORE imply that the
; "one-address" machine is an inappropriate paradigm. The "two-address"
; machine yields a substantial improvement:

	macro MOV p1 p2 z
	CLR p2
	z
	z			; Also clears z, but shorter 'cuz we can't skip
	p1			; acc,p1 = p1
	z			; acc,z = -p1		(will skip unless 0)
	z			; in case p1 was 0
	p2			; acc,p2 = p1		(will skip unless 0)
	z			; in case p1 was 0
	ENDM

	macro SUB p1 p2 p3	; (p3=p1-p2), (p2=p2-p1)
	CLR p3
	p1			; acc,p1 = p1 (cannot skip)
	p2			; acc,p2 = p2-p1 (may skip)
	p3			; acc,p3 = p1-p2 (will skip unless p1=p2)
	p3			; acc,p3 = p1-p2 (cannot skip)
	ENDM

The analysis of that one is a bit tricky. If p2 is less than p1, the p2
instruction skips to the second p3, which cannot skip again, since 
(p2<p1) IFF (p1>p2). If p2 is greater than p1, the p2 will not skip, but the
first p3 will, by similar reasoning. If p2=p1, all instructions are executed
and p2 = p3 = 0.

	macro JMP l
	CLR	Z
	lit	next-l		;acc=(next-l)
next	0			;pc (which is in 0) = next-(next-l) = l
	ENDM

Actually, the value of the literal needs to be adjusted for the skip which
will occur if l is at a lower address than next, but I didn't want to clutter
the psuedo-macro with conditionals.

	macro ZAJMP l z		; if acc=0		if acc<>0
	z			; 0 (noskip)		-acc (must skip)
	lit	l-next		; l-next (noskip)	----
	z			; next-l (mustskip)	0 (noskip)
	z			; -----			0 (nop)
next	0			; JMP l			nop
	ENDM

	macro ZJMP t l z	;Jump to l if t=0
	LOAD	t z
	ZAJMP	l z
	ENDM

    The "lit" function is similar to the # (immediate) operator on such
machines as have an immediate mode. If you don't have such a mode (many
early machines didn't), you typically create a "literal pool", for all
the constants you are going to use. Such constants are shared by all the
instructions that need to refer to that particular bit pattern. Since a
careless use could alter the pattern, this can be a source of endless
"amusement" during debuging. The creation of a literal pool on a macro
assembler that doesn't have such a feature (such as DEC MACRO-11, because it
wouldn't use it in its "native tongue") is left as an exercise to the reader,
for now. I know how _I_ do it, but will leave it as an exercise to the
reader for any particular machine.

	Robert Murdoch (of Intel, Hi Doch :-)) suggested a refinement of 
Lyle Rains (of Atari Games, my boss's boss) idea that byte addressability
of memory would make character operations easier. Doch's idea was to go all
the way and make the memory bit-addressable. A whole word is still
referenced and modified in each cycle, but that word can start on any bit
boundary. Since the extra 4 bits would take a real bite out of the address
space of a 16-bit machine, this makes most sense with a 32 bit machine. The
complexities of the hardware get intense (if we want to go very fast), but
right shifts and field extraction become trivial.

	Interrupts can be handled by forcing references to 0 to go to "1"
instead. The quotes around "1" are because a bit addressable machine would
have its second word at address 16 (or 32). Anyway, just oring the interrupt
control flag with the appropriate address bit on a reference to 0 will do the
trick. The following software is needed to save and restore the "processor
status". accs and accs2 are two pre-zeroed locations, to recieve the contents
of the acc, whether or not the instruction before the interrupt caused a skip.
accr and accr2 pre-set by the IRQ routine, and the return is before the entry
so a simple delay on the reset of the IRQ flag accomplishes return
There are three possible input states to re-construct:
			 acc==0, !skip;	acc<>0, !skip;	acc<>0, skip

IRQRTN:	clr	IFRESET	; Flag hardware, action delayed 2 cycles past last ref.
	accr			0	0		-inacc
	accr2			0 	inacc 		0->inacc skip

; These get set up by:
INTEN:	accs		;	0	-inacc s	0
	accs2		;	0	---		-inacc s	
	accr		;	???	???		---
	accr		;	0	--?		???
	accr		;	0	0		--?
	accr		;	0	0		0
	accr2		;	???	???		???
	accr2		;	0	0		0
	accs		;	0	-inacc		0
	accr2		;	0	inacc s		0
	accs2		;	0	---		-innac
	accs		;	0	0		inacc s
	accr		;	0	0		---
	accr		;	0	0		-inacc s
	accs		;	0	0		---
	accs		;	0	0		0
	accs2		;	0	0		innac
	accs2		;	0	0		0
; At this point, accs and accs2 are pre-zeroed for next IRQ, accr and accr2
; are set up for return. Do body of IRQ routine...

	jmp	IRQRTN	; Jump up to return point

; for 28 cycles interrupt overhead...
; The clear of accs2 could be merged with the jump to IRQRTN to save two cycles

Isn't Science wonderful...

| Mike Albaugh (albaugh@dms.UUCP || {...decwrl!turtlevax!}weitek!dms!albaugh)
| Atari Games Corp (Arcade Games, no relation to the makers of the ST)
| 675 Sycamore Dr. Milpitas, CA 95035		voice: (408)434-1709
| The opinions expressed are my own (Boy, are they ever)

andrew@jung.harlqn.uucp (Andrew Watson) (11/27/88)

In article <579@dms.UUCP> albaugh@dms.UUCP (Mike Albaugh) writes:

   One possible implementation is a machine which has the instruction:
  Reverse Subtract and skip on Borrow. That is, every instruction has no opcode
  but does have an address. The accumulator is subtracted from that address and
  the result left both there and in the accumulator. If the subtraction caused
  a borrow (i.e. no carry out of the ALU), the subsequent instruction is skipped.
  The only handwaving required is for the PC to live in a fixed location
  (say 0, ala PDP-5). A very naive control unit which does one instruction per
  6 clock cycles can be implemented, a slightly less naive one can do it in
  5 (sometimes 4), and shadowing the PC can get it to 3 (sometimes 2).

  [ ... lots of interesting detail deleted ... ]

I heard that an undergraduate at Manchester University (UK) built something
very much like this as a final-year project. He also wrote macros to emulate
the 8086 instruction set, and as a result benchmarked his contraption as faster
than a stock IBM PC.

Would any Manchunians (Mario? Ifor?) like to comment?
--
			       Regards,

                                   Andrew.

+-----------------------------------------------------------------------------+
|     Andrew Watson, Harlequin Limited,              andrew@uk.co.harlqn      |
|     Barrington Hall, Barrington,                   Tel: +44 223 872522      |
|     Cambridge CB2 5RG, UK                          Fax: +44 223 872519      |
+-----------------------------------------------------------------------------+

trevor@ux.cs.man.ac.uk (Trevor Hopkins) (11/29/88)

In a previous article Andrew Watson asked someone from Manchester to
comment on the OIC developed here, well since I knew the guy who built
the it I thought I might as well answer this one.  Excuse any mistakes
-- this is the first time I have posted to the net.

Here are the details:

J.R. Bown built a One Instruction Computer as his 3rd year
undergraduate project at the University of Manchester. His report
entitled `The One Instruction Computer' (third year project report No.
17, 1987) claims in its abstract to cover:

	The design and construction of a One Instruction Computer, OIC
	-- the extrapolation to its logical extreme of the philosophy
	of Reduced Instruction Set Computers, RISCs. The provision of
	a user environment for the testing and evaluation of software
	written for the One Instruction Computer Once Built.

The instruction which this machine implemented was a Reverse Subtract
Skip if Borrow, as described in previous postings.

Just skipping through this report I came on a rather nice quote in his
conclusions:

	Much to everyone's surprise, what at first sight appears to be
	a rather mad idea, namely a computer with only one instruction,
	appears in fact to have a great deal of potential as a viable
	machine. 

John actually built that machine and produced a fairly chunky report
on it, which I have in front of me now. I don't know a lot about it so
I`ll skim through the report and try and pick out the main points.

His implementation used a 60ns basic cycle, with the number of cycles
per instruction varying with the type of operands used. He built a
cross assembler for the machine and was able to download programs into
the OIC and upload the results. He also had an emulator program. A
major section of the report gives details of the assembler and emulator.

In the report he gives a number of comparisons of code written for the
OIC and the corresponding versions for a PDP-11 -- the OIC code is 2
to 2.5 times larger. He concludes that optimisation of programs is
extremely important and identifies 3 main types of optimisation:

Functional: putting the emphasis on using subtracts wherever possible.
Self Modifying code: - needed for indirect operations.
Code re-ordering.

He says his implementation was roughly equivalent to a 1 - 1.5 MIP
PDP-11 type machine (whatever that means), but he describes some
modifications which could be made to a future version of the machine
which he says would make it nearly comparable to a 68020. He points
out that the actual hardware for this machine could easily be fitted
into one chip, together with a significant amount of memory.

I hope this is of some use (or at least some interest) to some of you
out there.

	Rhod

 _______________________________________________________________________
| Dept. of Computer Science    JANET:      rmd@uk.ac.man.cs.r5    	|
| The University               USENET: mcvax!ukc!man.cs.r5!rmd    	|
| Manchester M13 9PL           Tel:    +44-61-275 6149             	|
| U.K.                                                             	|
|   ______      							|
| /~      ~\ 	So long and thanks for all the fish			|
| `-':  :`-'    							|
|____;  ;_____________the mushroom project______________________________|

bpendlet@esunix.UUCP (Bob Pendleton) (11/29/88)

Several years ago I also came up with the one instruction computer
that everyone else has also invented. Since I went to a school that
Ivan Sutherland taught at I must assume I picked up the idea as it
floated past, but I really thought I invented it, sigh.

My variation of the one instruction was the Conditional Move From Here
To There And Clear Flags or CMFHTTACF instruction. The idea being that
if any flags are set the instruction is skipped, in all cases the
flags register is cleared. All instructions are two addresses, a
source and a destination. The PC, the flag register, the flags mask
register, and all ALU registers are mapped to memory locations. And,
of course, some operations have the side effect of setting bits in the
flags register.

Using this machine as a basis for argument I "showed" that there were
good economic reasons, the cost of RAM and such, to try to reduce the
length of the addresses in the CMFHTTACF instruction. If your address
space is 2^32 then the CMFHTTACF instruction is 64 bits long.

The easiest way I could think of to reduce the length of the
instruction was to use two address spaces. One very small address
space that only had enough addresses to map the ALU registers, the PC,
and such.  One large address space for memory. And, to add a one bit
direction flag. To tell if the move was from the large address space
to the small or from the small to the large.

This looked suspiciously like the opcode, register index, and address
type of instructions we see in typical computers. Hmmm.

Well, repeated applications of the cheapness principle rather rapidly
resulted in a multilevel machine that looked an awful lot like a
a sort of RISC with microcoded coprocessors. 

This exercise lead me to the conclusion that all modern computers are
economically motivated variations of the CMFHTTACF machine. :-)

The odd things that go through the minds of grad. students while
trying to write a thesis.

		Bob P.

-- 
              Bob Pendleton, speaking only for myself.
UUCP Address:  decwrl!esunix!bpendlet or utah-cs!esunix!bpendlet

		Reality is what you make of it.