[comp.arch] machines with some loadable microcode are easier to fix

gjc@buitc.bu.edu (George J. Carrette) (01/04/91)

The subject line is a good summary: one reason that hardware designers
and vendors preferred machines with loadable microstore (and by the same
token machines that have microcode) was that it made it possible to fix
some "hardware" bugs without replacing chips or boards.

RISC implementations have no microcode, so fixing a hardware bug
means replacing a chip. 

A few months ago I posted to a couple news groups (not this one) a
program that crashed all RISC machines that it had been tried on,
but not the VAX and 68020 machines that I had tried it on. 

Much flaming resulted, as expected, since I made the unfair statement
that the system vendors of RISC machines did significantly less
thorough testing than the vendors of CISC machines.

There was also the unsupported statement that RISC machines, with
their rich set of "result is undefined" instruction sequences, were
more difficult to test thoroughly and/or to give an engineering
proof of correctness.

Be that as it may, since that time actual HARDWARE bugs have been
reported in certain SPARC implementations.

Perhaps these have already been discussed in this newsgroup?
(I've just started reading it.)

-gjc

p.s. In the next article I am posting an updated version of CRASHME.C
which people can use to investigate these issues further.
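
For readers who haven't seen it, the core trick is simple enough to
sketch in C.  What follows is a minimal illustrative sketch, NOT the
posted CRASHME.C: the buffer size, default trial count, and signal
handling are all invented here.

/*
 * Sketch of the crashme idea: fill a buffer with random bytes, point
 * a function pointer at it, call it, and catch the signals that
 * usually result.  Casting a data buffer to a function pointer is not
 * strictly conforming C, and some systems refuse to execute data
 * pages at all -- that refusal just shows up as a caught SIGSEGV.
 * (On modern POSIX systems sigsetjmp/siglongjmp would be cleaner.)
 */
#include <setjmp.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>

static jmp_buf next_trial;
static volatile sig_atomic_t last_sig;

static void catcher(int sig)
{
    last_sig = sig;
    longjmp(next_trial, 1);             /* unwind out of the wreckage */
}

int main(int argc, char **argv)
{
    unsigned char code[64];             /* the "instruction" buffer */
    int trials = (argc > 1) ? atoi(argv[1]) : 100;
    volatile int i;                     /* volatile: survives longjmp */
    size_t j;

    for (i = 0; i < trials; i++) {
        /* re-arm the handlers each trial; some systems reset them */
        signal(SIGILL,  catcher);
        signal(SIGSEGV, catcher);
        signal(SIGBUS,  catcher);
        signal(SIGFPE,  catcher);

        if (setjmp(next_trial) != 0) {
            printf("trial %d: caught signal %d\n", (int)i, (int)last_sig);
            continue;
        }
        for (j = 0; j < sizeof code; j++)
            code[j] = (unsigned char)(rand() & 0xff);
        ((void (*)(void))code)();       /* jump into the random bytes */
    }
    printf("survived %d trials\n", trials);
    return 0;
}

The interesting cases are the ones where the machine does something
other than return or raise a clean signal.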

mash@mips.COM (John Mashey) (01/04/91)

In article <71537@bu.edu.bu.edu> gjc@buitc.bu.edu (George J. Carrette) writes:

>A few months ago I posted to a couple news groups (not this one) a
>program that crashed all RISC machines that it had been tried on,
>but not the VAX and 68020 machines that I had tried it on. 
>
>Much flaming resulted, as expected, since I made the unfair statement
>that the system vendors of RISC machines did significantly less
>thorough testing than the vendors of CISC machines.
>
>There was also the unsupported statement that RISC machines, with
>their rich set of "result is undefined" instruction sequences, were
>more difficult to test thoroughly and/or to give an engineering
>proof of correctness.

As a useful historical data point, as far as I can tell, UNIX (and
other fair-sized OSs) managed to find problems in most chips / systems
early in their life, problems that eluded diagnostics, and sometimes
even other versions of UNIX.  Common problems are usually found around
exception-handling, and they affected almost everything, CISC or RISC....
These include, at least: PDP-11/45 (in 1973), Moto 68K, NSC 32K,
Intel X86, i860 (or so I'm told by reliable sources :-), and MIPS
(we only had minor ones that could be worked around in software, but yes,
we did find some).  At least we didn't have any of the form "if an
instruction with addressing mode A is the last instruction on a page,
and if it references a piece of data that spans page boundaries, and if
the second part of the address misses, then Bad Things Happen."
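
The data-access half of that pattern is easy to provoke deliberately
from user code.  Here is a hypothetical C sketch of such a stress test
(the instruction-placement half cannot be forced from C; POSIX
mmap/mprotect are assumed, everything else is invented):

/*
 * Arrange a 4-byte load whose first half is mapped and whose second
 * half "misses", then check that the fault is delivered cleanly.
 */
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

static void catcher(int sig)
{
    static const char msg[] = "clean fault on the straddling access\n";
    (void)sig;
    write(1, msg, sizeof msg - 1);      /* async-signal-safe report */
    _exit(0);
}

int main(void)
{
    long pagesize = sysconf(_SC_PAGESIZE);
    unsigned char *base;
    volatile unsigned int *p;

    /* two adjacent pages; make the second one inaccessible */
    base = mmap(NULL, 2 * pagesize, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) { perror("mmap"); return 1; }
    if (mprotect(base + pagesize, pagesize, PROT_NONE) != 0) {
        perror("mprotect"); return 1;
    }

    signal(SIGSEGV, catcher);
    signal(SIGBUS, catcher);

    /*
     * An unaligned 4-byte load starting 2 bytes before the boundary:
     * the second half of the access lands in the unmapped page.
     * (Unaligned access is itself implementation-defined -- machines
     * that trap on it still exercise an interesting exception path.)
     */
    p = (volatile unsigned int *)(base + pagesize - 2);
    printf("no fault: read %u\n", *p);
    return 0;
}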
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	 mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash 
DDD:  	408-524-7015, 524-8253 or (main number) 408-720-1700
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

sef@kithrup.COM (Sean Eric Fagan) (01/04/91)

In article <71537@bu.edu.bu.edu> gjc@buitc.bu.edu (George J. Carrette) writes:
>The subject line is a good summary: one reason that hardware designers
>and vendors preferred machines with loadable microstore (and by the same
>token machines that have microcode) was that it made it possible to fix
>some "hardware" bugs without replacing chips or boards.
>RISC implementations have no microcode, so fixing a hardware bug
>means replacing a chip. 

John Mashey once posted that simulating a MIPS R3000 (I think it was),
completely in software, from power up to single user mode, took something
like seven days.  (John, sorry if I've gotten something wrong.)

How long do you think it would take to do the same thing for, say, VAXen,
Cybers (180 state, that is), or any other machine with loadable ucode?

Which do you think is going to result in more sales:  selling and shipping a
machine for which you need to send out periodic uCode upgrades, or building 
a bug-free implementation by shipping time, which is three or more times 
faster than the one with ucode?  Assuming neither party is IBM or DEC, that
is 8-).

-- 
Sean Eric Fagan  | "I made the universe, but please don't blame me for it;
sef@kithrup.COM  |  I had a bellyache at the time."
-----------------+           -- The Turtle (Stephen King, _It_)
Any opinions expressed are my own, and generally unpopular with others.

koopman@a.gp.cs.cmu.edu (Philip Koopman) (01/04/91)

In article <1991Jan04.035359.12547@kithrup.COM>, sef@kithrup.COM (Sean Eric Fagan) writes:
> John Mashey once posted that simulating a MIPS R3000 (I think it was),
> completely in software, from power up to single user mode, took something
> like seven days.  (John, sorry if I've gotten something wrong.)
> How long do you think it would take to do the same thing for, say, VAXen,
> Cybers (180 state, that is), or any other machine with loadable ucode?

The Harris Semiconductor RTX 4000 used partly ROMed and partly
writable ucode.  In the prototype run, the writable ucode served as a
compiler-selected fallback for any bugs in the ROM.  There was one bug
that would have been catastrophic if uncorrected (and which, by a stupid
oversight, went unexercised in simulation).  So, we threw an opcode into RAM.
That was *much* easier than worrying about redoing all the
compiler code generation to avoid a particular instruction.
We were able to simulate power-on to running an interactive compiler
in about **10 minutes** of IBM PC-AT execution time using RTL-level
simulations.  But, it wasn't designed to be a Unix/workstation engine either.

> Which do you think is going to result in more sales:  selling and shipping a
> machine for which you need to send out periodic uCode upgrades, or building 
> a bug-free implementation by shipping time, which is three or more times 
> faster than the one with ucode?  Assuming neither party is IBM or DEC, that
> is 8-).

Unfortunately, RISC or CISC, microcoded or hardwired, it is highly
likely that the first rev. of *any* CPU has bugs.  That goes double
for compilers (especially RISC ones, which are generally more ambitious).
I have personally observed such bugs in both RISCs and CISCs from
the "big guys".
Any CPU (RISC or CISC) can work around bugs by using the compiler,
but that takes more time and manpower than a mod to writable ucode
in many cases; a sketch of the compiler-side approach appears below.
Note that as second-generation RISCs with heavy scoreboarding
come into use, the argument that they are simpler and therefore
more likely to be correct may(?) be less valid.
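
For concreteness, here is a hypothetical sketch of what the
compiler-side workaround looks like.  Every name in it is invented,
and the real RTX tools differed in detail; the point is only the
shape of the change:

/*
 * Code-generator sketch: when the target chip revision has a known-bad
 * opcode, emit an equivalent (slower) multi-instruction sequence
 * instead.  The Insn type, opcode names, and emit() are all invented.
 */
#include <stdio.h>

typedef struct { unsigned op; unsigned arg; } Insn;

#define OP_MULSTEP 0x2A    /* fast multiply-step, buggy on rev-A parts */
#define OP_SHL1    0x11    /* shift left one */
#define OP_ADDC    0x07    /* add with carry */

static int target_has_mulstep_bug = 1;  /* would come from a -m flag */

static void emit(Insn i)                /* stand-in for the real emitter */
{
    printf("emit op=%#04x arg=%u\n", i.op, i.arg);
}

static void gen_mulstep(unsigned arg)
{
    Insn a, b;

    if (!target_has_mulstep_bug) {
        a.op = OP_MULSTEP; a.arg = arg;
        emit(a);                        /* one fast instruction */
    } else {
        a.op = OP_SHL1; a.arg = arg;    /* slower but equivalent */
        b.op = OP_ADDC; b.arg = arg;    /* two-instruction sequence */
        emit(a);
        emit(b);
    }
}

int main(void)
{
    gen_mulstep(3);
    return 0;
}

Every such special case has to be threaded through instruction
selection, scheduling, and testing; the writable-ucode fix collapses
it into reloading one control-store word.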

By the way, the RTX 4000 executed programs about 1.5 times as fast
as the RTX 2000 (which was hardwired) -- off the same fab line
and standard cell library.  The RTX 4000 never made it to market
because it had to wait for the not-bug-free, slower hardwired
processor to succeed enough to generate revenue (it didn't).

  Phil Koopman                koopman@greyhound.ece.cmu.edu   Arpanet
  2525A Wexford Run Rd.
  Wexford, PA  15090
*** this space for rent ***
Formerly, senior scientist at Harris Semiconductor.

jerry@TALOS.UUCP (Jerry Gitomer) (01/04/91)

In article <71537@bu.edu.bu.edu> gjc@buitc.bu.edu (George J. Carrette) writes:
:The subject line is a good summary: one reason that hardware designers
:and vendors preferred machines with loadable microstore (and by the same
:token machines that have microcode) was that it made it possible to fix
:some "hardware" bugs without replacing chips or boards.
:RISC implementations have no microcode, so fixing a hardware bug
:means replacing a chip.

Sorry guys, but the ability to fix bugs wasn't even a consideration in
deciding to use loadable microstore -- it was economics.  The computer
users demanded "families of computers", that is computer systems that
could run the same programs and varied only in price and performance. 
These demands were based on a desire to reduce the software costs
associated with upgrading to a faster machine.

Things were different when the first microcoded computers were built. 
Computer generations were five years apart.  By today's standards the
hardware was simplistic and slow.  Operating systems (and all the other
system software for that matter) were written in assembly language.  

Given these circumstances the solution was to use loadable microcode to
make a group of dissimilar computers look alike to the programmer.  The
best illustration was the IBM 360 family.  To the programmer each 360 was
a 32-bit word machine with 16 registers.  The 360/30 was an 8-bit machine,
the 360/40 was a 16-bit machine, the 360/50 was the only 32-bit machine in
the family, the 360/65 was a 64-bit(?) machine, and the 360/75 was even
bigger.  (Please don't flame me for ignoring the 360/20, 360/22, 360/25,
360/85, 360/90/91/95, 360/44, and the 360/67 since the additional detail
wouldn't add anything.)

-- 
Jerry Gitomer at National Political Resources Inc, Alexandria, VA USA
I am apolitical, have no resources, and speak only for myself.
Ma Bell (703)683-9090      (UUCP:  ...{uupsi,vrdxhq}!pbs!npri6!jerry 

gjc@buitc.bu.edu (George J. Carrette) (01/05/91)

In article <ROUELLET.91Jan4103017@pinnacle.crhc.uiuc.edu> rouellet@crhc.uiuc.edu (Roland G. Ouellette) writes:
>During the development of the VAX6000-400, a microcoded machine (not
>loadable however), VMS was booted in simulation all the way to the
>login prompt.  

Is it really not loadable?  I've had a very knowledgeable DEC hardware
person tell me that the VS-3100 vax chip implementation, even though it
is thought of as a "microprocessor" (which usually implies hardwired
on-chip microcode), actually had some component of loadable microstore.

-gjc

schow@bcarh185.bnr.ca (Stanley T.H. Chow) (01/05/91)

In article <777@TALOS.UUCP> jerry@TALOS.UUCP (Jerry Gitomer) writes:
>
>Sorry guys, but the ability to fix bugs wasn't even a consideration in
>deciding to use loadable microstore -- it was economics.  The computer
>users demanded "families of computers", that is computer systems that
>could run the same programs and varied only in price and performance. 
>These demands were based on a desire to reduce the software costs
>associated with upgrading to a faster machine.

I must confess that I wasn't around in those days, but I recall reading
that it was IBM that pushed the concept. The users didn't even conceive
of compatible families. After all, that is one of the major reasons for
IBM getting so big and hated -- people kept buying their compatible
computers even though the architecture was obsolete.

>
>Things were different when the first microcoded computers were built. 
>Computer generations were five years apart.  By today's standards the
>hardware was simplistic and slow.  Operating systems (and all the other
>system software for that matter) were written in assembly language.  

Depends on what you mean by "today's standards".  If you mean the
mainframes and supercomputers, certainly the old computers are simple by
comparison.  If you mean today's microprocessors, especially RISC chips,
then the reverse is true.  For example, almost every single processor
chip on the market now is still catching up to the IBM Stretch computer;
some of the newer yet-to-be-announced chips are probably about equal
to the Stretch.  We are talking a 30 year gap here.

I don't know how old your "old days" are, but COBOL, Fortran and Algol
were all around in the 50's.  They predate most (if not all) of what
most people call the "Old MainFrame Computers".  Burroughs did their OS
in Algol in the 60's.

Stanley Chow        BitNet:  schow@BNR.CA
BNR		    UUCP:    ..!uunet!bnrgate!bcarh185!schow
(613) 763-2831               ..!psuvax1!BNR.CA.bitnet!schow
Me? Represent other people? Don't make them laugh so hard.

rouellet@crhc.uiuc.edu (Roland G. Ouellette) (01/05/91)

> How long do you think it would take to do the same thing for, say, VAXen,
> Cybers (180 state, that is), or any other machine with loadable ucode?

During the development of the VAX6000-400, a microcoded machine (not
loadable however), VMS was booted in simulation all the way to the
login prompt.  The Digital Technical Journal about the 6000-400
mentions this.  I recall that it took a couple (perhaps several)
weeks of simulation on a VAX8600.
--
= Roland G. Ouellette			ouellette@tarkin.enet.dec.com	=
= 1203 E. Florida Ave			rouellet@[dwarfs.]crhc.uiuc.edu	=
= Urbana, IL 61801	   "You rescued me; I didn't want to be saved." =
=							- Cyndi Lauper	=

mash@mips.COM (John Mashey) (01/05/91)

In article <1991Jan04.035359.12547@kithrup.COM> sef@kithrup.COM (Sean Eric Fagan) writes:

>John Mashey once posted that simulating a MIPS R3000 (I think it was),
>completely in software, from power up to single user mode, took something
>like seven days.  (John, sorry if I've gotten something wrong.)
Close.  This is a gate-level simulation of a next-generation
million-transistor chip (R4000) plus the surrounding hardware
environment, NOT the 100K-transistor R3000.  It takes 8 days on a
50-vmips RC6280.  I'm told that "ls"
on an empty directory takes a morning...

Note that the general discussion of micro-code machines seems to have mixed
up a lot of issues.
	1) As somebody noted, the IBM S/360 line used microcode for a bunch of
	architectural reasons, having little to do with bug-fixing.
	2) Appropriate architectures and their implementations change over
	time in response to fundamental underlying technology trends in
	both hardware and software.  What was appropriate in an era of
	expensive core memory and CPUs constructed from many parts may
	be rather irrelevant in an era of VLSI processors, cheap DRAM,
	and fast small SRAMs.
	See Hennessy & Patterson, pp. 208-243.  Particularly relevant is:
	"The drawback of microcode has always been performance.  This is
	because microprogramming is a slave to memory technology: the clock
	cycle time is limited by the time to read microinstructions from
	control store.  In the 1950s, microprogramming was impractical since
	virtually the only technology available for control store was the
	same one used for main memory.  In the late 1960s and early 1970s,
	semiconductor memory was available for control store, while main
	memory was constructed from core.  The factor of ten in cycle time
	that differentiated the two technologies opened the door for
	microcode.  The popularity of cache memories in the 1970s again
	closed the gap, and machines were again built with the same
	technology for control store and memory.
	For these reasons instruction sets invented since 1985 have not
	relied on microcode.  Though no one likes to predict the future --
	least of all in writing -- it is the authors' opinion that
	microprogramming is bound to memory technology.  If in some future
	technology ROM becomes much faster than RAM, or if caches are no
	longer effective, microcode may regain its popularity."
-- 
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	 mash@mips.com OR {ames,decwrl,prls,pyramid}!mips!mash 
DDD:  	408-524-7015, 524-8253 or (main number) 408-720-1700
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086

jimm@ima.isc.com (Jim McGrath) (01/05/91)

In article <777@TALOS.UUCP> jerry@TALOS.UUCP (Jerry Gitomer) writes:
>
>Sorry guys, but the ability to fix bugs wasn't even a consideration in
>deciding to use loadable microstore -- it was economics.  The computer
>users demanded "families of computers", that is computer systems that
>could run the same programs and varied only in price and performance. 
>These demands were based on a desire to reduce the software costs
>associated with upgrading to a faster machine.
>
It also made it possible to sell the same hardware at two price
points.  Adding no-op loops to slow down operation of the "cheaper"
version allowed a company to sell two versions of a computer with no
increase in manufacturing overhead.

Jim

rouellet@crhc.uiuc.edu (Roland G. Ouellette) (01/05/91)

> Is it really not loadable?

The microcode for the CVAX (used in 3xxx, 6200 and as a process shrink
in the 6300), Rigel (6000-400) and Mariah (6000-500) (and perhaps the
uVAX II) is all ROM.  A load of work goes into making sure that it all
works correctly.
--
= Roland G. Ouellette			ouellette@tarkin.enet.dec.com	=
= 1203 E. Florida Ave			rouellet@[dwarfs.]crhc.uiuc.edu	=
= Urbana, IL 61801	   "You rescued me; I didn't want to be saved." =
=							- Cyndi Lauper	=

johnl@iecc.cambridge.ma.us (John R. Levine) (01/05/91)

In article <777@TALOS.UUCP> jerry@TALOS.UUCP (Jerry Gitomer) writes:
>Given these circumstances the solution was to use loadable microcode to
>make a group of dissimilar computers look alike to the programmer.  The
>best illustration was the IBM 360 family.  To the programmer each 360 was
>a 32-bit word machine with 16 registers.  The 360/30 was an 8-bit machine,
>the 360/40 was a 16-bit machine, the 360/50 was the only 32-bit machine in
>the family, the 360/65 was a 64-bit(?) machine, and the 360/75 was even
>bigger.

No full member of System/360 had loadable microcode.  The models 30 through
67 had microcode on little cards that could, I suppose, be changed if you had
a screwdriver.  The models 85, 91, and I believe also 75 were hard-wired.
The model 44 was a hard-wired subset with extra I/O instructions, intended
for real time applications.

The model 20 was a strange case: it was a desk-sized machine whose
architecture was an almost compatible subset of the larger machines although
the I/O was entirely different.  The slower models had ROM microprograms, but
the submodel 5, the fastest one, stored the microcode in the same core memory
as the application code.  If the microcode got frotzed (core survives
power-off, so that was infrequent), there was a large deck of cards in the
back from which you could reload the microcode.  I know people who hacked in
extra instructions, but it was certainly not sanctioned by IBM.  Since I/O
was heavily assisted by the microcode, e.g. there was a single "read card and
translate to EBCDIC" instruction, they may have distributed extra microcode
to go with optional peripherals.

The 25 was the same engine running as a real 360, so I presume the same
tricks could be played, and the model 22 was a model 30 renumbered late in
its career with a lower price.  I bet IBM wished they all did have loadable
microcode; all early models with floating point had to be recalled to fix a
rather serious design error that caused extremely inaccurate results.

Many (all?) models of the follow-on 370 series did indeed have loadable
microcode.  Indeed, the now ubiquitous diskette first appeared as the boot
device for 370 microcode.  As the 370 series evolved, lots of features were
added, most notably the "DAT box" that provides virtual memory, but also
various I/O improvements and "assists" for virtual machines.  (On the lower
models of 360 and 370, the I/O channel was implemented in microcode in the
CPU, it didn't have its own processor.)  I expect that the ability to ship
new microcode disks was key to allowing these incremental upgrades without
totally alienating existing customers.

I'm not sure whether the 370 was the first commercial use of loadable
microcode.  I doubt it, but don't know of other earlier uses.  (Was the B1700
earlier?  I can't tell.)  Microcode per se dates back to the EDSAC 2,
which Wilkes first envisioned in 1951 and which first started to do
useful work in 1958, but it used a ROM implemented in core that was not
easily changed.

-- 
John R. Levine, IECC, POB 349, Cambridge MA 02238, +1 617 864 9650
johnl@iecc.cambridge.ma.us, {ima|spdcc|world}!iecc!johnl
" #(ps,#(rs))' " - L. P. Deutsch and C. N. Mooers

lindsay@gandalf.cs.cmu.edu (Donald Lindsay) (01/06/91)

In article <1991Jan04.205635.16420@iecc.cambridge.ma.us> 
	johnl@iecc.cambridge.ma.us (John R. Levine) writes:
>Indeed, the now ubiquitous diskette first appeared as the boot
>device for 370 microcode.

Strangely, it was actually the boot device for a 370 disk controller,
which was, of course, microcoded.  IBM's formatting standard came from
a subsequent short-lived data entry product, which explains why a
128-byte sector had to have 80 EBCDIC blanks followed by 48 NULs.
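
That layout is small enough to state in code.  A sketch for the
curious -- 0x40 is the EBCDIC blank; the function name and the check
in main() are mine, not IBM's:

#include <stdio.h>
#include <string.h>

#define SECTOR_SIZE   128
#define EBCDIC_BLANK  0x40   /* the EBCDIC space character */

/* Fill one sector the way the data-entry product left it:
   80 EBCDIC blanks followed by 48 NULs. */
static void format_sector(unsigned char sector[SECTOR_SIZE])
{
    memset(sector, EBCDIC_BLANK, 80);
    memset(sector + 80, 0x00, 48);
}

int main(void)
{
    unsigned char sector[SECTOR_SIZE];
    format_sector(sector);
    printf("byte 0: %#04x, byte 79: %#04x, byte 80: %#04x\n",
           sector[0], sector[79], sector[80]);
    return 0;
}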

Controllers, being less general-purpose than CPUs, are perhaps better
suited for LIW/horizontal-microcode implementations.  I believe that
IBM went with programmability because they wanted to play with
diagnostic features, and with seek optimization features.

Good microprocessors weren't available then, but now, the alternative
design is a micro or two + some sort of DMA/data-path hardware. Are
there any controller designers out there who'd like to compare and
contrast?
-- 
Don		D.C.Lindsay .. temporarily at Carnegie Mellon Robotics

henry@zoo.toronto.edu (Henry Spencer) (01/06/91)

In article <71537@bu.edu.bu.edu> gjc@buitc.bu.edu (George J. Carrette) writes:
>RISC implementations have no microcode, so fixing a hardware bug
>means replacing a chip. 

The same is true of most CISCs actually in the field, unless you weight
the count by size.  Neither of the archetypal CISCs -- the original 360
line and the pdp11 -- used software-loadable microcode, although their
less influential successors did.  In fact, no pdp11 has *ever* used
loadable microcode for its normal instruction set, although one or two
offered it as an option for specialized user extensions.  And modern
VLSI CISCs invariably use ROMs for microcode.

If I were cynical (who, me :-)), I would suggest that RAM microcode was
attractive only when the product

	architectural_complexity * desired_performance * urgency

exceeded the threshold beyond which the manufacturer could no longer be
confident of full debugging and optimization before release.

>A few months ago I posted to a couple news groups (not this one) a
>program that crashed all RISC machines that it had been tried on,
>but not the VAX and 68020 machines that I had tried it on. 

In case you didn't notice, it crashed some other CISCs and failed to crash
a number of RISCs when people tried it out more widely.
-- 
"The average pointer, statistically,    |Henry Spencer at U of Toronto Zoology
points somewhere in X." -Hugh Redelmeier| henry@zoo.toronto.edu   utzoo!henry

gjc@buitc.bu.edu (George J. Carrette) (01/07/91)

In article <1991Jan6.033536.14108@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
>In case you didn't notice, it crashed some other CISCs and failed to crash
>a number of RISCs when people tried it out more widely.

Some people tried it on various CISCs that didn't have protected-mode
operating systems.  That was obviously stupid to try, and anybody who
would take those results seriously is even more stupid.

One report of an actual CISC implementation hardware BUG was for a SUN-3
of some kind; it was a bug in some off-chip hardware.

Unfortunately there are not many good independent testers here.  I know of
some serious O/S bugs found by crashme that have been effectively
hushed up by the manufacturers.

Quite a few people sent me mail saying "I tried crashme x y z on my
PDQ RISC and it didn't crash, so you must be full of it."
Of course, x y z was what crashed, say, a SPARC with one version of the
OS, and PDQ was some entirely different processor.

Private mail indicated that CRASHME.C raised quite a fuss at one
large hardware manufacturer that has an R&D budget larger than Sun
Microsystems' total sales.

People should have more fun with this! There are a lot more funny
things left to discover.

-gjc

andrewt@cs.su.oz (Andrew Taylor) (01/07/91)

In article <71693@bu.edu.bu.edu> gjc@buitc.bu.edu (George J. Carrette) writes:
> Some people tried it on various CISCs that didn't have protected-mode
> operating systems.  That was obviously stupid to try, and anybody who
> would take those results seriously is even more stupid.
> 
> One report of an actual CISC implementation hardware BUG was for a SUN-3
> of some kind; it was a bug in some off-chip hardware.

On our Sun 3/50s running SunOS 4.1, crashme (try crashme 32 1 4) can
result in a CPU-bound process which can't be killed.  Rebooting seems
the only way to remove it.  Seems like an OS bug to me.

Andrew

wje@redwood.mips.com (William J. Earl) (01/07/91)

In article <1991Jan04.205635.16420@iecc.cambridge.ma.us>, johnl@iecc (John R. Levine) writes:
> In article <777@TALOS.UUCP> jerry@TALOS.UUCP (Jerry Gitomer) writes:
> >Given these circumstances the solution was to use loadable microcode to
> >make a group of dissimilar computers look alike to the programmer.  The
> >best illustration was the IBM 360 family.  To the programmer each 360 was
> >a 32-bit word machine with 16 registers.  The 360/30 was an 8-bit machine,
> >the 360/40 was a 16-bit machine, the 360/50 was the only 32-bit machine in
> >the family, the 360/65 was a 64-bit(?) machine, and the 360/75 was even
> >bigger.
>...
> The model 20 was a strange case: it was a desk-sized machine whose
> architecture was an almost compatible subset of the larger machines although
> the I/O was entirely different.  The slower models had ROM microprograms, but
> the submodel 5, the fastest one, stored the microcode in the same core memory
> as the application code.  If the microcode got frotzed (core survives
> power-off, so that was infrequent), there was a large deck of cards in the
> back from which you could reload the microcode.  I know people who hacked in
> extra instructions, but it was certainly not sanctioned by IBM.  Since I/O
> was heavily assisted by the microcode, e.g. there was a single "read card and
> translate to EBCDIC" instruction, they may have distributed extra microcode
> to go with optional peripherals.

      The 20 was a bit more than "desk-sized."  It was more like "Xerox
7000 series copier-sized," and it was not really compatible, in that it
would not run any non-trivial S/360 program.  At the site where I encountered
it, it made a nice RJE batch station, with card reader, card punch, and
line printer.

-- 
	William J. Earl			wje@mips.com
	MIPS Computer Systems		408-524-8172
	930 Arques Avenue, M/S 1-03	FAX 408-524-8401
	Sunnyvale, CA 94086-3650

guy@auspex.auspex.com (Guy Harris) (01/08/91)

>>A few months ago I posted to a couple news groups (not this one) a
>>program that crashed all RISC machines that it had been tried on,
>>but not the VAX and 68020 machines that I had tried it on. 
>
>In case you didn't notice, it crashed some other CISCs and failed to crash
>a number of RISCs when people tried it out more widely.

And on at least one of the RISCs where it crashed (some MIPS-based
machine), the bug was in the *operating system*, not in the chip - some
code in the OS didn't properly handle an illegal instruction in the delay
slot of an illegal conditional branch, or something like that.  I suspect
the problem on SPARC machines is also an OS bug, probably in the
floating-point emulation code.

(That code could perhaps be considered equivalent to microcode on some
CISCs; if so, one might take it as an argument in favor of loadable
microcode on machines that have microcode, I suppose.)