[comp.arch] I crashed our MIPS machine today

arne@hpserv1.uit.no (Arne Helme) (08/10/90)

Yesterday I saw a program on comp.lang.c that claimed it could
crash certain RISC architectures from user mode. I tried it on our
DECsystem 5200 running Ultrix V3.1A and on a Sun-4 SPARCstation
running SunOS 4.0.3c. Both machines crashed.

The program was quite simple. It allocated a chunk of memory and
filled it with garbage. Then it tried to execute this garbage as
instructions. 

I thought this should be impossible! Would anyone out there like
to comment on the strange behaviour I have observed on our machines?

-- Arne Helme

--
//// Arne Helme, science assistant// Email: arne@sfd.uit.no                  /
/// Computer Science Department  // "Going on means going far. Going far    //
// University of Tromsoe        //    means returning." (Tao Te Ching)     ///
/ N-9000 Tromsoe, NORWAY       // Phone: +47 83 44035                     ////

guy@auspex.auspex.com (Guy Harris) (08/16/90)

>I thought this should be impossible!

Yeah, it *should* be impossible to crash the OS from non-privileged
user-mode code, but sometimes there are bugs in the OS.

>Would anyone out there like to comment on the strange behaviour I have
>observed on our machines?

From a quick look at a crash dump on a SPARCstation 1 (SS1) running
SunOS 4.0.3c, my suspicion is that the kernel code for handling the
floating-point unit isn't being careful enough in looking at the
floating-point state; it appears to be handing a bad pointer to another
routine that calls the procedure to which that pointer is supposed to
point, only it points into the nether reaches of Hell instead.

In other words, the crash dump doesn't appear to bear out the conclusions
drawn by the original poster of the program in "comp.os.vms":

  OK. Here is a quick summary of the HOW TO CRASH A RISC machine from
  a USER-MODE program test. Reports have arrived that all of these machines
  can be crashed using CRASHME.C:
  IBM RT, MIPS, DECSTATION 5000, SPARC.
 
  On the two CISC architectures tried, VAX/VMS and SUN-3, the program
  either completed or exited with a core or register dump, as expected.
 
  Some background/motivation. My experience with microcode programming
  taught me that some sequences of MICROINSTRUCTIONS could wedge or jam
  the hardware in such a way that recovery was impossible without
  a reboot of some kind. The RISC architectures have some of the same
  properties of MICROCODE in that certain instruction sequences have
  UNDEFINED behavior. Now one of the great costs in a CISC machine is
  usually the trouble the designers go through to make sure that
  every instruction returns the MACHINE to a KNOWN STATE. That way
  the behavior of every instruction can be well defined, documented,
  and individually verified and tested, and, by simple induction, shown
  to be valid for arbitrary SEQUENCES of instructions (in general).
 
  Engineers of RISC machines don't bother to do this, which is one of
  the reasons they are CHEAPER (the hardware, not the engineers).
 
  The problem of proving that an arbitrary sequence of instructions "N"
  long will not crash the machine is much more costly if N > 1.
  (To say the least, if you know anything about mathematical logic).
  If there are M instructions (and M is probably around 1 BILLION)
  then there may be about M^N cases to check. And what is N? 
  For a classic CISC machine a price is paid to make N = 1, or
  at least small. But for a RISC machine, might N be 10 or more?
 
  Anyway, no need to make too big a deal about this. Probably all the
  vendors can fix things in software alone, and certainly CISC chips
  with bugs in them have been shipped in the past too.
 
  Just a reminder though. There is no free lunch. There really is
  a trade-off between ROBUSTNESS, PRICE/PERFORMANCE, and TIME TO MARKET.

The *only* way in which you *might* be able to agree with this as being
the source of the problem - at least in the SPARC case, and maybe in the
MIPS case as well - would be to claim that the floating-point support
software was part of the implementation of the architecture, and that
the checks he alleges are made for CISC but not RISC machines weren't
made in the software part of the architecture.  It certainly doesn't
seem to be the case that the *processor* gets stuck in some state "in
such a way that recovery was impossible without a reboot of some kind."

bwong@cbnewsc.att.com (bruce.f.wong) (08/16/90)

In article <3899@auspex.auspex.com> guy@auspex.auspex.com (Guy Harris) writes:
>>I thought this should be impossible!
>
>Yeah, it *should* be impossible to crash the OS from non-privileged
>user-mode code, but sometimes there are bugs in the OS.

I gave up trying to crash a SUN4/40 running SunOS 4.1 with crashme.c.
I tried about 30 different combinations of arguments and then got bored
when it didn't crash.  Then I tried crashme with the arguments suggested
in the original article on a SUN4/110 running SunOS 4.0, and it crashed
immediately.  My conclusion is that the OS is to blame, and this is
reinforced by a posting that reported no success in crashing two
different RISC machines running the Mach OS.
-- 
Bruce F. Wong		ATT Bell Laboratories
att!iexist!bwong	200 Park Plaza, Rm 1B-232
708-713-5111		Naperville, Ill 60566-7050 USA

my@dtg.nsc.com (Michael Yip) (08/17/90)

I crashed my machine here!  And I also crashed some other ones.

Machines that I crashed:
	SUN4/160(?)
	SparcStation 1+

-- Mike