[comp.misc] Crash a RISC machine from user-mode code:

rang@cs.wisc.edu (Anton Rang) (08/13/90)

[ This isn't really a C issue; I'm redirecting it to comp.misc. ]

In article <49041@seismo.CSS.GOV> stead@beno.CSS.GOV (Richard Stead) writes:
>Do VAX-CISC programmers spend their days branching to random data?

  Nobody that I know of actually does this on *purpose*.  However,
it's not that hard to screw up a program so that it does this.  For
instance, passing the wrong argument (or an uninitialized pointer) to
a routine expecting a function pointer can do it.  Or trashing the
stack by going past array bounds.  Lots of possibilities.

>Or if I ever do, I would fix it pretty damn quick.

  Same here.  Nobody's saying that branching to random data is good.

>Who could possibly care that a random instruction sequence crashes a risc box?

  Well...if I have my own workstation, and always have all my work in
progress saved before I run a program (generally true for the most
part, since I'm paranoid) I don't care that much, though it's still
annoying.  If I'm sharing the machine, I *don't* want it to crash
twice a day because the guy down the hall is trying to find the
stack-clobbering bug in his code.

  Educational environments are even more prone to this.  If you have a
Sun-4/490 shared between 200 users, you don't want somebody to be able
to crash the machine at will ("denial of service" attack).  There are
students out there who will think this is a great joke....

  I hope that the problems shown up by this test can all be fixed in
software; I expect that they can, and that the vendors will do so.  In
general, nothing a user-mode process can do should be able to crash
the machine...otherwise, what's the point of privileged instructions,
kernel mode, etc.?
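
  As a rough illustration of the behaviour one expects from a protected
system, here is a minimal POSIX sketch: a faulting child process is killed
by a signal and the parent (standing in for everybody else on the machine)
carries on.  The names run_isolated, faulty and well_behaved are made up
for this example, and raise(SIGILL) merely stands in for a real branch to
random data.

```c
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run fn in a child process.  Returns the number of the signal that
 * killed the child, 0 if the child exited normally, or -1 if fork or
 * waitpid failed.  (Hypothetical helper, for illustration only.) */
int run_isolated(void (*fn)(void))
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {          /* child: run the suspect code */
        fn();
        _exit(0);
    }
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}

/* Stand-in for buggy code: ask for the same signal the kernel would
 * normally deliver after an illegal-instruction trap. */
void faulty(void)
{
    signal(SIGILL, SIG_DFL);
    raise(SIGILL);
}

/* A routine that does nothing wrong. */
void well_behaved(void)
{
}
```

  On a system that behaves, run_isolated(faulty) should report SIGILL and
run_isolated(well_behaved) should report 0; in neither case does anything
outside the child notice.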

	Anton

+---------------------------+------------------+-------------+
| Anton Rang (grad student) | rang@cs.wisc.edu | UW--Madison |
+---------------------------+------------------+-------------+

gordoni@chook.adelaide.edu.au (Gordon Irlam) (08/14/90)

I've managed to track down the cause of one of the crashes on a Sun4.

The following C program crashes a 4/330 running SunOS 4.03.

---- start of crash_sun.c ----

main = 0xbfafffff;

---- end of crash_sun.c ----

This word encodes a floating point compare instruction with an invalid
operand type (i.e. not single, double, or extended precision).
Presumably the instruction gets passed to the floating point unit, causing
it to panic in some way, which in turn results in the CPU crashing.
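
The encoding can be checked by hand.  The following is a minimal sketch,
assuming the format-3 field layout from the SPARC V8 manual (op, rd, op3,
rs1, opf, rs2) and the V8 opf values for the floating point compares.  The
names sparc_fields, decode_format3 and is_defined_fcmp are invented for this
example.

```c
#include <stdint.h>

/* Fields of a SPARC format-3 floating point instruction word. */
struct sparc_fields {
    unsigned op;    /* bits 31:30 */
    unsigned rd;    /* bits 29:25 */
    unsigned op3;   /* bits 24:19 */
    unsigned rs1;   /* bits 18:14 */
    unsigned opf;   /* bits 13:5  */
    unsigned rs2;   /* bits  4:0  */
};

/* Split a 32-bit instruction word into its format-3 fields. */
struct sparc_fields decode_format3(uint32_t word)
{
    struct sparc_fields f;

    f.op  = (word >> 30) & 0x3;
    f.rd  = (word >> 25) & 0x1f;
    f.op3 = (word >> 19) & 0x3f;
    f.rs1 = (word >> 14) & 0x1f;
    f.opf = (word >> 5)  & 0x1ff;
    f.rs2 = word & 0x1f;
    return f;
}

/* Returns 1 if opf is one of the compare operations defined under
 * op3 = 0x35 (FPop2): fcmps, fcmpd, fcmpq, fcmpes, fcmped, fcmpeq. */
int is_defined_fcmp(unsigned opf)
{
    switch (opf) {
    case 0x51: case 0x52: case 0x53:
    case 0x55: case 0x56: case 0x57:
        return 1;
    default:
        return 0;
    }
}
```

For 0xbfafffff this gives op = 2 and op3 = 0x35, which is the FPop2 group
that holds the compares, but opf = 0x1ff, which is none of the defined
compare operations.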

I imagine that the CPU/FPU interface is one of the most common areas for
such bugs, due to the complexities caused by its heavily asynchronous
nature.

The program does not crash a 4/60 running SunOS 4.1.  But I can't tell
whether I am looking at a hardware problem, or an operating system problem.

Could someone with a 4/330 running SunOS 4.1, or a 4/60 running SunOS 4.03
please try this program so that the cause of the problem can be determined.

Speculating on likely causes of such crashes, I would imagine that the vast
majority are simply O/S bugs.  A few might be system design bugs, although
most of these can probably be made safe by clever O/S programming.  Examples
include neglecting the possibility of a page fault being caused by
pre-fetching an annulled instruction, and any number of bugs in the various
bits of design glue that hold a modern system together.  I think the
possibility of a CPU bug is quite unlikely.  In fact I think that executing
random sequences of code is one of the common tests used to check a new CPU
design.

I don't think there is any real merit to the claim that such bugs are more
likely on RISC than CISC machines - in fact the simplicity of RISC machines
could be used to argue the other way.  Any differences that are seen between
such machines can be more than explained by the considerable difference in
the age of the O/S ports for the respective architectures.  Most of the O/S
bugs on the CISC machines have probably been fixed long ago.

An important thing to realize, given the state of software engineering today,
is that there is a serious tradeoff between functionality and reliability,
and most vendors seem to be putting functionality first.  In my experience
"reliable NFS" is an oxymoron, but that hasn't stopped it from being adopted
by almost the entire Unix-speaking world.

A distinction also needs to be made between bugs, and simply strange
quirks of the hardware.  This distinction isn't always clear.  And in many
cases failure to understand a hardware quirk by the O/S designers can
result in a bug.  SPARC has several quirks/bugs which I am aware of:

    1) The read status register, write status register sequence is
       interruptible.  Also it is not possible to write only particular
       fields in the psr.  This means that it is not possible to use this
       sequence to clear the trap enable bit and thereby disable traps
       since between reading and writing the psr an interrupt trap may have
       occurred causing the cwp field of the psr to have changed value.
       This almost certainly wasn't realised when the architecture was
       designed, instead it follows as a natural consequence of other
       design decisions.  It is not possible to alter the architecture, and
       so this quirk stays.  Fortunately, some fairly complicated ways
       exist that allow you to get around this quirk, and disable traps.

    2) Setting the interrupt level and trap enable fields of the psr
       simultaneously can cause spurious interrupts, with the Fujitsu
       chip set.  This is now documented in the SPARC manuals, and so it is
       no longer a bug.  It just requires using two separate instructions,
       where one would have been used previously.

    3) Early versions of SPARC only had an atomic swap memory with 0xff
       instruction.  This is sufficient for implementing semaphores, etc.,
       but is no good if the semantics of the swapped value are determined
       by external hardware, such as is the case with page tables.  New
       versions of the architecture, and the Cypress chip set, include an
       atomic swap memory with register instruction.

    4) Current versions of SPARC do not have multiply or divide
       instructions.  Sun seems to be worried about the marketing
       implications of this, and so opcodes for these operations have been
       assigned.  This isn't really a bug, but the marketing people behave
       as if it were, rather than justifying the original design decision.

    5) An opcode exists for flushing an internal instruction cache, but
       not for flushing an internal data cache.  Perhaps there is a good
       reason for this, but I can't see one.  It is not clear if future
       processors will use the one instruction to flush both instruction
       and data caches, or whether a new instruction will be added.
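
To make point 3 concrete: on SPARC the swap-with-0xff instruction is ldstub
and the later swap-with-register instruction is swap.  The following is a
minimal sketch in portable C11 rather than SPARC assembler, with
atomic_exchange standing in for both instructions.  The names byte_lock,
byte_lock_try, byte_lock_release and swap_word are invented for this example.

```c
#include <stdatomic.h>

/* A lock byte in the style of ldstub: 0x00 means free, 0xff means held. */
typedef atomic_uchar byte_lock;

/* Atomically store 0xff and look at the old value.  Returns 1 if the
 * lock was free (and is now ours), 0 if somebody else already held it. */
int byte_lock_try(byte_lock *l)
{
    return atomic_exchange(l, 0xff) == 0x00;
}

/* Release the lock by storing zero back. */
void byte_lock_release(byte_lock *l)
{
    atomic_store(l, 0x00);
}

/* The more general operation: atomically replace a word with an
 * arbitrary new value and return the old one.  This is what is needed
 * when the meaning of the stored value is fixed by external hardware,
 * so that 0xff is not an acceptable thing to write. */
unsigned swap_word(atomic_uint *addr, unsigned newval)
{
    return atomic_exchange(addr, newval);
}
```

The first two routines only ever need to write 0xff or zero, which is why
the original instruction is enough for semaphores.  Something like a page
table entry needs swap_word, since the value written must be a valid entry.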

----

A note for people running crashme.c.  Typically you have to run it with
many different arguments before you encounter a crash.  And on Suns, at
least, the image frequently gets terminated long before the specified number
of iterations have completed.  Someone mentioned that a 4/60 did not crash,
but after playing around with one for a while I managed to crash it (under
SunOS 4.1).


                                                 Gordon Irlam
                                                 gordoni@cs.adelaide.edu.au

gordoni@chook.adelaide.edu.au (Gordon Irlam) (08/15/90)

In a previous article I mentioned that the following C program crashes
a 4/330 running SunOS 4.03.

---- start of crash_sun.c ----

main = 0xbfafffff;

---- end of crash_sun.c ----

It looks like this is a bug in SunOS 4.03, and not a hardware problem,
unless perhaps there is an underlying hardware flaw that the operating
system is capable of masking.

Thanks to John M. Blasik (john@mlb.semi.harris.com) for running it on
a 4/330 under SunOS 4.1, where it didn't crash, and to Vick Khera
(khera@cs.duke.edu) for running it on a 4/60 under SunOS 4.03, where it
did crash.

                                          Gordon Irlam
                                          (gordoni@cs.adelaide.edu.au)