[news.software.b] Cnews: relaynews getting 'illegal instruction' on SparcStation 1

sater@cs.vu.nl (Hans van Staveren) (12/11/89)

I am testing Cnews on a SparcStation 1 we have lying around as a spare
before unleashing it on our unsuspecting users. It does seem to run,
sort of, but relaynews is getting 'illegal instruction' dumps seemingly
at random. Here is one stack trace:

newshost% adb /usr/lib/newsbin/relay/relaynews
core file = core -- program ``relaynews''
SIGILL 4: illegal instruction
$c
_sanitise() + 0
_history(0xf7fffd10,0x1000000,0x1,0x13840,0x2000,0x15d20) + 18
_insart(0xf7fffd10,0xf7fffd58,0xe582,0x169c8,0x169cc,0x15c18) + 48
_cpinsart(0xfff0,0xe582,0x2218,0x1,0x15a58,0xf7fffd10) + 9c
_unbatch(0xfff0,0xe582,0x0,0x10018,0x0,0x15a60) + 120
_process(0xfff0,0xe582,0x1,0xf400,0x15a38,0x23) + 60
_procargs(0x3,0xf7ffff5c,0xf7fffef0,0x15a68,0x0,0x0) + 28
_main(0x3,0xf7ffff5c,0xf7ffff6c,0xe400,0x0,0x0) + a0

The strange part is that there is of course no illegal instruction at that
address, and when you run relaynews on the same file again it functions
without problems. I sort of suspect hardware/OS trouble, but before I
try it on one of our many other spare machines :-) I would be interested
in comments from people who already run Cnews on SparcStation 1.

Our configuration:
SS1 with 8 MB, CDC Wren IV as SCSI target 0, meaning sd3 for SunOS.
SunOS 4.0.3c operating system.

One minor nit: if relaynews crashes, leaving a LOCK around, it would
be kind of nice if newsrun detected that, and would remove the LOCK.
Better of course would be relaynews not crashing, but I seriously doubt
whether I can blame the authors for this one.

	Hans van Staveren

geoff@utstat.uucp (Geoff Collyer) (12/12/89)

Hans van Staveren:
>One minor nit: if relaynews crashes, leaving a LOCK around, it would
>be kind of nice if newsrun detected that, and would remove the LOCK.

To paraphrase our standard reply (from memory): In general, the
breakdown of a locking protocol is indicative of serious problems
requiring human attention and software should not remove apparently-dead
locks.  Programs can run for surprising lengths of time on loaded
machines, and in the presence of network file systems there is, in general,
no way to verify that the locking process has died.
-- 
Geoff Collyer		utzoo!utstat!geoff, geoff@utstat.toronto.edu
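Geoff's policy can be made concrete. The following is a minimal sketch in Python, not C News code, assuming a POSIX file system. It takes a lock by link()ing a uniquely named temporary file (holding the PID) to the lock name, which fails atomically if the lock already exists. On failure it reports the lock as held instead of guessing whether the holder is dead. The names `acquire_lock` and `release_lock` are hypothetical. The link-count check follows the NFS workaround described in the Linux open(2) manual page, where a lost reply can make a successful link() appear to fail.

```python
import os
import tempfile


def acquire_lock(lockpath):
    """Try once to take lockpath; return True if we now hold it.

    Never removes an existing lock, even one that looks stale: a lock
    that is already present is simply reported as held (False).
    """
    directory = os.path.dirname(os.path.abspath(lockpath))
    # Unique temporary file on the same file system, containing our PID.
    fd, tmp = tempfile.mkstemp(dir=directory, prefix="L.")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(f"{os.getpid()}\n")
        try:
            # link() is atomic: it fails with EEXIST if lockpath exists.
            os.link(tmp, lockpath)
            return True
        except FileExistsError:
            # Over NFS a retransmitted request can report EEXIST even
            # though our link succeeded; a link count of 2 means we won.
            return os.stat(tmp).st_nlink == 2
    finally:
        # The lock (if we got it) survives as the other name for this file.
        os.unlink(tmp)


def release_lock(lockpath):
    """Drop a lock we hold. Only the holder should ever call this."""
    os.unlink(lockpath)
```

A caller such as a hypothetical newsrun wrapper would log and exit when `acquire_lock` returns False, leaving the leftover LOCK for an administrator to examine, which is the policy described above.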

henry@utzoo.uucp (Henry Spencer) (12/12/89)

In article <4757@sater.cs.vu.nl> sater@cs.vu.nl (Hans van Staveren) writes:
>One minor nit: if relaynews crashes, leaving a LOCK around, it would
>be kind of nice if newsrun detected that, and would remove the LOCK.
>Better of course would be relaynews not crashing...

We tend to follow the philosophy that dying processes represent
a major problem that should be investigated, and blindly removing the
lock and charging onward is often inappropriate.  Of course, if the thing
is dying at random, that's a case where a simple retry really is in order,
but one hopes that sort of problem is rare...

The real problem with doing anything about it is that there is no reliable,
portable, efficient way to determine whether a process is still alive.
-- 
1755 EST, Dec 14, 1972:  human |     Henry Spencer at U of Toronto Zoology
exploration of space terminates| uunet!attcan!utzoo!henry henry@zoo.toronto.edu
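To see why there is no reliable answer, consider the usual approximation, sketched here in Python under the assumption of a POSIX system: kill(pid, 0) performs the existence and permission checks without delivering a signal. The function name `pid_probe` is hypothetical. Even a correct answer from it is weak evidence about a lock: "alive" may be an unrelated process that reused the PID, and "gone" says nothing about a holder running on another machine that shares the news spool over NFS.

```python
import os


def pid_probe(pid):
    """Report whether some process with this PID exists on *this* host.

    Returns "alive" or "gone". Neither answer proves anything about the
    process that wrote the lock: PIDs are reused, and a lock holder on
    another NFS client is not visible to kill() here at all.
    """
    if pid <= 0:
        # kill(0, ...) and kill(-n, ...) for n > 1 address process groups,
        # and kill(-1, ...) addresses every process the caller may signal:
        # never pass such values through from a corrupt lock file.
        raise ValueError(f"not a single process ID: {pid}")
    try:
        os.kill(pid, 0)          # signal 0: check only, nothing is sent
    except ProcessLookupError:   # ESRCH: no such process on this host
        return "gone"
    except PermissionError:      # EPERM: it exists but belongs to someone else
        return "alive"
    return "alive"
```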