[comp.sys.sun] SunOs 4.0.3 & shared memory

gnb@melba.bby.oz.au (Gregory Bond) (02/02/90)

Does this ring any bells with anyone???

We have an application that uses signals, shared memory and message queues
to distribute information to several client processes.  This ran fine
under SunOS 3.5.  Since we upgraded some of our machines to SunOS 4.0.3,
it has been failing intermittently (a couple of times a week).

What happens is that occasionally the server process (fs02) and the
update process get stuck in an unkillable wait.  They always hang together.
Sometimes a biod or two will also be stuck.  Only a reboot frees the
machine.

Both update and fs02 are waiting on addresses that aren't
recognised by sps.  The addresses change each time, but are always in the same
general area (e.g. 0x172zz and 0x170yy).

toast# sps vb
Ty User     Status Fl Nice Virt Res %M  Time Child %C Proc# Command
   root     x17249          24    0  0 221.4        0   107 update
   root     x170ad         936    0  0   4.0        0   612 fs02
pa.root     RUN       -20  104  280  4   0.3        0   679 sps vb
47 (7328k) processes, 4 (1232k) busy, 22 (5264k) loaded, 25 (5080k) swapped

The machine has plenty of swap space, table space, etc. at the time.  There
are no messages on the console or in the /usr/adm/messages log.  The rest of
the machine is fine, but unfortunately the fs02 process is quite important,
and most users rely on it!

It looks like some sort of deadlock between update and the fs02.

Can anyone shed some light on this?

Gregory Bond, Burdett Buckeridge & Young Ltd, Melbourne, Australia
Internet: gnb@melba.bby.oz.au    non-MX: gnb%melba.bby.oz@uunet.uu.net
Uucp: {uunet,pyramid,ubc-cs,ukc,mcvax,prlb2,nttlab...}!munnari!melba.bby.oz!gnb