gnb@melba.bby.oz.au (Gregory Bond) (02/02/90)
Does this ring any bells with anyone?

We have an application that uses signals, shared memory and message queues to distribute information round several client processes. This ran fine under SunOS 3.5. We have upgraded some of our machines to SunOS 4.0.3, and it is now failing intermittently (a couple of times per week).

What happens is that occasionally the server process (fs02) and the update process get stuck in an unkillable wait. Always both together. Occasionally, a biod or two will also be stuck. It requires a reboot to free the machine.

Both update and fs02 are waiting on addresses that sps doesn't recognise. The addresses change each time, but are in the same general area (i.e. 0x172zz and 0x170yy).

toast# sps vb
Ty User Status Fl Nice Virt Res %M Time Child %C Proc# Command
   root x17249 24 0 0 221.4 0 107 update
   root x170ad 936 0 0 4.0 0 612 fs02
pa.root RUN -20 104 280 4 0.3 0 679 sps vb
47 (7328k) processes, 4 (1232k) busy, 22 (5264k) loaded, 25 (5080k) swapped

The machine has plenty of swap space, table space etc. at the time. There are no messages on the console or in the /usr/adm/messages log. The rest of the machine is fine, but unfortunately the fs02 process is quite important, and most users rely on it!

It looks like some sort of deadlock between update and fs02. Can anyone shed some light on this?

Gregory Bond, Burdett Buckeridge & Young Ltd, Melbourne, Australia
Internet: gnb@melba.bby.oz.au non-MX: gnb%melba.bby.oz@uunet.uu.net
Uucp: {uunet,pyramid,ubc-cs,ukc,mcvax,prlb2,nttlab...}!munnari!melba.bby.oz!gnb