cetron@CS.UTAH.EDU (Edward J Cetron) (10/31/87)
Since I never saw this come back, I will resubmit it (again). Apologies to anyone who did see it.

We are suddenly experiencing some bizarre crashes and hung machines. A quick synopsis of the configuration:

- 3 MicroVAX IIs, one with 13 MB and the others with 9 MB. All three are in a LAVC, with the boot node having 9 MB.
- The boot node has 1 RD54 and 2 RD53s; the others each have 3 RD53s. All nine disks are served across the entire cluster.
- All three MicroVAXes have VAXstation II/GPX upgrades.
- We are running VMS 4.6 and VWS 3.2 with 19-point font support.

When we run programs that use the GPX screen heavily (especially for a clear/erase and repaint operation), the systems go into one of two modes: either a fatal bugcheck of RESEXH (resources exhausted, system shutting down), or a non-fatal bugcheck of INCONSTATE (inconsistent I/O database), which fills the errlog file in minutes and renders the system almost useless in short order. Any device trying to get to VAA0: (the GPX screen) ends up in a weird state in which it uses CPU time, never does any I/O, and can't be STOP/ID'ed.

I have tried just about everything: I've increased pool (both paged and nonpaged), I've increased the IRP, SRP, and LRP lists, and I've upped the number of resource blocks and lock ID blocks. I can supply all of the various values if desired, but in general analy/system shows the system running at its limits: 80-92% full on the SRP and IRP lists, 27% utilized on the LRP list, and between 90-95% utilized when I finally get the resources-exhausted bugcheck.

The only odd thing in analy/sys or analy/crash is that when I have SDA 'sho res', I get a lot of resource blocks that have a seq number (which is undocumented) followed immediately by the notation 'Not valid'. Also, the ASCII text of a lot of them (there are 300-500 of them) suggests they are disk resource blocks (e.g. F11B$ CEDCAD_SYS, where CEDCAD_SYS is the volume name of one of our served disks), and some have 'bad' ASCII in them (F11B$s0+.).
Has anyone seen anything like this? Does anyone have ANY ideas? At first I thought it was obviously the GPX driver/workstation software, but this resource block stuff seems to be disk server/cluster related. Then again, I do see some error counts for the VAA0: device, yet no entries in the log file. I'd be happy to supply lots more information, but I no longer have a clue as to what to look for. Unfortunately, we moved to VMS 4.6 and VWS 3.2 at the same time, so I can't really isolate it to one or the other.

I apologize for the length, but our facility has come to an absolute halt until we can figure out what the problem is. Any insight, comments, suggestions, ANYTHING will be greatly appreciated.

thanx,
-ed cetron
center for engineering design
univ of utah
cetron@cs.utah.edu
cetron@utahcca.bitnet
801-581-5304 or 801-581-6499