nagy%warner.hepnet@LBL.ARPA (08/04/87)
Normally, the system dump file for a system is sized as physical memory size plus 4 blocks. When the system shuts down or crashes, all of physical memory is written to the SYSDUMP.DMP file (just before the TYPE... HALT... message). The extra 4 blocks are 1 for a header and 3 for error log buffers. If SYSDUMP.DMP is smaller than physical memory size, then not all of physical memory is written to the file - just that part which fits.

So far, so good... In order to save space on the system disk, I deleted the system-specific system dump files and created one common SYSDUMP.DMP with a size large enough to hold 32 MB (+4 pages), which is the memory size of the biggest VAX. OK so far if only one system crashes. But what happens if you shut down the entire cluster? If you try later on to ANALYSE/CRASH the system dump file you get something like the following:

I presume all our 4 VAXes try to write their memory content to the only SYSDUMP file in an uncoordinated way. I noticed this same error message on all our 4 VAXes (8700, 8700, 8600, 785), so the problem resides in the system dump file itself. After rebooting one of the 8700's, the system dump file was no longer corrupted and I was able to do an ANALYSE/CRASH, which showed correctly that the operator requested the shutdown. Any comments, ideas?

Right, as far as I know and expect, all the VAXes will write to the SYSDUMP file with no coordination between them. At this point, any coordination is impossible, as the system is in the final throes of stopping and any cluster connections, etc. have already been broken. So in the case described above, the information in SYSDUMP.DMP is probably a total mishmash and useless. Putting SYSDUMP.DMP into SYS$COMMON is useful for the case where one system crashes with a BUGCHECK but the others are unaffected. If the whole cluster blows...

Anyway, what else can one do in a cluster where the 8800 is to have 128 MB, and the 8600 and two 8650's will each have between 40 and 64 MB? More than 1/2 of an RA81 will disappear just into SYSDUMP.DMP files - this is NOT a practical solution!

= Frank J. Nagy "VAX Guru"
= Fermilab Research Division EED/Controls
= HEPNET: WARNER::NAGY (43198::NAGY) or FNAL::NAGY (43009::NAGY)
= BitNet: NAGY@FNAL
= USnail: Fermilab POB 500 MS/220 Batavia, IL 60510
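
The sizing arithmetic in the post above can be checked with a short sketch. It assumes only what the post states (a full dump file is physical memory plus 4 blocks) plus two figures that are not in the post: a 512-byte disk block and a nominal RA81 capacity of 891,072 blocks. The function and constant names are made up for illustration.

```python
BLOCK_BYTES = 512          # assumed size of one VMS disk block
DUMP_OVERHEAD_BLOCKS = 4   # 1 header block + 3 error log buffer blocks (from the post)
RA81_BLOCKS = 891_072      # assumed nominal RA81 capacity, not stated in the post


def dump_file_blocks(memory_mb):
    """Blocks needed for a full SYSDUMP.DMP: physical memory plus 4 blocks."""
    return memory_mb * 1024 * 1024 // BLOCK_BYTES + DUMP_OVERHEAD_BLOCKS


def total_dump_blocks(memory_sizes_mb):
    """Total blocks for one full-size dump file per node."""
    return sum(dump_file_blocks(mb) for mb in memory_sizes_mb)


# The cluster described above: one 8800 at 128 MB plus three machines
# (the 8600 and two 8650's) at somewhere between 40 and 64 MB each.
low = total_dump_blocks([128, 40, 40, 40])
high = total_dump_blocks([128, 64, 64, 64])

print(f"32 MB node needs {dump_file_blocks(32)} blocks")
print(f"per-node dump files need {low} to {high} blocks")
print(f"share of one RA81: {low / RA81_BLOCKS:.0%} to {high / RA81_BLOCKS:.0%}")
```

Under those assumptions even the low end comes out above half of the disk, which agrees with the "more than 1/2 of an RA81" estimate.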
klb@philabs.Philips.Com (Ken Bourque) (08/06/87)
How about setting SAVEDUMP to 1 and using the page files to put the dump in? You gotta have them anyway, so all you need to do is make sure they're big enough (which they probably are already) and put an SDA copy command in systartup.com. I use this method (not on a cluster), and it works fine.

Ken Bourque klb@philabs.philips.com -or- ...!seismo!philabs!rge