[comp.os.vms] Placement of SYSDUMP.DMP

nagy%warner.hepnet@LBL.ARPA (08/04/87)

Normally, the system dump file for a system is sized as physical memory
size plus 4 blocks.  When the system shuts down or crashes, all of physical
memory is written to the SYSDUMP.DMP file (just before the TYPE... HALT...
message).  The extra 4 blocks are 1 for a header and 3 for error log
buffers.
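
As a rough check of that sizing rule, the arithmetic can be sketched in
Python.  This assumes 512-byte disk blocks (the same size as a VAX page) and
MB meaning 2**20 bytes; the function name is made up for illustration.

```python
BLOCK_BYTES = 512        # assumption: one disk block = one VAX page = 512 bytes
OVERHEAD_BLOCKS = 4      # 1 header block + 3 error log buffer blocks


def dump_file_blocks(memory_mb):
    """Blocks needed for a full dump of memory_mb megabytes of physical memory."""
    memory_blocks = memory_mb * 1024 * 1024 // BLOCK_BYTES
    return memory_blocks + OVERHEAD_BLOCKS


if __name__ == "__main__":
    for mb in (32, 64, 128):
        print(mb, "MB ->", dump_file_blocks(mb), "blocks")
```

Under those assumptions a 32 MB machine needs 65540 blocks for a complete
dump.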

If SYSDUMP.DMP is smaller than physical memory size, then not all of
physical memory is written to the file - just that part which fits.
So far, so good...

       In order to save space on the system disk I deleted the
       system-specific system-dump-files and created one common
       SYSDUMP.DMP with a size large enough to hold 32 MB (+4 pages)
       which is the memory-size of the biggest VAX.
     
       OK so far if only one system crashes.
     
       But what happens if you shut down the entire cluster? If you
       try later on to ANALYSE/CRASH the system dump file you get
       something like the following:
     
       I presume all our 4 VAXes try to write their memory content in
       the only SYSDUMP file in an uncoordinated way. I noticed this
       same error-message on all our 4 VAXes (8700, 8700, 8600, 785)
       so the problem resides in the system dump file itself. After
       rebooting one of the 8700's the system dump file was no longer
       corrupted and I was able to do an ANALYSE/CRASH which showed
       correctly that the operator requested the shutdown.
     
       Any comments, ideas?
     
Right, as far as I know and expect, all the VAXes will write to the
SYSDUMP file with no coordination between them.  At this point, any
coordination is impossible, as each system is in the final throes of
stopping and any cluster connections, etc. have already been broken.
So in the case described above, the information in SYSDUMP.DMP is
probably a total mishmash and useless.  Putting SYSDUMP.DMP into
SYS$COMMON is useful for the case where one system crashes with
a BUGCHECK but the others are unaffected.  If the whole cluster blows...

Anyway, what else can one do in a cluster where the 8800 is to have
128 MB, and the 8600 and two 8650's will each have between 40 and 64 MB?
More than 1/2 of an RA81 will disappear just into SYSDUMP.DMP files -
this is NOT a practical solution!
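
The "more than 1/2 of an RA81" figure can be checked with a short Python
sketch.  It assumes 512-byte blocks, the memory + 4 blocks rule from above,
and a nominal RA81 capacity of 891072 blocks (about 456 MB); that capacity
figure and the function name are assumptions brought in for illustration.

```python
BLOCKS_PER_MB = 2048     # assumption: 512-byte blocks, MB = 2**20 bytes
OVERHEAD_BLOCKS = 4      # 1 header block + 3 error log buffer blocks
RA81_BLOCKS = 891072     # assumption: nominal RA81 capacity (about 456 MB)


def cluster_dump_blocks(memory_mb_per_node):
    """Total blocks if every node gets its own full-size SYSDUMP.DMP."""
    return sum(mb * BLOCKS_PER_MB + OVERHEAD_BLOCKS for mb in memory_mb_per_node)


# 8800 with 128 MB, plus the 8600 and two 8650's at 40 MB or 64 MB each
low = cluster_dump_blocks([128, 40, 40, 40])
high = cluster_dump_blocks([128, 64, 64, 64])

if __name__ == "__main__":
    for label, blocks in (("low", low), ("high", high)):
        print(label, blocks, "blocks,", round(100 * blocks / RA81_BLOCKS), "% of an RA81")
```

With those numbers the dump files alone take somewhere between a bit over
half and roughly three quarters of the drive.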
     

= Frank J. Nagy   "VAX Guru"
= Fermilab Research Division EED/Controls
= HEPNET: WARNER::NAGY (43198::NAGY) or FNAL::NAGY (43009::NAGY)
= BitNet: NAGY@FNAL
= USnail: Fermilab POB 500 MS/220 Batavia, IL 60510

klb@philabs.Philips.Com (Ken Bourque) (08/06/87)

How about setting SAVEDUMP to 1 and using the page files to put the dump in?
You gotta have them anyway, so all you need to do is make sure they're big
enough (which they probably are already) and put an SDA COPY command in
SYSTARTUP.COM.  I use this method (not on a cluster), and it works fine.

Ken Bourque  klb@philabs.philips.com -or- ...!seismo!philabs!rge