farhad@CS.Stanford.EDU (Farhad Shakeri) (10/04/90)
Hi,

We have found a very strange problem on only one of our 3100s (mine) running Ultrix (3.1c).

Dump fails almost immediately after it starts, with this error:

    DUMP: SIGSEGV() ABORTING!
    Illegal instruction (core dumped)

This problem started about 2 weeks ago and we can't figure out why. I thought I had a bad binary or a corrupted filesystem, but the binaries looked fine when I compared them (with sum) to those on other 3100s, and the filesystems passed fsck! Dump failed in every form of test, even when dumping to /dev/null.

Anyway, if anybody has seen this sort of problem, please let me know. I am going to convert to 4.0 soon, but I would like to solve this mystery.

Also, can this be a hardware problem?

Thanks a lot.
--
Farhad Shakeri
Stanford University, Computer Science Dept.
E-Mail: farhad@Tehran.Stanford.EDU
alan@shodha.enet.dec.com ( Alan's Home for Wayward Notes File.) (10/04/90)
In article <1990Oct3.171146.4158@Neon.Stanford.EDU>, farhad@CS.Stanford.EDU (Farhad Shakeri) writes:
}
} [ Dump fails with a segmentation fault. ]
}

I have two questions back at you. Are you using the 'u' flag to update /etc/dumpdates? Does the file /etc/dumpdates really exist? If the answer to the first question is yes and the second no, create one and see if the problem goes away.

If it does, PLEASE, PLEASE submit an SPR. Bugs like this should have disappeared ages ago.
--
Alan Rollow alan@nabeth.enet.dec.com
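The two questions above can be checked mechanically before running dump. What follows is only a minimal Python sketch: the path /etc/dumpdates comes from the post, while the function name `dumpdates_problem`, the idea of passing the dump key string as an argument, and the example key string "0uf" are illustrative assumptions, not part of dump itself.

```python
import os


def dumpdates_problem(dump_keys, dumpdates_path="/etc/dumpdates"):
    """Return a warning string if 'u' is requested but the file is missing.

    dump_keys is the key string handed to dump (for example "0uf").
    The function name and calling convention are hypothetical; only the
    default path is taken from the thread.
    """
    uses_u = "u" in dump_keys
    exists = os.path.isfile(dumpdates_path)
    if uses_u and not exists:
        return "'u' requested but %s does not exist" % dumpdates_path
    # Either 'u' is not used or the file is present: nothing to report.
    return None
```

Calling `dumpdates_problem("0uf")` on the affected machine would report the one combination described above (yes to the first question, no to the second); any other combination returns `None`.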
grr@cbmvax.commodore.com (George Robbins) (10/04/90)
In article <1990Oct3.171146.4158@Neon.Stanford.EDU> farhad@CS.Stanford.EDU (Farhad Shakeri) writes:
>
> Hi We have found a very strange problem on only one of our
> 3100s (mine) running ultrix (3.1c).
>
> Dump fails almost immediately after it starts by this error:
>
> DUMP: SIGSEGV() ABORTING!
> Illegal instruction (core dumped)
>
> This problem started about 2 weeks ago and we can't figure out
> why. I thought I had a bad binary or corrupted filesystem
> but those looked fine when I compared (sum) them to other 3100s
> and they looked fine and passed fsck!

When I've seen this kind of problem, it has been due to a copy of the image in the swap area getting corrupted. If you reboot the machine and then try the dump again immediately, do you get the same problem?

Have you checked the error log for disk errors? Are you running out of swap space? Are you doing anything different on this machine than on the others: SLIP, NFS, DECnet?

> Also can this be a hardware problem?

Could be...
--
George Robbins - now working for,     uucp:   {uunet|pyramid|rutgers}!cbmvax!grr
but no way officially representing:   domain: grr@cbmvax.commodore.com
Commodore, Engineering Department     phone:  215-431-9349 (only by moonlite)
farhad@CS.Stanford.EDU (Farhad Shakeri) (10/05/90)
In article <1753@shodha.enet.dec.com>, alan@shodha.enet.dec.com ( Alan's Home for Wayward Notes File.) writes:
|> In article <1990Oct3.171146.4158@Neon.Stanford.EDU>, farhad@CS.Stanford.EDU (Farhad Shakeri) writes:
|> }
|> } [ Dump fails with a segmentation fault. ]
|> }
|>
|> I have two questions back at you. Are you using the
|> 'u' flag to update /etc/dumpdates? Does the file
|> /etc/dumpdates really exist? If the answer to the
|> first question is yes and the second no, create one
|> and see if the problem goes away.
|>
|> If it does PLEASE, PLEASE submit an SPR. Bugs like
|> this should have disappeared ages ago.

Dump failed in all cases, with or without 'u'. I will submit an SPR, if it helps.
farhad@CS.Stanford.EDU (Farhad Shakeri) (10/05/90)
In article <14862@cbmvax.commodore.com>, grr@cbmvax.commodore.com (George Robbins) writes:
|> In article <1990Oct3.171146.4158@Neon.Stanford.EDU> farhad@CS.Stanford.EDU (Farhad Shakeri) writes:
|>
|> When I've seen this kind of problem, it has been due to a copy of the
|> image in the swap area getting corrupted. If you reboot the machine,
|> and then try the dump again immediately, do you get the same problem?

YES! I have done everything I can think of: fsck, power off/on, no errors in the errlog file, checked the dumpdates file...

|> ...
|> > Also can this be a hardware problem?
|>
|> Could be...
alan@shodha.enet.dec.com ( Alan's Home for Wayward Notes File.) (10/05/90)
In article <1990Oct5.001625.11355@Neon.Stanford.EDU>, farhad@CS.Stanford.EDU (Farhad Shakeri) writes:
> In article <1753@shodha.enet.dec.com>, alan@shodha.enet.dec.com ( Alan's Home for Wayward Notes File.) writes:
> |> In article <1990Oct3.171146.4158@Neon.Stanford.EDU>, farhad@CS.Stanford.EDU (Farhad Shakeri) writes:
> |> }
> |> } [ Dump fails with a segmentation fault. ]
> |> }
> |>
> |> [ I suggest that it might be a lack of the /etc/dumpdates file. ]
>
> dump failed in all cases with or without 'u' .
>

Not that bug, oh well. Try this one. Take a very close look at /etc/fstab, particularly the 2nd field of each line. Are all the path names properly formed? Do they all begin with '/'? How about the rest of the file? Any missing fields? It seems that dump is very intolerant of a bad /etc/fstab.

This too is a bug. If it turns out to be this, then please submit an SPR on it. While you're there, you should also mention that a successful dump returns an exit status of one (1) instead of zero (0) like most programs. I also consider this one a bug. Maybe it will get fixed in the OSF/1 based system...
--
Alan Rollow alan@nabeth.enet.dec.com
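The /etc/fstab inspection suggested above (second field begins with '/', no missing fields) can be sketched as a small Python check. This is a rough illustration under stated assumptions, not a description of what dump itself does: the function name `check_fstab_lines`, the colon separator, the minimum field count of six, and the treatment of '#' lines as comments are all assumptions about the local fstab layout and should be adjusted to the system at hand.

```python
def check_fstab_lines(lines, sep=":", min_fields=6):
    """Return (line_number, message) pairs for suspicious fstab entries.

    sep and min_fields are assumptions about the local fstab layout,
    not values taken from the thread.
    """
    problems = []
    # At least two fields are needed before the 2nd one can be inspected.
    required = max(min_fields, 2)
    for number, raw in enumerate(lines, start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and (assumed) comment lines
        fields = line.split(sep)
        if len(fields) < required:
            problems.append((number, "only %d field(s)" % len(fields)))
            continue
        if not fields[1].startswith("/"):
            problems.append(
                (number, "2nd field %r does not begin with '/'" % fields[1])
            )
    return problems
```

One way to use it is `check_fstab_lines(open("/etc/fstab").read().splitlines())`; an empty list means that none of the checked conditions fired, which does not by itself prove the file is acceptable to dump.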
farhad@CS.Stanford.EDU (Farhad Shakeri) (10/06/90)
In article alan@shodha.enet.dec.com writes:
|>
|> Alan Rollow alan@nabeth.enet.dec.com

The problem is fixed. It was a binary that had gone bad or something!?! Anyway, I took the binary from another 3100, instead of from my own old backups, and it is dumping! Very strange: I took the binaries from a 5400 and that failed (same OS). Maybe I have some bad spots on my disk!

Anyway, thanks for your suggestions.
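Since replacing the binary cured the problem, a byte-for-byte comparison of two copies of dump is a stricter check than comparing the 16-bit checksum that sum prints. The following is a minimal Python sketch, not something anyone in the thread ran; the function name `first_difference` and the chunk size are illustrative assumptions, and the two paths are whatever copies are at hand (for example the local binary and one fetched from another 3100).

```python
def first_difference(path_a, path_b, chunk_size=65536):
    """Return the byte offset of the first difference, or None if identical."""
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if a != b:
                # Find the exact position of the mismatch inside this chunk.
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        return offset + i
                # No differing byte: one file is a prefix of the other.
                return offset + min(len(a), len(b))
            if not a:
                return None  # both files ended together with no mismatch
            offset += len(a)
```

A result of `None` means the two files have identical contents; an integer result gives the offset of the first differing byte, which can help tell a single damaged block from a completely different binary.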
6600jimi@ucsbuxa.ucsb.edu (Jim Davidson) (10/06/90)
In article <1753@shodha.enet.dec.com> alan@shodha.enet.dec.com ( Alan's Home for Wayward Notes File.) writes:

Yes! We have the exact same problem on two 3100s and two 2100s! I thought we were alone!

>In article <1990Oct3.171146.4158@Neon.Stanford.EDU>, farhad@CS.Stanford.EDU (Farhad Shakeri) writes:
>}
>} [ Dump fails with a segmentation fault. ]
>}
> I have two questions back at you. Are you using the
> 'u' flag to update /etc/dumpdates? Does the file
> /etc/dumpdates really exist? If the answer to the
> first question is yes and the second no, create one
> and see if the problem goes away.

I can answer this: it makes ABSOLUTELY NO DIFFERENCE! In fact, if the /etc/dumpdates file exists and is not empty, doing

% dump w

to list the file systems to dump will give "Segmentation fault (core dumped)".

This is a long-standing problem I have had, and DEC has never been able to solve it. Just today I showed this annoyance to a hardware guy from DEC (here for another reason) and he thinks it's software. However, I was on the phone for hours with Atlanta a few months back, to no avail. My attempted solution was to simply tar off what I needed and install Ultrix 4.0 straight from the book. This changed nothing!

Be assured, your dump image is fine. You can call Atlanta and compare the output of

% sum /usr/bin/dump

on your machine with the output on a machine at the hands of the DEC support person. I'm positive they'll match.

The very strange thing is that it seems to be a bug that spreads: it started on a 3100 we had on loan, spread to a second loaner 3100, and is now infesting two 2100s we've bought.

The truth of the matter is that we are mostly a Sun shop here and the DECstations have always been a low priority for us, so this problem has been ignored. We don't really care if these machines all crash terribly without a backup; as I've said, the 3100s are loaners, and if I had my way we'd send the two 2100s back for some vt1300 X-terminals.

> If it does PLEASE, PLEASE submit an SPR. Bugs like
> this should have disappeared ages ago.

How exactly does one submit an SPR? Better yet, please call me at (805)893-2896 and I'll be happy to discuss the problem with you directly (or maybe trade in options?).

-------------------------------------------------------------------------
Jim Davidson                                jimbo@Nsfitp.ITP.UCSB.Edu
Institute for Theoretical Physics           jimbo@sbitp.bitnet
University of California at Santa Barbara
--
--------------------------------------------------------------------
Jim Davidson                                jimbo@sbitp.bitnet
Institute for Theoretical Physics           jimbo@sbitp.ucsb.edu
UC Santa Barbara