[comp.unix.ultrix] DUMP: SIGSEGV

farhad@CS.Stanford.EDU (Farhad Shakeri) (10/04/90)

Hi.  We have found a very strange problem on only one of our
3100s (mine), running Ultrix (3.1c).

Dump fails almost immediately after it starts, with this error:

  DUMP: SIGSEGV()  ABORTING!
  Illegal instruction (core dumped)

This problem started about 2 weeks ago and we can't figure out
why.  I thought I had a bad binary or a corrupted filesystem,
but the binaries matched those on other 3100s when I compared
them with sum, and the filesystem passed fsck!
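
A minimal sketch of the comparison described above, assuming a reference copy of the binary has already been fetched from a known-good machine. The function name compare_binary and both paths in the usage line are hypothetical. It prints the sum of each copy and then runs cmp for a byte-for-byte check, since sum is only a short checksum.

```shell
# compare_binary LOCAL REFERENCE
# Hypothetical helper: print the checksum of each copy, then compare
# the two files byte for byte.  Returns 0 only if they are identical.
compare_binary() {
    local_copy=$1
    reference_copy=$2
    # Reading from stdin keeps the file name out of sum's output.
    echo "local:     $(sum < "$local_copy")"
    echo "reference: $(sum < "$reference_copy")"
    if cmp -s "$local_copy" "$reference_copy"; then
        echo "identical"
    else
        echo "DIFFERENT"
        return 1
    fi
}

# Usage (paths are placeholders):
#   compare_binary /usr/bin/dump /tmp/dump.from-other-3100
```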

Dump failed in every test, even when dumping to /dev/null.

Anyway, if anybody has seen this sort of problem, please let me know.

I am going to convert to 4.0 soon but I would like to solve this
mystery.

Also, can this be a hardware problem?

Thanks a lot.

-- 
       +----------------------------------------------------+
      /   Farhad Shakeri       E-Mail:                     /
     /  Stanford University    farhad@Tehran.Stanford.EDU /
    / Computer Science Dept.                             /
   +----------------------------------------------------+

alan@shodha.enet.dec.com ( Alan's Home for Wayward Notes File.) (10/04/90)

In article <1990Oct3.171146.4158@Neon.Stanford.EDU>, farhad@CS.Stanford.EDU (Farhad Shakeri) writes:
}
} [ Dump fails with a segmentation fault. ]
}

	I have two questions back at you.  Are you using the
	'u' flag to update /etc/dumpdates?  Does the file
	/etc/dumpdates really exist?  If the answer to the
	first question is yes and the second no, create one
	and see if the problem goes away.
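
The check suggested above might look like the following sketch. The function name ensure_dumpdates is hypothetical, and the path is a parameter so the function can be tried on a scratch file first; creating /etc/dumpdates itself would normally need root.

```shell
# ensure_dumpdates [FILE]
# Hypothetical helper: report whether the dumpdates file exists and
# create an empty one if it does not.  FILE defaults to /etc/dumpdates.
ensure_dumpdates() {
    file=${1:-/etc/dumpdates}
    if [ -f "$file" ]; then
        echo "$file exists"
    else
        # ": >" creates an empty file without writing anything to it.
        : > "$file" && echo "created empty $file"
    fi
}

# Usage:
#   ensure_dumpdates                  # checks /etc/dumpdates
#   ensure_dumpdates /tmp/dumpdates   # dry run on a scratch file
```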

	If it does PLEASE, PLEASE submit an SPR.  Bugs like
	this should have disappeared ages ago.


-- 
Alan Rollow				alan@nabeth.enet.dec.com

grr@cbmvax.commodore.com (George Robbins) (10/04/90)

In article <1990Oct3.171146.4158@Neon.Stanford.EDU> farhad@CS.Stanford.EDU (Farhad Shakeri) writes:
> 
> Hi  We have found a very strange problem on only one of our
> 3100s (mine) runing ultrix (3.1c).
> 
> Dump fails almost immediately after it starts by this error:
> 
>   DUMP: SIGSEGV()  ABORTING!
>   Illegal instruction (core dumped)
> 
> This problem started about 2 weeks ago and we can't figure out
> why.  I thought I had a bad binary or corrupted filesystem
> but those looked fine when I compared (sum) them to other 3100s
> and they looked fine and passed fsck!  

When I've seen this kind of problem, it has been due to a copy of the
image in the swap area getting corrupted.  If you reboot the machine,
and then try the dump again immediately, do you get the same problem?

Have you checked the error log for disk errors?  Are you running out
of swap space?  Are you doing anything different on this machine
than on the others: SLIP, NFS, DECnet?

> Also  can this be a hardware problem?

Could be...

-- 
George Robbins - now working for,     uucp:   {uunet|pyramid|rutgers}!cbmvax!grr
but no way officially representing:   domain: grr@cbmvax.commodore.com
Commodore, Engineering Department     phone:  215-431-9349 (only by moonlite)

farhad@CS.Stanford.EDU (Farhad Shakeri) (10/05/90)

In article <1753@shodha.enet.dec.com>, alan@shodha.enet.dec.com ( Alan's Home for Wayward Notes File.) writes:
|> In article <1990Oct3.171146.4158@Neon.Stanford.EDU>, farhad@CS.Stanford.EDU (Farhad Shakeri) writes:
|> }
|> } [ Dump fails with a segmentation fault. ]
|> }
|> 
|> 	I have two questions back at you.  Are you using the
|> 	'u' flag to update /etc/dumpdates?  Does the file
|> 	/etc/dumpdates really exist?  If the answer to the
|> 	first question is yes and the second no, create one
|> 	and see if the problem goes away.
|> 
|> 	If it does PLEASE, PLEASE submit an SPR.  Bugs like
|> 	this should have disappeared ages ago.
|> 
|> }       /   Farhad Shakeri       E-Mail:                     /
|> 
|> 
|> -- 
|> Alan Rollow				alan@nabeth.enet.dec.com


Dump failed in all cases, with or without 'u'.

I will submit an SPR, if it helps.



farhad@CS.Stanford.EDU (Farhad Shakeri) (10/05/90)

In article <14862@cbmvax.commodore.com>, grr@cbmvax.commodore.com (George Robbins) writes:
|> In article <1990Oct3.171146.4158@Neon.Stanford.EDU> farhad@CS.Stanford.EDU (Farhad Shakeri) writes:
|> 
|> When I've seen this kind of problem, it has been due to a copy of the
|> image in the swap area getting corrupted.  If you reboot the machine,
|> and then try the dump again immediately, do you get the same problem?

YES!  I have done everything I can think of: fsck, power off/on,
checked the errlog file (no errors), checked the dumpdates file...

|> ...
|> > Also  can this be a hardware problem?
|> 
|> Could be...
|> 
|> -- 
|> George Robbins - 

alan@shodha.enet.dec.com ( Alan's Home for Wayward Notes File.) (10/05/90)

In article <1990Oct5.001625.11355@Neon.Stanford.EDU>, farhad@CS.Stanford.EDU (Farhad Shakeri) writes:
> In article <1753@shodha.enet.dec.com>, alan@shodha.enet.dec.com ( Alan's Home for Wayward Notes File.) writes:
> |> In article <1990Oct3.171146.4158@Neon.Stanford.EDU>, farhad@CS.Stanford.EDU (Farhad Shakeri) writes:
> |> }
> |> } [ Dump fails with a segmentation fault. ]
> |> }
> |> 
> |> [ I suggest that it might be a lack of the /etc/dumpdates file. ]
> 
> dump failed in all cases with or without 'u' .
>

	Not that bug, oh well.  Try this one.  Take a very close
	look at /etc/fstab.  Particularly the 2nd field of each
	line.  Are all the path names properly formed?  Do they
	all begin with '/'?  How about the rest of the file?  Any
	missing fields?  It seems that dump is very intolerant
	of a bad /etc/fstab.
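
The inspection described above can be roughed out with awk. This is only a sketch: the function name check_fstab is hypothetical, and the default of colon-separated fields is an assumption about this system's fstab layout, so both the separator and the expected field count are parameters. Every line is checked, blank lines included, since nothing here establishes that dump tolerates them.

```shell
# check_fstab FILE [SEP] [NFIELDS]
# Hypothetical helper: flag lines whose second field does not begin
# with "/" and, if NFIELDS is given, lines with a different number of
# fields.  SEP defaults to ":" (an assumption; adjust for your fstab).
check_fstab() {
    file=$1
    sep=${2:-:}
    nfields=${3:-0}
    awk -F"$sep" -v want="$nfields" '
        $2 !~ /^\// {
            printf "line %d: second field \"%s\" does not begin with /\n", NR, $2
            bad = 1
        }
        want > 0 && NF != want {
            printf "line %d: %d fields, expected %d\n", NR, NF, want
            bad = 1
        }
        END { exit bad + 0 }   # non-zero exit if any line was flagged
    ' "$file"
}

# Usage:
#   check_fstab /etc/fstab
```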

	This too is a bug.  If it turns out to be this, then please
	submit an SPR on it.  While you're there you should also
	mention that a successful dump returns an exit status of
	one (1) instead of zero (0) like most programs.  I also
	consider this one a bug.  Maybe it will get fixed in the
	OSF/1 based system...
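
Until that is fixed, a script that tests dump's exit status has to allow for it. A sketch, assuming as reported above that this dump exits with 1 on success; the function name dump_succeeded and the DUMP_OK_STATUS variable are hypothetical, and the dump arguments in the usage line are placeholders.

```shell
# dump_succeeded COMMAND [ARGS...]
# Hypothetical wrapper: run the command and treat the status named in
# DUMP_OK_STATUS (default 1, per the behaviour reported in this thread)
# as success.  Any other status counts as failure.
dump_succeeded() {
    "$@"
    status=$?
    [ "$status" -eq "${DUMP_OK_STATUS:-1}" ]
}

# Usage (arguments are placeholders):
#   dump_succeeded dump 0f /dev/null /usr || echo "dump failed" >&2
```

Setting DUMP_OK_STATUS=0 restores the usual convention on a system where dump behaves like most programs.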


-- 
Alan Rollow				alan@nabeth.enet.dec.com

farhad@CS.Stanford.EDU (Farhad Shakeri) (10/06/90)

In article alan@shodha.enet.dec.com writes:
|>
|> Alan Rollow				alan@nabeth.enet.dec.com

The problem is fixed.  It was a binary that had gone bad or something!?!

Anyway, I took the binary from another 3100, instead of from my own
old backups, and it is dumping!

Very strange: I also took the binaries from a 5400 (same OS) and that failed.

Maybe I have some bad spots on my disk!

Anyway, thanks for your suggestions.


6600jimi@ucsbuxa.ucsb.edu (Jim Davidson) (10/06/90)

In article <1753@shodha.enet.dec.com> alan@shodha.enet.dec.com ( Alan's Home for Wayward Notes File.) writes:



Yes!  We have the exact same problem on 2 3100's and 2 2100's!  I
thought we were alone!


>In article <1990Oct3.171146.4158@Neon.Stanford.EDU>, farhad@CS.Stanford.EDU (Farhad Shakeri) writes:
>}
>} [ Dump fails with a segmentation fault. ]
>}

>	I have two questions back at you.  Are you using the
>	'u' flag to update /etc/dumpdates?  Does the file
>	/etc/dumpdates really exist?  If the answer to the
>	first question is yes and the second no, create one
>	and see if the problem goes away.

I can answer this:  It makes ABSOLUTELY NO DIFFERENCE!  In fact,
if the /etc/dumpdates file exists and is not empty, doing

% dump w

to list file systems to dump will give "Segmentation fault (core
dumped)".  This is a long-standing problem I have had, and DEC has never been
able to solve it.  Just today I showed this annoyance to a
hardware guy from DEC (here for another reason) and he thinks it's
software.  However, I was on the phone for hours with Atlanta
a few months back, to no avail.  My attempted solution was to simply
tar off what I needed and install Ultrix 4.0 straight from the book.
This changed nothing!  Be assured, your dump image is fine; you
can call Atlanta and compare the output of

% sum /usr/bin/dump

on your machine and on a machine in the hands of the DEC support
person.  I'm positive they'll match.  The very strange thing
is that it seems to be a bug that spreads: it started on
a 3100 we had on loan, spread to a second loaner 3100, and
is now infesting two 2100's we've bought.

The truth of the matter is that we are mostly a Sun shop here
and the DECstations have always been a low priority for us, so
this problem has been ignored.  We don't really care if these
machines all crash terribly without a backup; as I've said, the
3100's are loaners, and if I had my way we'd send the two
2100's back for some vt1300 X-terminals.

>	If it does PLEASE, PLEASE submit an SPR.  Bugs like
>	this should have disappeared ages ago.

How exactly does one submit an SPR?  Better yet, please
call me at (805)893-2896 and I'll be happy to discuss the
problem with you directly (or maybe trade in options?).


-------------------------------------------------------------------------
Jim Davidson					jimbo@Nsfitp.ITP.UCSB.Edu
Institute for Theoretical Physics		jimbo@sbitp.bitnet
University of California at Santa Barbara


--
--------------------------------------------------------------------
  Jim Davidson					jimbo@sbitp.bitnet
  Institute for Theoretical Physics		jimbo@sbitp.ucsb.edu
  UC Santa Barbara