[comp.unix.admin] restore crashes remote machine

zook@sweetpea.jsc.nasa.gov (Craig A. Zook 283-4206) (05/11/91)

I have several Suns (4/280, 4/470, 3/60, 3/280) that are equipped with
Exabyte 8mm tape drives.  I am running SunOS 4.1.1 on all of them.
Whenever I attempt to do a restore using a remote tape drive the remote
system crashes.  The command I use is:

restore if tm:/dev/rst1

This command worked fine under SunOS 4.1.  Dumping to a remote tape
still works.

The only indication I get is in the messages file.  Below are the key lines.

May 10 13:41:21 tm vmunix: st1:  tape synchronization lost
May 10 13:41:21 tm vmunix: st1:  file positioning error
May 10 13:41:21 tm vmunix: panic: psig action
May 10 13:41:21 tm vmunix: syncing file systems... [2] [2] done
May 10 13:41:21 tm vmunix: 00280 low-memory static kernel pages
May 10 13:41:21 tm vmunix: 00325 additional static and sysmap kernel pages
May 10 13:41:21 tm vmunix: 00000 dynamic kernel data pages
May 10 13:41:21 tm vmunix: 00102 additional user structure pages
May 10 13:41:21 tm vmunix: 00000 segmap kernel pages
May 10 13:41:21 tm vmunix: 00000 segvn kernel pages
May 10 13:41:21 tm vmunix: 00067 current user process pages
May 10 13:41:21 tm vmunix: 00073 user stack pages
May 10 13:41:21 tm vmunix: 00847 total pages (847 chunks)
May 10 13:41:21 tm vmunix:
May 10 13:41:21 tm vmunix: dumping to vp ff072114, offset 247850
May 10 13:41:21 tm vmunix: SunOS Release 4.1.1 (TM_TOPS) #1: Sat Feb 23 15:37:00 CST 1991


Can anyone help me?  Thanks in advance.

--
Craig Zook   -   zook@sweetpea.jsc.nasa.gov
Systems Engineering and Administration
McDonnell Douglas Space Systems Corp. - Engineering Services Division
(713) 283-4206

torek@elf.ee.lbl.gov (Chris Torek) (05/11/91)

In article <1991May10.141543@sweetpea.jsc.nasa.gov>
zook@sweetpea.jsc.nasa.gov (Craig A. Zook  283-4206) writes:
>I have several Suns ... running SunOS 4.1.1 ... [and get]
>system crashes.

>May 10 13:41:21 tm vmunix: panic: psig action

This is an internally-detected error in `psig'.

Signals are sent to a process by setting a bit in the other process's
`pending signals' mask.  The process is awakened, if appropriate, and
when it eventually runs, it notices the pending signal and calls
`issig' and then `psig'.  psig() determines what to do based on the
current `signal action'.  This action is one of:

	SIG_DFL: take the default action (often, core dump/kill process).
	SIG_IGN: ignore.
	SIG_CATCH, SIG_HOLD: (internal to psignal/issig)
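A minimal, hypothetical user-space model of this mechanism may help: a per-process pending bitmask, a blocked mask, and a per-signal action table.  The names (`struct proc_model`, `post_signal`, `next_signal`, the `ACT_*` values) are made up for illustration and are not SunOS kernel code.  Bit positions follow the `1 << signo` form used in this post rather than any particular kernel's macro.

```c
#include <assert.h>

/* Hypothetical model of BSD-style signal posting; not SunOS source. */
enum sig_action { ACT_DFL, ACT_IGN, ACT_CATCH };

struct proc_model {
    unsigned int    pending;     /* bit set: signal posted, not yet taken */
    unsigned int    sigmask;     /* bit set: signal blocked for now */
    enum sig_action action[32];  /* per-signal action, like u.u_signal[] */
};

/* Sender side (cf. psignal): an ignored signal is dropped at once;
   otherwise its bit is set in the target's pending mask.  A real
   kernel would also wake the target process here if appropriate. */
static void post_signal(struct proc_model *p, int sig)
{
    assert(sig > 0 && sig < 32);
    if (p->action[sig] == ACT_IGN)
        return;
    p->pending |= 1u << sig;
}

/* Receiver side (cf. issig): return the lowest-numbered pending signal
   that is not blocked, clearing its pending bit, or 0 if there is none.
   Blocked signals stay pending until they are unblocked. */
static int next_signal(struct proc_model *p)
{
    unsigned int ready = p->pending & ~p->sigmask;

    for (int sig = 1; sig < 32; sig++) {
        if (ready & (1u << sig)) {
            p->pending &= ~(1u << sig);
            return sig;
        }
    }
    return 0;
}
```

With signal 2 ignored and signal 3 blocked, posting 2, 3 and 15 leaves only 3 and 15 pending; `next_signal` hands back 15, then 0, while 3 waits in `pending` until its `sigmask` bit is cleared.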

Now, issig() and psignal() are supposed to discard signals with no
effect, and say `no pending signal' for signals that are blocked.
Thus, psig() will panic if either:

	u.u_signal[p->p_cursig] == SIG_IGN

or

	(p->p_sigmask & (1 << p->p_cursig)) != 0

because both of these mean `the signal is not supposed to do anything'
---for SIG_IGN, `not ever' and for p_sigmask, `not yet'.
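The two panic conditions can be written as a small predicate over the same kind of state.  This is again a hypothetical sketch: `struct proc_state`, `psig_would_panic`, and the `ACT_*` values are illustrative names, and the real psig() panics rather than returning a flag.

```c
#include <assert.h>

/* Hypothetical model; not SunOS source. */
enum sig_action { ACT_DFL, ACT_IGN, ACT_CATCH };

struct proc_state {
    int             cursig;      /* signal chosen by issig(), like p->p_cursig */
    unsigned int    sigmask;     /* blocked signals, like p->p_sigmask */
    enum sig_action action[32];  /* per-signal action, like u.u_signal[] */
};

/* Returns 1 if psig() would hit "panic: psig action" for this state,
   i.e. it was handed a signal that should never have reached it. */
static int psig_would_panic(const struct proc_state *s)
{
    assert(s->cursig > 0 && s->cursig < 32);
    if (s->action[s->cursig] == ACT_IGN)
        return 1;                        /* ignored: `not ever' */
    if (s->sigmask & (1u << s->cursig))
        return 1;                        /* blocked: `not yet' */
    return 0;
}
```

Either check firing means psignal() or issig() let through a signal it should have discarded or held back, which is the inconsistency the panic message reports.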

There must therefore be a bug in either psignal or issig, or both.
Presumably you are paying Sun for support.  Well, here they go....
-- 
In-Real-Life: Chris Torek, Lawrence Berkeley Lab CSE/EE (+1 415 486 5427)
Berkeley, CA		Domain:	torek@ee.lbl.gov

mills@ccu.umanitoba.ca (Gary Mills) (05/13/91)

In <13091@dog.ee.lbl.gov> torek@elf.ee.lbl.gov (Chris Torek) writes:

>In article <1991May10.141543@sweetpea.jsc.nasa.gov>
>zook@sweetpea.jsc.nasa.gov (Craig A. Zook  283-4206) writes:

>>May 10 13:41:21 tm vmunix: panic: psig action

>This is an internally-detected error in `psig'.

>Thus, psig() will panic if either:

>	u.u_signal[p->p_cursig] == SIG_IGN

>or

>	(p->p_sigmask & (1 << p->p_cursig)) != 0

>because both of these mean `the signal is not supposed to do anything'
>---for SIG_IGN, `not ever' and for p_sigmask, `not yet'.

>There must therefore be a bug in either psignal or issig, or both.
>Presumably you are paying Sun for support.  Well, here they go....

I just obtained Sun's patch for this (patch 100288-02).  It consists
of a new copy of kern_sig.o.  When I run strings on the old and new
copies, the only addition is:

psig: "%s" signal %d was masked, put back.

It looks a bit odd to me.  I assume it fixes the bug.
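Going only by the added message string, one plausible reading is that the patched psig() no longer panics on a masked signal but logs it and returns it to the pending set.  The sketch below is a guess at that behavior, not Sun's code; `psig_check`, `struct proc_state`, and the `comm` argument (assumed here to be the process name that fills the quoted `%s`) are all hypothetical.  The string says nothing about the SIG_IGN case, so the sketch leaves it out.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical model; a guess at the patched behavior, not Sun's code. */
struct proc_state {
    int          cursig;   /* signal about to be acted on */
    unsigned int pending;  /* posted but not yet taken */
    unsigned int sigmask;  /* blocked signals */
};

/* If the chosen signal is blocked, log it and put it back in the
   pending set (so it can be taken once unblocked) instead of panicking.
   Returns 1 if the signal was put back, 0 if it may be acted on now. */
static int psig_check(struct proc_state *s, const char *comm)
{
    int sig = s->cursig;

    assert(sig > 0 && sig < 32);
    if (s->sigmask & (1u << sig)) {
        printf("psig: \"%s\" signal %d was masked, put back.\n", comm, sig);
        s->pending |= 1u << sig;
        s->cursig = 0;
        return 1;
    }
    return 0;
}
```

If this reading is right, the change turns a fatal consistency check into a recovery path, which would explain why the rest of kern_sig.o looks unchanged under strings(1).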
-- 
-Gary Mills-         -Networking Group-          -U of M Computer Services-

brossard@sic.epfl.ch (Alain Brossard EPFL-SIC/SII) (05/25/91)

In article <1991May13.151730.3701@ccu.umanitoba.ca>, mills@ccu.umanitoba.ca (Gary Mills) writes:
|> 
|> I just obtained Sun's patch for this (patch 100288-02).  It consists
|> of a new copy of kern_sig.o.  When I run strings on the old and new
|> copies, the only addition is:
|> 
|> It looks a bit odd to me.  I assume it fixes the bug.

   The fact that there is one more string doesn't mean that
the code wasn't changed elsewhere!  The patch does work, but I
think it creates/reveals a bug with TFS.  I haven't tried this since,
but when we installed the patch and the user tried the restore trick,
my machine didn't crash anymore BUT my tfs mounts were all
screwed up and I had to reboot anyway!  This occurred twice and
we have refrained from using the t flag to restore since.
-- 

Alain Brossard, Ecole Polytechnique Federale de Lausanne,
	SIC/SII, EL-Ecublens, CH-1015 Lausanne, Suisse
brossard@sasun1.epfl.ch