[net.unix-wizards] UNIX continues!

trb@floyd.UUCP (07/01/83)

I made an amazing discovery the other week, and I confirmed it a few
minutes ago.  It's possible to CONTINUE 4.1bsd UNIX sometimes after it
HALTs.

About a month ago, my 780 hung for no evident reason.  It was not
echoing characters; it was just cycling with the little green light
on.  I was destined to HALT it and then reboot it and fsck it, which
would have taken 15-20 minutes and lost a bunch of work, so I figured
that nothing horrible could happen if I HALTed it and tried to CONTINUE
it.  So I hit ^P and got a >>> prompt, typed HALT, got a HALT message
and another >>>, then I typed CON and floyd came right back, hot to
trot, processes still running, etc.  No fsck, no nothin!

I told this to my esteemed colleague harpo!ber and he told me to go
play in traffic.  Well, ber isn't here today, and harpo HALTed without
a UNIX message for some unknown reason.  Debbie Manning, harpo's
operator, was pretty upset, and I said hold on.  I typed CON to the >>>
and harpo came right back to life!  Enough to make you find religion.

Anyway, I know that CONTINUING after HALTs isn't the solution to OS
bugs, but I figure that if I can CONTINUE after a HALT which offers no
other evidence, and if it doesn't HALT again, then it's a pretty good
idea.  I'd appreciate it if other people would share their impressions
and experiences on this matter.

	Andy Tannenbaum   Bell Labs  Whippany, NJ   (201) 386-6491

chris@umcp-cs.UUCP (07/03/83)

What is this, don't you people read your little Vax handbooks???  I
thought everyone knew about the 'CONTINUE' console command.  You can
halt a running 780 and then continue it with no problems (except maybe
lost interrupts).  With 750's just hitting ^P halts.  Also, if you
accidentally hit ^P and want to un-console-mode the console, type

	SE T P		(SET TERMINAL PROGRAM for you long-words types)

at the >>> prompt.

					- Chris
-- 
UUCP:	{seismo,allegra,brl-bmd}!umcp-cs!chris
CSNet:	chris@umcp-cs
ARPA:	chris.umcp-cs@UDel-Relay

cfv@packet.UUCP (07/03/83)

I don't know about you, but I would be scared out of my gourd to try to
CONtinue the OS. If Unix stops, it probably has a good reason (even if it
doesn't deign to tell you) and my feeling is that 15-20 minutes for a reboot
is a small price to pay for a machine and a kernel you can (somewhat) trust.
-- 
From the dungeons of the Warlock:
					      Chuck Von Rospach
					      ucbvax!amd70!packet!cfv
					      (chuqui@mit-mc)  <- obsolete!

joe@cvl.UUCP (07/03/83)

Our 780, fortunately, has never exhibited any such behaviour.  Our old
11/45, however, used to halt because of bus problems.  Continuing it
was invariably a bad idea, because it halted instead of executing some
instruction that it failed to fetch properly.

Since your 780 was running when you halted it, it would seem that what
you did shouldn't have had any effect at all.  What does halting and
continuing do to the hardware state of the machine that DEC hasn't told
us?

moss%brl-vld@sri-unix.UUCP (07/05/83)

From:      Gary S. Moss (301)278-6647 <moss@brl-vld>

Andy -
	Our 780 running 4.1c exhibits the same behavior that you
described, the running light is on but there is no other sign of
life, and nothing is ever printed on the console when this occurs.
I haven't tried to CONTINUE after the HALT, but it sounds like real
fun, especially compared to trying to recover your lost work from
memory.

Anyone recognize the symptoms, Andy's treatment or possible side-
effects?

- Moss.

jbray%bbn-unix@sri-unix.UUCP (07/05/83)

From:  James Bray <jbray@bbn-unix>

I have never worked on a VAX, but would venture a guess as to what's happening.
I assume that the HALT/CONTINUE sequence on a VAX leaves the PC unchanged when
resuming; it would be interesting to know what happens regarding interrupts, as
it sounds like your machine is hanging in a tight loop at interrupt level,
probably waiting for something to happen that isn't happening, and when you do
the H/C, it either allows other interrupts to break in, thus breaking you out
of the loop, or causes whatever was being awaited to actually occur, or appear
to.  I would suggest that once you have halted, you record such context as
processor status and PC, and then adb your kernel and see where it is sitting.
You may well see it in a disk-driver; another amusing possibility would be if
the system machine managed to idle itself at interrupt level...
-- Jim Bray
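
A toy illustration of the guess above, as a minimal sketch and not code
from any real kernel or driver: a loop polls a status bit that may never
be set.  Every name here (Device, READY, spin_until_ready, max_spins) is
hypothetical.  The poll bound exists only so the model terminates; the
hang described in this thread corresponds to the same loop with no bound.

```python
# Toy model: a driver-style loop waiting on a status bit that may never
# appear. All names and values are hypothetical, chosen for illustration.

READY = 0x80  # hypothetical "ready" bit in a device status register


class Device:
    """A fake device whose status register may or may not ever go ready."""

    def __init__(self, ready_after=None):
        # ready_after: number of polls before READY appears; None = never.
        self.ready_after = ready_after
        self.polls = 0

    def status(self):
        """Return the current status word, counting each poll."""
        self.polls += 1
        if self.ready_after is not None and self.polls >= self.ready_after:
            return READY
        return 0


def spin_until_ready(dev, max_spins):
    """Poll dev until READY is set, giving up after max_spins polls.

    Returns the number of polls used, or -1 if the bit never appeared.
    Without the bound, the "never" case would spin forever.
    """
    for n in range(1, max_spins + 1):
        if dev.status() & READY:
            return n
    return -1
```

In this model a device that eventually becomes ready lets the loop exit,
while one that never does can only be escaped from outside the loop, which
is roughly what the halt and continue is guessed to be doing.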

berry@fortune.UUCP (07/07/83)

#R:floyd:-171200:fortune:11600024:000:421
fortune!berry    Jul  6 12:41:00 1983

	Re: Continuing Vaxen

	Our 11/70 running 2.x BSD used to get hung up occasionally and
we could get it back by burping the machine (halting and continuing).
Evidently interrupts getting caught in tight loops can be broken this
way.  We never found out exactly what it was and tried to bring the
system down nicely as soon as possible.  After all, I would consider a
partial sync to be better than no sync at all, maybe...

msc@qubix.UUCP (07/08/83)

When it was first installed our vax 750 would drop into the console
monitor without printing any message.  It turned out to be caused by
a bad #1 memory card.  It does not seem wise to continue unless
you know why the system crashed.  In this particular case some
part of memory would have been bad had I not done a complete reboot.

I hope that the recently posted memory error trap fixes will give
us messages if the same thing happens in the future.
-- 
	Mark
	...{decvax,ucbvax}!decwrl!qubix!msc
	...{ittvax,amd70}!qubix!msc
	decwrl!qubix!msc@Berkeley.ARPA

jab%berkeley@nwuxd.UUCP (07/09/83)

From what I remember reading (QUITE a while back), HALTing the CPU isn't
a really great idea.

On the pdp11's, the UNIBUS behaves differently when the CPU is halted. More
specifically, NPR requests fail, and so (older) disks can do unpredictable
things if a transfer is occurring when the CPU is halted. (Really, decvax!aps
is one of the people who come to mind on this. Armando?)

As I said, it's been a while since I've looked at the material on this, but
you are VERY, VERY wrong in the statement that "a partial sync() is better
than no sync() at all."

	Jeff Bowles

ron%brl-bmd@sri-unix.UUCP (07/09/83)

From:      Ron Natalie <ron@brl-bmd>

It depends on which 11 you are talking about with regards to NPR's
not occurring when the CPU is not running.

-Ron