trb@floyd.UUCP (07/01/83)
I made an amazing discovery the other week, and I confirmed it a few minutes ago. It's possible to CONTINUE 4.1bsd UNIX sometimes after it HALTs.

About a month ago, my 780 hung for no evident reason: it was not echoing characters, it was just cycling with the little green light on. I was about to HALT it, then reboot it and fsck it, which would have taken 15-20 minutes and lost a bunch of work, so I figured that nothing horrible could happen if I HALTed it and tried to CONTINUE it. So I hit ^P and got a >>> prompt, typed HALT, got a HALT message and another >>>, then I typed CON and floyd came right back, hot to trot, processes still running, etc. No fsck, no nothin'! I told this to my esteemed colleague harpo!ber and he told me to go play in traffic.

Well, ber isn't here today, and harpo HALTed without a UNIX message for some unknown reason. Debbie Manning, harpo's operator, was pretty upset, and I said hold on. I typed CON to the >>> and harpo came right back to life! Enough to make you find religion.

Anyway, I know that CONTINUING after HALTs isn't the solution to OS bugs, but I figure that if I can CONTINUE after a HALT which offers no other evidence, and if it doesn't HALT again, then it's a pretty good idea. I'd appreciate it if other people would share their impressions and experiences on this matter.

Andy Tannenbaum
Bell Labs Whippany, NJ
(201) 386-6491
chris@umcp-cs.UUCP (07/03/83)
What is this, don't you people read your little Vax handbooks??? I thought everyone knew about the 'CONTINUE' console command. You can halt a running 780 and then continue it with no problems (except maybe lost interrupts). On 750s, just hitting ^P halts the machine. Also, if you accidentally hit ^P and want to un-console-mode the console, type SE T P (SET TERMINAL PROGRAM for you long-word types) at the >>> prompt.
- Chris
--
UUCP: {seismo,allegra,brl-bmd}!umcp-cs!chris
CSNet: chris@umcp-cs
ARPA: chris.umcp-cs@UDel-Relay
cfv@packet.UUCP (07/03/83)
I don't know about you, but I would be scared out of my gourd to try to
CONtinue the OS. If Unix stops, it probably has a good reason (even if it
doesn't deign to tell you), and my feeling is that 15-20 minutes for a reboot
is a small price to pay for a machine and a kernel you can (somewhat) trust.
--
From the dungeons of the Warlock:
Chuck Von Rospach
ucbvax!amd70!packet!cfv
(chuqui@mit-mc) <- obsolete!
joe@cvl.UUCP (07/03/83)
Our 780, fortunately, has never exhibited any such behaviour. Our old 11/45, however, used to halt because of bus problems. Continuing it was invariably a bad idea, because it halted instead of executing some instruction that it failed to fetch properly. Since your 780 was running when you halted it, it would seem that what you did shouldn't have had any effect at all. What does halting and continuing do to the hardware state of the machine that DEC hasn't told us?
moss%brl-vld@sri-unix.UUCP (07/05/83)
From: Gary S. Moss (301)278-6647 <moss@brl-vld>
Andy - Our 780 running 4.1c exhibits the same behavior that you described: the running light is on but there is no other sign of life, and nothing is ever printed on the console when this occurs. I haven't tried to CONTINUE after the HALT, but it sounds like real fun, especially compared to trying to recover your lost work from memory. Anyone recognize the symptoms, Andy's treatment or possible side-effects?
- Moss.
jbray%bbn-unix@sri-unix.UUCP (07/05/83)
From: James Bray <jbray@bbn-unix>
I have never worked on a VAX, but would venture a guess as to what's happening. I assume that the HALT/CONTINUE sequence on a VAX leaves the PC unchanged when resuming. It would be interesting to know what happens regarding interrupts, as it sounds like your machine is hanging in a tight loop at interrupt level, probably waiting for something to happen that isn't happening. When you do the H/C, it either allows other interrupts to break in, thus breaking you out of the loop, or causes whatever was being awaited to actually occur, or appear to. I would suggest that once you have halted, you record such context as processor status and PC, and then adb your kernel and see where it is sitting. You may well see it in a disk driver; another amusing possibility would be if the system managed to idle itself at interrupt level...
-- Jim Bray
berry@fortune.UUCP (07/07/83)
#R:floyd:-171200:fortune:11600024:000:421
fortune!berry Jul 6 12:41:00 1983
Re: Continuing Vaxen
Our 11/70 running 2.x BSD used to get hung up occasionally, and we could get it back by burping the machine (halting and continuing). Evidently interrupts getting caught in tight loops can be broken this way. We never found out exactly what it was, and we tried to bring the system down nicely as soon as possible. After all, I would consider a partial sync to be better than no sync at all, maybe...
msc@qubix.UUCP (07/08/83)
When it was first installed, our VAX 750 would drop into the console monitor without printing any message. It turned out to be caused by a bad #1 memory card. It does not seem wise to continue unless you know why the system crashed. In this particular case, some part of memory would have been bad had I not done a complete reboot. I hope that the recently posted memory error trap fixes will give us messages if the same thing happens in the future.
--
Mark
...{decvax,ucbvax}!decwrl!qubix!msc
...{ittvax,amd70}!qubix!msc
decwrl!qubix!msc@Berkeley.ARPA
jab%berkeley@nwuxd.UUCP (07/09/83)
From what I remember reading (QUITE a while back), HALTing the CPU isn't
a really great idea.
On the pdp11's, the UNIBUS behaves differently when the CPU is halted. More
specifically, NPR requests fail, and so (older) disks can do unpredictable
things if a transfer is occurring when the CPU is halted. (Really, decvax!aps
is one of the people who come to mind on this. Armando?)
As I said, it's been a while since I've looked at the material on this, but
you are VERY, VERY wrong in the statement that "a partial sync() is better
than no sync() at all."
Jeff Bowles
ron%brl-bmd@sri-unix.UUCP (07/09/83)
From: Ron Natalie <ron@brl-bmd>
It depends on which 11 you are talking about with regard to NPRs not occurring when the CPU is not running.
-Ron