[comp.realtime] rebooting hung vxWorks system

martin@aster.UCAR.EDU (Charlie Martin) (11/18/89)

Has anybody devised a method for remotely booting
(i.e., over the network) a vxWorks system which has crashed?

sbrandt@herds.SRC.Honeywell.COM (Scott Brandt) (12/06/89)

In article <5321@ncar.ucar.edu> martin@aster.UCAR.EDU (Charlie Martin) writes:
>Has anybody devised a method for remotely booting
>(i.e., over the network) a vxWorks system which has crashed?

In answer to your question:
	If the system has crashed, it would probably be impossible to talk to 
it to tell it to reboot.  Therefore you would need some external way to toggle 
the reset line in the chassis.  One of my hardware-hacking coworkers 
connected a device to one of our vxWorks systems that toggles the reset line
if the phone rings more than 8 times in the lab.  This is hardly an elegant
solution, but in order to remotely reboot the system you will need something
analogous to this.  If you are creative you can probably put some sort of 
device on whichever host your vxWorks system boots off of to accomplish 
the same thing.  

Scott Brandt

wbrown@beva.bev.lbl.gov (Bill Brown) (12/06/89)

In article <41645@srcsip.UUCP> sbrandt@src.honeywell.com (Scott Brandt) writes:
>In article <5321@ncar.ucar.edu> martin@aster.UCAR.EDU (Charlie Martin) writes:
>>Has anybody devised a method for remotely booting
>>(i.e., over the network) a vxWorks system which has crashed?
>
>In answer to your question:
>	If the system has crashed, it would probably be impossible to talk to 
>it to tell it to reboot.  Therefore you would need some external way to toggle 
>the reset line in the chassis.  One of my hardware-hacking coworkers 
>connected a device to one of our vxWorks systems that toggles the reset line
>if the phone rings more than 8 times in the lab.  This is hardly an elegant
>solution, but in order to remotely reboot the system you will need something
>analogous to this.  If you are creative you can probably put some sort of 
>device on whichever host your vxWorks system boots off of to accomplish 
>the same thing.  

We've developed a deadman system that seems to work fairly well.  It's our
intent to release it to the archive when we're sure it's working.  It
uses the built-in deadman timer on the MVME-147 and is also being moved
to the '133 and '135.  The scheme depends on the task scheduler invoking
a function that keeps the deadman from timing out.  The problem is that
all 3 boards have different hardware, so most of the code is system-dependent.
We'll probably add it to the vxWorks system-dependent code.
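
One way to keep the board differences in one place is a small driver table
with a common entry point that the scheduling path calls.  The following is
only a sketch under assumptions, not the LBL code: the board names, the
deadman_driver type, deadman_select(), deadman_on_schedule(), and the
sim_strobes_* counters are all hypothetical, and the per-board routines just
count calls where real code would write to each board's timer hardware.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One entry per supported CPU board (all names are hypothetical). */
typedef struct {
    const char *board;        /* board identifier                      */
    void (*strobe)(void);     /* restarts that board's deadman timer   */
} deadman_driver;

/* Simulated hardware: each counter stands in for a register write. */
unsigned long sim_strobes_147 = 0;
unsigned long sim_strobes_133 = 0;
unsigned long sim_strobes_135 = 0;

static void strobe_147(void) { sim_strobes_147++; }
static void strobe_133(void) { sim_strobes_133++; }
static void strobe_135(void) { sim_strobes_135++; }

static const deadman_driver drivers[] = {
    { "mv147", strobe_147 },
    { "mv133", strobe_133 },
    { "mv135", strobe_135 },
};

/* Driver chosen at startup; NULL means the deadman is not in use. */
static const deadman_driver *active = NULL;

/* Select the driver for the named board.
 * Returns 0 on success, or -1 (leaving the selection unchanged)
 * if the board is unknown. */
int deadman_select(const char *board)
{
    size_t i;

    assert(board != NULL);
    for (i = 0; i < sizeof drivers / sizeof drivers[0]; i++) {
        if (strcmp(drivers[i].board, board) == 0) {
            active = &drivers[i];
            return 0;
        }
    }
    return -1;
}

/* Board-independent entry point, meant to be called from the
 * scheduling path so the timer is restarted only while tasks
 * are still being scheduled. */
void deadman_on_schedule(void)
{
    if (active != NULL)
        active->strobe();
}
```

With this layout, only the strobe routines and the table would need to
change per board; the caller in the scheduling path stays the same.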

I certainly wouldn't want it to stand between me and the end of the world,
but it does seem to take care of quite a few system hangs.

							-bill
							wlbrown@lbl.gov

Disclaimer:  These opinions are my own and have nothing to do with the
    official policy or management of L.B.L, who probably couldn't care 
    less about employees who play with trains.

biocca@bevb.bev.lbl.gov (Alan Biocca) (12/07/89)

In article <5321@ncar.ucar.edu> martin@aster.UCAR.EDU (Charlie Martin) writes:
>Has anybody devised a method for remotely booting
>(i.e., over the network) a vxWorks system which has crashed?

    This is not quite what you are asking, but may be of use:

    We have had good success using the deadman timer available on many
    CPU boards.  Essentially, a monitor task periodically checks on the
    health of the system (in whatever way you like) and resets the
    deadman timer.  This timer produces a reset when it times out,
    causing the crate to reboot automatically when crashes occur.

    In the Motorola 147 we are using, the deadman timer has a very short
    maximum timeout, and we found that a regular task could not reliably
    reset it often enough.  We added some code to the clock interrupt
    handler to manage a longer software timer: the handler keeps resetting
    the hardware deadman until the longer software timer expires.
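
The two-level scheme described above can be sketched roughly as follows.
This is an illustration under assumptions, not the code headed for the
archive: hw_deadman_strobe(), soft_deadman_kick(), deadman_tick(), and the
counters are hypothetical names, and the hardware strobe is simulated by a
call counter rather than a real register write.

```c
#include <assert.h>

/* Simulated hardware: counts how often the hardware deadman has been
 * restarted.  Real code would write a board-specific register here. */
unsigned long hw_strobe_count = 0;

static void hw_deadman_strobe(void)
{
    hw_strobe_count++;
}

/* Software timer in clock ticks.  volatile because it is written by
 * the monitor task and read and decremented by the clock interrupt. */
static volatile unsigned int soft_ticks_left = 0;

/* Called by the monitor task each time its health check passes.
 * Reloads the long software timeout. */
void soft_deadman_kick(unsigned int ticks)
{
    assert(ticks > 0);
    soft_ticks_left = ticks;
}

/* Called from the clock interrupt handler on every tick.
 * While the software timer is running, keep the short hardware
 * timer alive.  Once the software timer reaches zero, stop
 * strobing so the hardware deadman times out and resets the board. */
void deadman_tick(void)
{
    if (soft_ticks_left > 0) {
        soft_ticks_left--;
        hw_deadman_strobe();
    }
}
```

The monitor task only has to call soft_deadman_kick() more often than the
software timeout, which can be much longer than the hardware limit.  If
the monitor task stops running, the interrupt handler stops strobing after
the remaining ticks and the hardware timer does the rest.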

    We are planning to put this in the vxWorks archive soon.

		Alan K Biocca
		Deputy Project Manager
		BEVALAC Controls Group
		Lawrence Berkeley Laboratory