wbrown@beva.bev.lbl.gov (Bill Brown) (10/13/89)
We're bringing up a control system that uses vxWorks and have run into a bit of a problem. Our understanding of what is going on isn't very good at the moment, so I'd like to know if anyone has seen a similar type of failure.

Our control system is split up into several tasks that do the actual control work, along with a couple of tasks that log errors and history data to a couple of disk files on the server. Error messages go out via "open", "logFdAdd", and "logMsg" calls; the history file is written with "open", "fprintf", and "close" calls. We thought that this would allow the control tasks to proceed even if something went wrong with the disk i/o tasks.

The processor is a Motorola MVME-147, running vxWorks V4.0. It is the only processor in the VME bin.

When the NFS server goes down (it really shouldn't, but it does), the vxWorks-based system hangs (no task switching, console is dead, no detectable activity anywhere), apparently on an attempt to write to a file. From examining indicators on the i/o hardware we can tell that the control tasks are not executing. It's as if the networking task were in a wait-loop with the interrupts turned off.

Has anyone run into this sort of thing? Any ideas about where we went wrong? I'd hate to think that we can't log data with our control system because a failure of the NFS server will hang up the whole shootin' match.

We've implemented a dead-man timer that uses the '147's built-in hardware dead-man timer, but it doesn't reset the system. I don't understand how that can fail, unless something inside vxWorks is messing with its control register, but that's a separate issue. Even if it did reset the bin, the system couldn't boot, because at the moment we're using the same host as both our boot server and our file server. If we ever solve the system-hang problem we can mount our output files on a different host.

Thanks for your ideas,

-bill
wlbrown@lbl.gov
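
For illustration only, the decoupling described above (control tasks hand messages off and carry on, a separate task does the slow file writes) might be sketched as below. This is plain C, not vxWorks calls and not the actual code from this system; LogQueue, log_post, log_drain_one, LOG_SLOTS and LOG_MSG_LEN are all hypothetical names. It assumes a single producer and consumer with no preemption between them; a real multitasking system would need mutual exclusion or an OS-provided message queue. It also would not help if the kernel itself stops switching tasks, which is the symptom reported here.

```c
#include <stdio.h>
#include <string.h>

#define LOG_SLOTS   8    /* hypothetical queue depth */
#define LOG_MSG_LEN 64   /* hypothetical maximum message length */

/* Fixed-size ring buffer sitting between control tasks and the logging task. */
typedef struct {
    char     msgs[LOG_SLOTS][LOG_MSG_LEN];
    unsigned head;      /* next slot the logging task reads */
    unsigned tail;      /* next slot a control task writes */
    unsigned count;     /* messages currently queued */
    unsigned dropped;   /* messages discarded because the queue was full */
} LogQueue;

/* Control-task side: never waits. Returns 1 if queued, 0 if dropped. */
int log_post(LogQueue *q, const char *msg)
{
    if (q->count == LOG_SLOTS) {
        q->dropped++;               /* logger is behind: drop, don't block */
        return 0;
    }
    strncpy(q->msgs[q->tail], msg, LOG_MSG_LEN - 1);
    q->msgs[q->tail][LOG_MSG_LEN - 1] = '\0';   /* always terminate */
    q->tail = (q->tail + 1) % LOG_SLOTS;
    q->count++;
    return 1;
}

/* Logging-task side: write one queued message to 'out'.
 * Returns 1 if a message was written, 0 if the queue was empty.
 * Only this side ever touches the (possibly slow) file. */
int log_drain_one(LogQueue *q, FILE *out)
{
    if (q->count == 0)
        return 0;
    fprintf(out, "%s\n", q->msgs[q->head]);
    q->head = (q->head + 1) % LOG_SLOTS;
    q->count--;
    return 1;
}
```

In this sketch a control task would call log_post() and continue regardless of the return value, while the logging task loops on log_drain_one(); if the file server stalls, only the logging task waits and the dropped counter records what was lost.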