dbfunk@ICAEN.UIOWA.EDU (David B. Funk) (01/25/89)
> pabong@gonzo.eta.com writes:
> has anyone ever seen the error "Unable to obtain sfcb hash table mutex lock"

The sfcb hash table is a table, in global memory, that holds shared file control blocks (sfcb's). Type managers that support multiple I/O streams to an object use an sfcb for each open object. This includes most files, IPC sockets, mbx mailboxes, pipes, etc. (i.e. most streams I/O managers). These are used to manage concurrent stream access. Each time a stream is opened or closed on one of these types of objects, the sfcb hash table is searched to see whether an entry needs to be allocated, deallocated, or reused. (I.e. this table gets used a lot, and it's in global memory where all processes and type managers can get to it.) It has a mutual exclusion lock (mutex lock) on it to prevent corruption from simultaneous updates.

If this lock gets lost, all kinds of chaos can result. Ordinarily a process (type manager) obtains the lock, does the updating, and releases the lock. If the process faults, its clean-up handlers should release all acquired resources. If a process is blasted, or dies a violent enough death that its stack is wiped out, its clean-up handlers may not get a chance to do their work. This can result in a lost mutex lock.

A bug in the streams library (/lib/streams) or a type manager (/sys/mgrs/*) could also cause this problem. Different pieces of system software have different revision levels and depend on other pieces being compatible. E.g. the tcp/ip upgrade depended on the correct revision of the streams library for correct operation. Mismatched software can cause problems. Third-party software can try to pull some fancy stunts that may get it in trouble. A mess-up in the sfcb hash table can be a ticking time bomb that won't show up until long after the culprit did its dirty deed.
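To make the lock discipline above concrete, here is a toy Python model. It is not how the Apollo type managers are actually written. `SfcbTable`, `open_stream`, `close_stream` and `blasted_open` are hypothetical names. A `threading.Lock` stands in for the global-memory mutex, and a `try/finally` stands in for the clean-up handler. A timed acquire is used so that a lost lock shows up as the error message from the original question instead of a hang.

```python
import threading


class SfcbTable:
    """Toy model of a shared table of file control blocks (hypothetical names).

    One mutex guards the whole table; a threading.Lock stands in for the
    global-memory lock described in the post.
    """

    def __init__(self, timeout=1.0):
        self.lock = threading.Lock()
        self.entries = {}  # object name -> number of open streams
        self.timeout = timeout

    def _acquire(self):
        # Give up after a timeout instead of hanging forever on a lost lock.
        if not self.lock.acquire(timeout=self.timeout):
            raise RuntimeError("Unable to obtain sfcb hash table mutex lock")

    def open_stream(self, obj):
        self._acquire()
        try:
            # Allocate a new entry, or reuse the existing one for this object.
            self.entries[obj] = self.entries.get(obj, 0) + 1
        finally:
            # The "clean-up handler": release even if the update faults.
            self.lock.release()

    def close_stream(self, obj):
        self._acquire()
        try:
            self.entries[obj] -= 1
            if self.entries[obj] == 0:
                del self.entries[obj]  # deallocate when the last stream closes
        finally:
            self.lock.release()

    def blasted_open(self, obj):
        # Simulates a process blasted mid-update: it takes the lock and never
        # reaches a clean-up handler, so the lock is never released.
        self._acquire()
        self.entries[obj] = self.entries.get(obj, 0) + 1


if __name__ == "__main__":
    table = SfcbTable(timeout=0.1)
    table.open_stream("/tmp/a")
    table.open_stream("/tmp/a")   # second stream reuses the same entry
    table.close_stream("/tmp/a")
    print(table.entries)          # {'/tmp/a': 1}
    table.blasted_open("/tmp/b")  # the lock is now lost
    try:
        table.open_stream("/tmp/c")
    except RuntimeError as err:
        print(err)                # Unable to obtain sfcb hash table mutex lock
```

After `blasted_open`, every later open or close fails, not just the one on the blasted object. This matches the "ticking time bomb" behaviour described above: the damage surfaces in whatever process next touches the table.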
To summarize, when you have sfcb hash table problems:
    Check for processes not exiting cleanly
    Check revision levels of system software
    Check for bug reports on software that you use

When in doubt, reboot before things come to a grinding halt.

Dave Funk
giebelhaus@hi-csc.UUCP (Timothy R. Giebelhaus) (01/29/89)
In article <8901250632.AA00797@icaen.uiowa.edu> dbfunk@ICAEN.UIOWA.EDU (David B. Funk) writes:
>To summarize, when you have sfcb hash table problems:
>    Check for processes not exiting cleanly
>    Check revision levels of system software
>    Check for bug reports on software that you use
>
>When in doubt, reboot before things come to a grinding halt.

Very impressive, Dave! The details are much more than I know about the system, but I do want to say that I believe the summary to be accurate.

I have not seen it spelled out specifically that if you BLAST a process with "sigp -b", "lo -f", or some other method that sends a BLAST to a process, you are running on borrowed time. A reboot will quite likely be necessary. A blast will cause a process not to exit cleanly, which, as Dave points out, is one of the things to watch for.

I have not seen problems with the DN4000 machines having hash table problems yet, but if you believe that things are not getting blasted, you don't have revision mixes, and your in-house and third-party software does not have related bugs, please do call it in to the hot line 800 number (provided, of course, that you have a service contract) or file an APR with crucr or mkapr.
--
UUCP: uunet!hi-csc!giebelhaus       UUCP: tim@apollo.uucp
ARPA: hi-csc!giebelhaus@umn-cs.arpa ARPA: tim@apollo.com
Tim Giebelhaus, Apollo Computer, Regional Software Support Specialist.
My comments and opinions have nothing to do with work.
jwright@atanasoff.cs.iastate.edu (Jim Wright) (01/30/89)
In article <4123bc79.1032a@hi-csc.UUCP> giebelhaus@hi-csc.UUCP (Timothy R. Giebelhaus) writes:
>I have not seen problems with the DN4000 machines having hash table
>problems yet,

My 4000 had this pop up VERY regularly. It seems to have gone away for the time being. Who knows why.