mike@BRL.MIL (Mike Muuss) (11/02/90)
I have been experiencing a problem with the parallel application "RT" on the SGI for some time now. The application goes back and forth between parallel operation and serial operation many times. The application reads a command script from stdin. I use sproc() to fire off a bunch of "worker" threads when they are needed. Once all the workers finish, and wait() has collected their status, back in the "parent thread", I discover that STDIO has "lost its place" on buffered file structure "stdin".

There seem to be a pair of bugs that cause this:

1) Somehow, when a thread exits, it seems to wend its way to _cleanup(), which calls _fflush() on all buffered files (if sproc() has been used), or calls _fclose() on all buffered files (if sproc() has NOT been used).

2) fflush(stdin) seems to trounce on stdin.

Here is what my application says, every time it finishes parallel processing:

    Beam: radius=0 mm, divergence=0.00625 mm/1mm
    DANGER DANGER: stdin file pointer has been corrupted!
    Attempting to restore it
    originally position was x398, now position is x2000!
    FILE structure 'saved stdin', at x7fffc318:
     _cnt = x1c68
     _ptr = x100393a0
     _base = x10039008
     _file = x0
     _flag =01 <_IOREAD>
    FILE structure 'current stdin', at x10023154:
     _cnt = x0
     _ptr = x100393a0
     _base = x10039008
     _file = x0
     _flag =01 <_IOREAD>
    It was fixed with a struct copy

Here is the offending code at the top of SGI's fflush:

        if (!(iop->_flag & _IOWRT)) {
            iop->_cnt = 0;
            return(0);
        }
        /* Followed by the _IOWRT code */
    }

Here is how the Berkeley version of fflush starts out:

        if ((iop->_flag&(_IONBF|_IOWRT))==_IOWRT

So, on a Berkeley UNIX system, fflush(stdin) is a no-op, but on the SGI, it smashes stdin's _cnt variable. I don't know if the difference has its origins in System V, or MIPS, or what, and I don't want to know. Ordinarily, I would not care how fflush(stdin) behaved, since I wouldn't expect anybody to want to do that, but since the threads are wending their way into _cleanup, this is a problem.
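The numbers in the diagnostic output are consistent with the stream position being computed as the descriptor's offset minus _cnt: x2000 - x1c68 = x398. The following is a minimal sketch of that mechanism, not the real IRIX or Berkeley code. The struct mock_file, its fd_offset field, the MOCK_* flag names, and all function names are hypothetical. Only the value 01 for the read flag comes from the output above; the other flag values are assumed. The Berkeley test uses only the part of the condition quoted above.

```c
#include <assert.h>

/* Hypothetical flag values; only 01 for the read flag is taken from the post. */
enum { MOCK_IOREAD = 01, MOCK_IOWRT = 02, MOCK_IONBF = 04 };

/* Hypothetical stand-in for the buffered-file structure, not the real FILE. */
struct mock_file {
    long cnt;        /* bytes still unread in the buffer (_cnt)          */
    int  flag;       /* stream mode flags (_flag)                        */
    long fd_offset;  /* assumed: what lseek(fd, 0, SEEK_CUR) would report */
};

/* Logical read position: descriptor offset minus unread buffered bytes. */
long mock_tell(const struct mock_file *f)
{
    return f->fd_offset - f->cnt;
}

/* Mirrors the SGI fragment: a stream not open for writing has _cnt zeroed. */
int sgi_style_fflush(struct mock_file *f)
{
    if (!(f->flag & MOCK_IOWRT)) {
        f->cnt = 0;
        return 0;
    }
    /* write-side flushing omitted in this sketch */
    return 0;
}

/* Mirrors the visible part of the Berkeley test: read streams are untouched. */
int bsd_style_fflush(struct mock_file *f)
{
    if ((f->flag & (MOCK_IONBF | MOCK_IOWRT)) == MOCK_IOWRT) {
        /* write-side flushing omitted in this sketch */
    }
    return 0;
}

/* The 'saved stdin' state from the diagnostic output above. */
struct mock_file stdin_from_post(void)
{
    struct mock_file f = { 0x1c68, MOCK_IOREAD, 0x2000 };
    return f;
}
```

Saving a copy of the structure before the workers start and assigning it back afterwards (the "struct copy" in the output) restores _cnt, and with it the logical position.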
Doing a little sleuthing shows that the child thread created by sproc(), when it returns (to sproc), jumps to exit, viz:

    (sys/sproc.s)
            jal     s0              # now, jump to real entry
            move    a0,zero
            jal     exit            # call exit if return

This behavior is partly mentioned in the man page, but the fflush() damage effect is not mentioned:

    exit terminates the calling process with the following consequences:
    All of the file descriptors open in the calling process are closed.
    If the process is sharing file descriptors via an sproc, other
    members of the share group do NOT have their file descriptors closed.

until

    The C function exit causes all file streams to be closed unless one
    has done an sproc which causes the file streams to simply be flushed.
    The function _exit circumvents all cleanup.

Naturally, the function exit() is roughly:

    exit(code)
    {
            /* Special cleanups here: profiling, semaphores, etc */
            _cleanup();
            _exit(code);
    }

and the code in _cleanup() for exiting a process behaves like this:

    _cleanup()
    {
            for(iop = _iob; iop < _lastbuf; iop++) {
                    if ( _sproced )
                            (void) fflush(iop);
                    else
                            (void) fclose(iop);
            }
    }

This completes the chain. sproc() creates a thread, which returns, and jumps to exit(0), which calls _cleanup(), which calls fflush(stdin), which for some reason does stdin->_cnt = 0. Thus, my application loses its place in the input file.

RECOMMENDATION #1

I believe that the fundamental problem here is that when an sproc() thread returns, it jumps to exit(0). I would suggest instead that when a thread returns, it is given special treatment. No _cleanup(). This would result in new code in sproc.s something like:

            jal     s0                      # now, jump to real entry
            jal     _sproc_thread_exit      # Don't anger Mike

and a new libc routine that parallels exit() in cuexit.c:

    _sproc_thread_exit()
    {
            /* Special cleanups here: profiling, semaphores, etc */
            _exit(0);
    }

Plus, the obvious hack in _cleanup() (_sproced) could be eliminated.
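Until something like this exists in libc, the man page's statement that _exit circumvents all cleanup suggests an application-side stopgap: have each worker call _exit() itself instead of returning. The sketch below shows the idea under loud assumptions. It uses POSIX fork() and waitpid() in place of sproc() and wait(), so the child does not share the parent's address space as an sproc() thread would. The names run_worker_without_cleanup and sample_work are hypothetical.

```c
#include <assert.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper: run work() in a child process and return the
 * child's exit code, or -1 on failure. fork() stands in for sproc().
 */
int run_worker_without_cleanup(int (*work)(void))
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;

    if (pid == 0) {
        /* Child: terminate via _exit(), so the stdio cleanup that
         * exit() would run (flushing or closing streams) is skipped. */
        _exit(work());
    }

    /* Parent: collect the worker's status. */
    int status;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}

/* Hypothetical worker body that just reports a status code. */
int sample_work(void)
{
    return 7;
}
```

A worker that terminates this way must flush any output streams it wrote to on its own, since nothing will do so on its behalf.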
When the parent thread exits (either through calling exit() or through returning from main()), the normal exit() handler will do its thing.

DISCUSSION

I suspect that the reason for SGI's current implementation was so that a child thread could use exit() to terminate that thread, rather than having to return up out of the top of the subroutine hierarchy from whence it came. This can be especially useful for handling error exits.

The problem here is the "overloading" of the library routine exit(), which SGI defines to mean "terminate this thread, and if it is the main thread, terminate the process with normal C-language (e.g. STDIO) completion". This feels natural, and follows along with the whole SGI strategy for making threads look a lot like UNIX processes, with just a bit of extra sharing.

My main problem comes from the termination & buffer flushing implications. Even if things are changed so that fflush does not smash STDIN, the issue runs deeper. Threads are supposed to be reasonably "lightweight". If having a thread in a parallel program terminate results in every STDIO buffer being flushed, this could have serious I/O performance implications. Here I assume that the application writes output as it goes, and uses threads in a very dynamic way.

RECOMMENDATION #2:

I now present an alternative way of coping with this problem that may have a more appealing interface and semantics.
If we assume that a subroutine or system-call can be created to determine if the current thread is the last thread in the multi-processing program, then the following simple variation on exit() should suffice:

    exit(code)
    {
            if( SYS_am_I_the_last_thread() )  {
                    /** This part performed for last thread in program ONLY ***/
                    /* Special cleanups here: profiling, semaphores, etc */
                    _cleanup();
            }
            /* Perhaps signal or FPE handling might go here */
            _exit(code);
    }

Even if SYS_am_I_the_last_thread() needs to be a new system call, it won't be a performance problem, because it will be called a maximum of once per thread lifetime. (And it is cheaper than doing bunches of fflush()es.)

SUMMARY

A serious problem exists in Irix 3.3.1 with regard to damaging of STDIO buffered I/O in a multi-thread parallel program. The origin of the problem has been identified, and two strategies for correcting the problem have been proposed.

I would very much appreciate it if SGI would give this issue detailed consideration. If there is any need for our system numbers to verify that software maintenance has been paid for, please phone Paul Stay at 301-278-9201. Paul Stay and Chuck Kennedy will be flying out to the factory next week, and this topic is a main agenda item for them. I have tried to explain the issues in enough detail here so that a resolution can be swiftly achieved.

(I also hope you appreciate the tactful use of pseudo-source code to illustrate the details of this complex issue, without revealing any secrets.)

    Best,
     -Mike Muuss
      Ballistic Research Lab