fouts@bozeman.ingr.com (Martin Fouts) (06/27/90)
In article <9360@hubcap.clemson.edu> richk@tera.com (Richard Korry) writes:
> I am interested how parallel Unix systems (Sequent, Convex, etc.) have
> defined the semantics of Unix signals in a parallel environment.

Poorly. (;-)

> For example, the man pages claim a signal "interrupts" executing system call.
> Does this map straight onto a parallel program, i.e. X threads running on
> Y processors are stopped when a signal arrives? How do they deal with user
> level signal handlers doing longjumps? Forgive me if this has already been
> dealt with in this news group.

Extending signal handling into the parallel environment leads to a
number of opportunities for reasonable men to disagree. One of the
major ongoing debates in the POSIX 1003.4 attempt to standardize
threads has been about signal handling. I've missed the last few
meetings, so I don't know which direction 1003.4 is going, but a quick,
flawed overview of the issues includes:
1) Visibility of a thread to the signal handling:
How do I deliver a signal to a specific thread? (Do I?)
2) Choice of thread to receive a signal sent to the process as a whole
3) The whole area of blocking/nonblocking i/o.
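
To make issues 1) and 2) concrete, here is a minimal sketch using the
POSIX threads calls as they were eventually standardized
(pthread_sigmask, pthread_kill, sigwait). That interface postdates the
debate described here and is only one possible answer to it. The names
wait_for_signal, deliver_to_one_thread and received_signo are
hypothetical. The sketch blocks SIGUSR1 in every thread, addresses it to
one specific thread, and has that thread accept it synchronously instead
of running a handler.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical names throughout; only the pthread_* and sig* calls
   come from POSIX. */
static int received_signo;   /* written by the worker, read after join */

static void *wait_for_signal(void *arg)
{
    sigset_t *set = arg;
    int signo = 0;

    assert(set != NULL);
    /* sigwait() takes a blocked, pending signal synchronously,
       so no asynchronous handler ever runs in this thread. */
    if (sigwait(set, &signo) == 0)
        received_signo = signo;
    return NULL;
}

/* Returns the signal number the worker saw, or -1 on setup failure. */
int deliver_to_one_thread(void)
{
    sigset_t set, old;
    pthread_t worker;

    received_signo = 0;
    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);

    /* Issue 2: the mask decides who may receive the signal. Threads
       created after this call inherit the blocked mask. */
    if (pthread_sigmask(SIG_BLOCK, &set, &old) != 0)
        return -1;
    if (pthread_create(&worker, NULL, wait_for_signal, &set) != 0)
        return -1;

    /* Issue 1: address the signal to one specific thread. It stays
       pending on that thread until the thread calls sigwait(). */
    if (pthread_kill(worker, SIGUSR1) != 0)
        return -1;

    pthread_join(worker, NULL);
    pthread_sigmask(SIG_SETMASK, &old, NULL);
    return received_signo;
}
```

Calling deliver_to_one_thread() from main should return SIGUSR1. Note
that a signal sent to the process as a whole (for example with kill)
could be accepted by any thread waiting on it, which is exactly the
choice raised in issue 2).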
Philosophically, you can take the approach of extension by least
astonishment and try to extend signal handling to multithreaded
applications, or you can take the approach of inappropriate behavior
and simply claim that signals to a thread don't make sense.
I'm currently leaning towards the latter personally, as signals were
never meant as an IPC mechanism in the first place. An appropriate
variation would be to not allow signal handlers within multithreaded
applications: signals would either be ignored or cause termination,
but could not be caught. In this environment, exceptions would be
used where signals are currently used to indicate an exception, such
as FPE.
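
A rough sketch of what "ignored or cause termination, but not caught"
could look like with the standard sigaction call. This is an
illustration only: the function names are hypothetical, and the choice
of SIGINT as the ignored signal and SIGTERM as the terminating one is an
assumption, not part of the proposal.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <string.h>

/* Hypothetical policy: SIGINT is ignored, SIGTERM keeps its default
   (terminating) action, and no handler function is ever installed. */
int apply_no_handler_policy(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sigemptyset(&sa.sa_mask);

    sa.sa_handler = SIG_IGN;          /* ignored */
    if (sigaction(SIGINT, &sa, NULL) != 0)
        return -1;

    sa.sa_handler = SIG_DFL;          /* default action: terminate */
    if (sigaction(SIGTERM, &sa, NULL) != 0)
        return -1;
    return 0;
}

/* Returns 1 if the process is still running after an ignored SIGINT. */
int survives_ignored_signal(void)
{
    struct sigaction cur;

    if (apply_no_handler_policy() != 0)
        return -1;

    /* Query the current disposition without changing it. */
    if (sigaction(SIGINT, NULL, &cur) != 0)
        return -1;
    assert(cur.sa_handler == SIG_IGN);

    raise(SIGINT);                    /* discarded, nothing is caught */
    return 1;
}
```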
Marty
--
Martin Fouts
UUCP: ...!pyramid!garth!fouts ARPA: apd!fouts@ingr.com
PHONE: (415) 852-2310 FAX: (415) 856-9224
MAIL: 2400 Geng Road, Palo Alto, CA, 94303
If you can find an opinion in my posting, please let me know.
I don't have opinions, only misconceptions.
keith@Stardent.COM (06/30/90)
In article <9482@hubcap.clemson.edu> fouts@bozeman.ingr.com (Martin Fouts) writes:
>In article <9360@hubcap.clemson.edu> richk@tera.com (Richard Korry) writes:
>
> I am interested how parallel Unix systems (Sequent, Convex, etc.) have
> defined the semantics of Unix signals in a parallel environment.
>
>Poorly. (;-)

At Stellar (the east coast half of Stardent) our implementation of
signals has proven to be very useful. The machine is a 4-processor
shared-memory architecture. The languages we support are C and Fortran,
with Fortran able to call libc signal routines. The OS is based on
System V.3.

The result is that you can take a "dusty deck" C program and compile it
with automatic parallelization and, with one obscure exception that we
have never encountered, have your signal handlers and longjmps work
right - or at least work as well as they would singly threaded. In
addition, if you use compiler directives or do your own explicit
parallel programming you can still use signals and longjmp, provided
certain simple restrictions are taken into consideration.

Here is an overview of the implementation:

1) Calls to signal affect all threads, regardless of which thread makes
   the call.

2) Arrival of a handled signal causes all threads to stop what they are
   doing and to spin in a conventional place in libc in the user's
   address space.

3) One thread is designated the causing thread by the kernel. In the
   case of synchronous interrupts, such as floating point exceptions, it
   will be the thread that took the exception. In the case of other
   interrupts, such as SIGINT, it is usually the main (initial) thread,
   but it could be any one.

4) Only the causing thread executes the signal handler.

5) The signal handler can terminate by returning, exiting or
   longjmping. If it returns then all threads are sent back to what they
   were doing when the interrupt occurred. If it exits then the process
   terminates. The only interesting case is longjmping.

6) In the longjmp case, the thread that actually does the longjmp must
   be the one that did the setjmp. This does not have to be the one
   executing the signal handler. Thus the causing thread calls longjmp
   from the signal handler but another thread may end up doing the
   longjmp.

7) The longjmp code clears any locks held by the runtime system (see
   below), causes the longjmping thread to do the longjmp and the other
   threads to go back to the "idle loop", which is where threads wait
   for parallel work opportunities.

In practice, the main thread of the program must do the longjmp out of a
signal handler.

There is some cleanup that must be done by longjmp. Since we have a
parallel libc (multiple threads can do system calls, use standard IO,
etc.) we have implemented a locking strategy in the library. Thus locks
may be held by various threads when the signal arrives. These have to be
cleared before the longjmp completes to prevent the possibility of
deadlock. Actually, they are temporarily disabled for the duration of
the signal handler in case the handler calls library routines. (This
type of call can cause other problems but they exist even in a singly
threaded process that calls random library functions from signal
handlers. We did not attempt to solve the general problem of the lack of
exception handling in C/Unix.) If user code contains its own locks it is
up to the signal handler to deal with them correctly before longjmping.

It is still possible to do longjmps from outside a signal handler. In
that case the thread that does the setjmp and the thread that calls
longjmp must be the same thread and the other threads are not involved.
Certain restrictions must be observed about jumping across parallel code
(i.e. stack frames where there is currently parallel code being
executed). In practice these restrictions have not presented a problem.

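For readers who have not seen the pattern being preserved, here is a
minimal singly threaded sketch of a signal handler that longjmps back to
a recovery point. It does not reproduce the Stellar implementation. As
an assumption it uses the POSIX sigsetjmp/siglongjmp variants rather
than plain setjmp/longjmp, so that the signal mask is restored on the
jump, and the names recovery_point, on_interrupt and
run_until_interrupted are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf recovery_point;           /* hypothetical name */
static volatile sig_atomic_t handler_ran;

static void on_interrupt(int signo)
{
    (void)signo;
    handler_ran = 1;
    /* Abandon the interrupted work and resume at sigsetjmp(). */
    siglongjmp(recovery_point, 1);
}

/* Returns 1 if control came back through the handler's jump,
   0 if the signal never arrived, -1 on setup failure. */
int run_until_interrupted(void)
{
    struct sigaction sa, old;

    handler_ran = 0;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_interrupt;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGINT, &sa, &old) != 0)
        return -1;

    /* Second argument 1: save the signal mask so that the jump
       restores it and SIGINT does not stay blocked afterwards. */
    if (sigsetjmp(recovery_point, 1) == 0) {
        raise(SIGINT);      /* stands in for the user pressing ^C */
        /* Reached only if the handler did not jump. */
        sigaction(SIGINT, &old, NULL);
        return 0;
    }

    /* Reached only via siglongjmp() from the handler. */
    assert(handler_ran == 1);
    sigaction(SIGINT, &old, NULL);
    return 1;
}
```

In the scheme described above, the thread that called the equivalent of
sigsetjmp here is the one that must end up performing the jump, even
when a different thread ran the handler.
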
There are some more details about the implementation that have to do
with the compiler's concurrency runtime model, the hardware support for
concurrency, and some grubby stuff in the library. I wrote this from
memory so I may have gotten some details a little wrong but it is all
described in agonizing detail in our documentation.