[comp.parallel] Unix Signals and Parallel Programs

fouts@bozeman.ingr.com (Martin Fouts) (06/27/90)

In article <9360@hubcap.clemson.edu> richk@tera.com (Richard Korry) writes:

   I am interested how parallel Unix systems (Sequent, Convex, etc.) have
   defined the semantics of Unix signals in a parallel environment.

Poorly. (;-)

   For example, the man pages claim a signal "interrupts" an executing system call.
   Does this map straight onto a parallel program, i.e. X threads running on
   Y processors are stopped when a signal arrives? How do they deal with user
   level signal handlers doing longjumps? Forgive me if this has already been
   dealt with in this news group.

Extending signal handling into the parallel environment leads to a
number of opportunities for reasonable men to disagree.  One of the
major ongoing debates in the POSIX 1003.4 attempt to standardize
threads has been about signal handling.  I've missed the last few
meetings, so I don't know which direction 1003.4 is going, but a quick,
flawed overview of the issues includes:

1) Visibility of a thread to the signal handling:
   How do I deliver a signal to a specific thread?  (Do I?)

2) Choice of thread to receive a signal sent to the process as a whole

3) The whole area of blocking/nonblocking I/O.
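
As a rough illustration of issues 1 and 2, here is a minimal C sketch using
the pthreads calls that POSIX later standardized (pthread_sigmask,
pthread_kill, sigwait).  It is not taken from the 1003.4 drafts under
discussion, and the names signal_waiter, received_signo and
deliver_to_one_thread are hypothetical.  It assumes a system with POSIX
threads; on some systems it must be linked with the threads library.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <signal.h>
#include <stddef.h>

/* Hypothetical: signal number seen by the dedicated thread (0 = none). */
static int received_signo = 0;

/* Dedicated thread: accepts SIGUSR1 synchronously, with no handler. */
static void *signal_waiter(void *arg)
{
    const sigset_t *set = arg;
    int signo = 0;

    assert(set != NULL);
    if (sigwait(set, &signo) == 0)
        received_signo = signo;
    return NULL;
}

/* Returns the signal number the waiter thread received, or -1 on error. */
int deliver_to_one_thread(void)
{
    sigset_t set;
    pthread_t waiter;

    sigemptyset(&set);
    sigaddset(&set, SIGUSR1);

    /* Block SIGUSR1 here; threads created afterwards inherit this mask,
       so no thread is interrupted asynchronously by it (issue 2). */
    if (pthread_sigmask(SIG_BLOCK, &set, NULL) != 0)
        return -1;
    if (pthread_create(&waiter, NULL, signal_waiter, (void *)&set) != 0)
        return -1;

    /* Issue 1: direct the signal at one specific thread.  It stays
       pending on that thread until its sigwait call accepts it. */
    if (pthread_kill(waiter, SIGUSR1) != 0)
        return -1;

    /* The join orders the waiter's write before the read below. */
    pthread_join(waiter, NULL);
    return received_signo;
}
```

Calling deliver_to_one_thread() from main should return SIGUSR1.  Blocking
the signal everywhere and accepting it in one place is only one possible
answer to "which thread gets it"; it sidesteps handlers rather than
defining their semantics.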

Philosophically, you can take the approach of extension by least
astonishment and try to extend signal handling to multithreaded
applications, or you can take the approach of inappropriate behavior
and simply claim that signals to a thread don't make sense.

I'm currently leaning towards the latter, personally, as signals were
never meant as an IPC mechanism in the first place.  An appropriate
variation would be to not allow signal handlers within multithreaded
applications:  signals would either be ignored or cause termination,
but could not be caught.  In this environment, exceptions would be
used where signals are currently used to indicate an exception, such
as FPE.

Marty

--
Martin Fouts

 UUCP:  ...!pyramid!garth!fouts  ARPA:  apd!fouts@ingr.com
PHONE:  (415) 852-2310            FAX:  (415) 856-9224
 MAIL:  2400 Geng Road, Palo Alto, CA, 94303

If you can find an opinion in my posting, please let me know.
I don't have opinions, only misconceptions.

keith@Stardent.COM (06/30/90)

In article <9482@hubcap.clemson.edu> fouts@bozeman.ingr.com (Martin Fouts) writes:
>In article <9360@hubcap.clemson.edu> richk@tera.com (Richard Korry) writes:
>
>   I am interested how parallel Unix systems (Sequent, Convex, etc.) have
>   defined the semantics of Unix signals in a parallel environment.
>
>Poorly. (;-)

At Stellar (the east coast half of Stardent) our implementation of signals
has proven to be very useful.  The machine is a 4 processor shared memory
architecture.  The languages we support are C and Fortran, with Fortran
able to call libc signal routines.  The OS is based on System V.3.
The result is that you can take a "dusty deck" C program and compile it
with automatic parallelization and, with one obscure exception that
we have never encountered, have your signal handlers and longjmps
work right - or at least work as well as they would singly threaded.
In addition, if you use compiler directives or do your own explicit
parallel programming you can still use signals and longjmp, provided
certain simple restrictions are observed.  Here is an
overview of the implementation:

1) Calls to signal affect all threads, regardless of which thread makes
the call.

2) Arrival of a handled signal causes all threads to stop what they are
doing and to spin in a conventional place in libc in the user's address
space.

3) One thread is designated the causing thread by the kernel.  In the case
of synchronous interrupts, such as floating point exceptions, it will
be the thread that took the exception.  In the case of other interrupts,
such as SIGINT, it is usually the main (initial) thread, but it could be 
any one.

4) Only the causing thread executes the signal handler.

5) The signal handler can terminate by returning, exiting or longjmping.
If it returns then all threads are sent back to what they were doing when
the interrupt occurred.  If it exits then the process terminates.  The
only interesting case is longjmping.

6) In the longjmp case, the thread that actually does the longjmp must
be the one that did the setjmp.  This does not have to be the one executing
the signal handler.  Thus the causing thread calls longjmp from the signal
handler but another thread may end up doing the longjmp.

7) The longjmp code clears any locks held by the runtime system (see below),
causes the longjmping thread to do the longjmp and the other threads to
go back to the "idle loop", which is where threads wait for parallel
work opportunities.  In practice, the main thread of the program must
do the longjmp out of a signal handler.
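
For reference, the handler-plus-longjmp path in points 4 through 6 looks
like this in the simplest, singly threaded case.  This is a sketch in plain
POSIX C, not Stellar's library code; the names recovery_point, caught_signo
and run_with_recovery are hypothetical, and raise() stands in for a signal
arriving from outside.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <string.h>

static sigjmp_buf recovery_point;               /* set by sigsetjmp below */
static volatile sig_atomic_t caught_signo = 0;  /* recorded by the handler */

/* Handler: note which signal arrived, then jump instead of returning. */
static void on_signal(int signo)
{
    caught_signo = signo;
    siglongjmp(recovery_point, 1);
}

/* Returns the signal recovered from, 0 if none arrived, -1 on error. */
int run_with_recovery(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    /* Second argument 1: save the signal mask and restore it on the jump,
       so SIGUSR1 is not left blocked after leaving the handler. */
    if (sigsetjmp(recovery_point, 1) != 0) {
        /* Reached via siglongjmp from on_signal. */
        assert(caught_signo == SIGUSR1);
        return caught_signo;
    }

    raise(SIGUSR1);   /* stand-in for an asynchronous signal */
    return 0;         /* not reached when the handler jumps */
}
```

In a parallel program the thread that ran sigsetjmp and the thread running
the handler may differ, which is exactly the case points 6 and 7 describe;
this sketch does not attempt to reproduce that hand-off.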

There is some cleanup that must be done by longjmp.  Since we have a parallel
libc (multiple threads can do system calls, use standard IO, etc.)
we have implemented a locking strategy in the library.  Thus locks may be
held by various threads when the signal arrives.  These have to be cleared 
before the longjmp completes to prevent the possibility of deadlock.  Actually,
they are temporarily disabled for the duration of the signal handler in
case the handler calls library routines.  (This type of call can cause other 
problems but they exist even in a singly threaded process that calls random
library functions from signal handlers.  We did not attempt to solve
the general problem of the lack of exception handling in C/Unix.)
If user code contains its own locks it is up to the signal handler to
deal with them correctly before longjmping.
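
A minimal sketch of that last point, assuming a user-level lock built on a
C11 atomic_flag: the handler releases the lock before jumping, so code at
the jump target can take it again.  The names user_lock, lock_held and
lock_is_free_after_jump are hypothetical, and the sketch is singly threaded
with raise() standing in for an external signal.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdatomic.h>
#include <string.h>

static atomic_flag user_lock = ATOMIC_FLAG_INIT;  /* hypothetical user lock */
static volatile sig_atomic_t lock_held = 0;       /* 1 while we own user_lock */
static sigjmp_buf restart_point;

/* Handler: release the user lock if we hold it, then jump out. */
static void on_signal(int signo)
{
    (void)signo;
    if (lock_held) {
        lock_held = 0;
        atomic_flag_clear(&user_lock);
    }
    siglongjmp(restart_point, 1);
}

/* Returns 1 if the lock was free after the jump, 0 if it was still
   held, -1 on error. */
int lock_is_free_after_jump(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_signal;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, NULL) != 0)
        return -1;

    if (sigsetjmp(restart_point, 1) != 0) {
        /* test_and_set returns the previous state: false means free. */
        int was_free = !atomic_flag_test_and_set(&user_lock);
        atomic_flag_clear(&user_lock);
        return was_free;
    }

    assert(!lock_held);
    while (atomic_flag_test_and_set(&user_lock))
        ;                       /* spin until the lock is acquired */
    lock_held = 1;

    raise(SIGUSR1);             /* signal arrives inside the critical section */

    lock_held = 0;              /* normal release; not reached in this sketch */
    atomic_flag_clear(&user_lock);
    return 0;
}
```

If the handler skipped the release, the test_and_set at the jump target
would report the lock as still held, and a real program spinning on it
there would deadlock.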

It is still possible to do longjmps from outside a signal handler.
In that case the thread that does the setjmp and the thread that calls
longjmp must be the same thread and the other threads are not involved.
Certain restrictions must be observed about jumping across parallel code
(i.e. stack frames where there is currently parallel code being executed).
In practice these restrictions have not presented a problem.
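
The same-thread rule can be checked at run time.  The sketch below records
which thread did the setjmp and asserts that the same thread calls longjmp.
It uses pthread_self and pthread_equal as an assumption about the threading
API, not Stellar's own interface, and the names setjmp_owner,
checked_longjmp and unwind_from_depth are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <setjmp.h>

static jmp_buf unwind_point;
static pthread_t setjmp_owner;   /* thread that called setjmp */

/* longjmp wrapper enforcing "same thread as the setjmp". */
static void checked_longjmp(int value)
{
    assert(pthread_equal(pthread_self(), setjmp_owner));
    longjmp(unwind_point, value);
}

/* Recurse a few frames down, then unwind them all in one jump. */
static void deep_work(int depth)
{
    if (depth <= 0)
        checked_longjmp(1);
    deep_work(depth - 1);
}

/* Returns 1 if control came back through longjmp, 0 otherwise. */
int unwind_from_depth(int depth)
{
    setjmp_owner = pthread_self();
    if (setjmp(unwind_point) != 0)
        return 1;                /* arrived here via checked_longjmp */
    deep_work(depth);
    return 0;                    /* not reached: deep_work always jumps */
}
```

Calling unwind_from_depth(3) from a single thread returns 1.  The sketch
says nothing about jumping across frames that are executing parallel code,
which is the restriction the paragraph above warns about.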

There are some more details about the implementation that have to do
with the compiler's concurrency runtime model, the hardware support for
concurrency, and some grubby stuff in the library.  I wrote this from memory
so I may have gotten some details a little wrong, but it is all described
in agonizing detail in our documentation.