[comp.unix.wizards] Signals and context switches

drs@bnlux1.bnl.gov (David R. Stampf) (06/16/91)

	I just wrote two short programs.

	program 1 sends a SIGUSR1 signal to program 2, then waits to
	receive a SIGUSR1 signal from program 2. It repeats this 100
	times.

	program 2 just catches SIGUSR1 signals from program 1 and sends
	a SIGUSR1 signal back.

	What surprises me is that it takes 70 seconds to send the 100
	signals back and forth! 
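
	In outline, the two programs look like this (a trimmed sketch,
	folded into one file with fork() so it stands alone; the real
	code is two separate programs, but the busy-wait is the same
	while(1) spin we actually used):

    /* Ping-pong sketch.  Assumes BSD signal() semantics (the handler
     * stays installed across delivery), as on SunOS; on System V you
     * would re-install the handler inside catcher(). */
    #include <stdio.h>
    #include <signal.h>
    #include <unistd.h>
    #include <sys/types.h>
    #include <sys/wait.h>

    static volatile sig_atomic_t got_sig;

    static void catcher(int sig) { got_sig = 1; }

    int main(void)
    {
        pid_t child;
        int i;

        signal(SIGUSR1, catcher);       /* inherited across fork() */

        if ((child = fork()) == 0) {    /* "program 2": echo each signal */
            for (;;) {
                while (!got_sig)        /* busy-wait until the handler runs */
                    ;
                got_sig = 0;
                kill(getppid(), SIGUSR1);
            }
        }

        for (i = 0; i < 100; i++) {     /* "program 1": send, wait for echo */
            kill(child, SIGUSR1);
            while (!got_sig)
                ;
            got_sig = 0;
        }
        kill(child, SIGKILL);           /* done: dispose of the child */
        wait(NULL);
        return 0;
    }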

	Is it possible to get better timing on this? I guess what I'd
	like to do is to force a context switch when I send the signal
	and get much better turnaround.

	Any help is appreciated.

	< dave stampf

bhoughto@pima.intel.com (Blair P. Houghton) (06/17/91)

In article <1991Jun16.010626.28257@bnlux1.bnl.gov> drs@bnlux1.bnl.gov (David R. Stampf) writes:
>	program 1 sends a SIGUSR1 signal to program 2, then waits to
>	receive a SIGUSR1 signal from program 2.
>	program 2 just catches SIGUSR1 signals from program 1 and sends
>	a SIGUSR1 signal back.
>	What surprises me is that it takes 70 seconds to send the 100
>	signals back and forth! 

Makes sense.  Going into a signal-wait sends the process to sleep,
and the kernel goes and gives every other process of equal or
higher priority on the machine a chance to run before it gets
around to telling your process it has a signal waiting (provided
a signal was sent to it).

This is not a lot of overhead if you're planning on using this
signalling framework to control a larger piece of code that
will do considerable computing before it goes to sleep again,
and especially if there's any i/o involved.

BTW, you're sending 200 signals, not 100.

Then again, you could just have a really bletcherous kernel.
What sort of system is it?

And how do you manage to tell both processes the PID of each
other?  Some sort of prompt?  Or pass it in a file?
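
If it's a plain fork() pair, of course, you get the PIDs for free;
a sketch (assuming parent and child, which the original post doesn't
actually say):

    #include <stdio.h>
    #include <sys/types.h>
    #include <unistd.h>

    int main(void)
    {
        pid_t peer = fork();

        if (peer == 0)          /* child: the peer is the parent */
            printf("child %ld, peer %ld\n",
                   (long)getpid(), (long)getppid());
        else if (peer > 0)      /* parent: fork() returned the child's PID */
            printf("parent %ld, peer %ld\n",
                   (long)getpid(), (long)peer);
        else
            perror("fork");
        return 0;
    }

Unrelated processes would have to pass the PIDs some other way:
argv, a file, or a pipe.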

				--Blair
				  "Travel from point A to point B
				   making only right turns..."

boyd@prl.dec.com (Boyd Roberts) (06/17/91)

In article <1991Jun16.010626.28257@bnlux1.bnl.gov>, drs@bnlux1.bnl.gov (David R. Stampf) writes:
> 	program 1 sends a SIGUSR1 signal to program 2, then waits to
> 	receive a SIGUSR1 signal from program 2. It repeats this 100
> 	times.
> 
> 	program 2 just catches SIGUSR1 signals from program 1 and sends
> 	a SIGUSR1 signal back.
> 
> 	What surprises me is that it takes 70 seconds to send the 100
> 	signals back and forth! 

Well, it'll depend on the load on your machine and how long the
running processes run for, on average.  Worst case is that you have
N compute bound processes doing no I/O, which translates into 1 second
time quantums.  Your signal catching programs only receive the signal
when they run, and they have to wait till it's their turn.  This could be
a long time, but it could be quick.  It's just not predictable.

Maybe you could do something gross to speed it up, but already I think
it's time for a walk into the hall of mirrors and have a good hard geek
at what you're trying to achieve.  Signals were not designed for inter-process
communication.  They were there to kill processes.


Boyd Roberts			boyd@prl.dec.com

``When the going gets weird, the weird turn pro...''

drs@bnlux1.bnl.gov (David R. Stampf) (06/17/91)

In article <1991Jun17.085558.16652@prl.dec.com> boyd@prl.dec.com (Boyd Roberts) writes:
>In article <1991Jun16.010626.28257@bnlux1.bnl.gov>, drs@bnlux1.bnl.gov (David R. Stampf) writes:
>> 	program 1 sends a SIGUSR1 signal to program 2, then waits to
>> 	receive a SIGUSR1 signal from program 2. It repeats this 100
>> 	times.
>> 
>> 	program 2 just catches SIGUSR1 signals from program 1 and sends
>> 	a SIGUSR1 signal back.
>> 
>> 	What surprises me is that it takes 70 seconds to send the 100
>> 	signals back and forth! 
>
>Well, it'll depend on the load on your machine and how long the
>running processes run for, on average.  Worst case is that you have
>N compute bound processes doing no I/O, which translates into 1 second
>time quantums.  Your signal catching programs only receive the signal
>when they run, and they have to wait till it's their turn.  This could be
>a long time, but it could be quick.  It's just not predictable.
>

	There is virtually no load - it is a Sun IPC with only myself
logged on and running. I guess I'm amazed that the time quantum is as
large as 1 second.


>Maybe you could do something gross to speed it up, but already I think
>it's time for a walk into the hall of mirrors and have a good hard geek
>at what you're trying to achieve.  Signals were not designed for inter-process
>communication.  They were there to kill processes.

	I received a few suggestions to go to sleep rather than
while(1);, which surprisingly didn't work! What we had thought was that
by calling kill, we would force a context switch. That obviously does
not happen. What *did* work was to call select without any file
descriptors, but with a minuscule timeout. That seemed to force the
switch. I'm still puzzled by sleep's failure.
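
	The call that worked is essentially this (a sketch; the real
version is buried in a library routine, and the exact timeout value
is a guess here):

    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/time.h>
    #include <sys/select.h>

    /* Give up the CPU briefly: select() on no descriptors with a
     * near-zero timeout.  The process sleeps in the kernel, the
     * scheduler runs whoever else is ready, and our pending signal
     * is delivered when we wake up. */
    void yield_briefly(void)
    {
        struct timeval tv;

        tv.tv_sec = 0;
        tv.tv_usec = 1;         /* the "minuscule" timeout */
        (void) select(0, (fd_set *)0, (fd_set *)0, (fd_set *)0, &tv);
    }

    int main(void)
    {
        yield_briefly();
        printf("back from select()\n");
        return 0;
    }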

	As it turns out, using sleep would be inappropriate for our
application since it may interfere with other timers, and this had to
be written as a library function.

	Actually, what we are trying to achieve is to have one process
notify another that it has to take care of some business. I don't
believe that there is any other way to do that in unix other than by
signals because of the constraints that 1) it will happen asynchronously,
2) it has to be recognized "quickly" and 3) it has to be done at a
library routine level so that the application program is totally
oblivious to what is going on. That leaves out message queues etc.

	Also, fewer than 1/4 of the signals available on my system have
anything to do with "killing" a process, despite the name of the system
call. A signal serves as a one-bit interprocess communications pipe
quite nicely.


	< dave

boyd@prl.dec.com (Boyd Roberts) (06/18/91)

In article <1991Jun17.131027.8700@bnlux1.bnl.gov>, drs@bnlux1.bnl.gov (David R. Stampf) writes:
> 
> 	There is virtually no load - it is a Sun IPC with only myself
> logged on and running. I guess I'm amazed that the time quantum is as
> large as 1 second.

No load.  Ok.

> 
> 
> 	I received a few suggestions to go to sleep rather than
> while(1);, which surprisingly didn't work! What we had thought was that
> by calling kill, we would force a context switch. That obviously does
> not happen. What *did* work was to call select without any file
> descriptors, but with a minuscule timeout. That seemed to force the
> switch. I'm still puzzled by sleep's failure.
> 

Run that by me again.  There was `no load' but you had a process going?

    while(1);

What we have here is a `compute bound process[es] doing no I/O'.

This will really screw up signal delivery.  Signals are delivered:

    - when a process is re-scheduled (ie. goes from runnable to actually running)

    - after a process takes an exception (CPU generated trap)

    - at the end of a system call

So, if all that's happening is the first case, then you could be waiting
a long time.

To wait without doing anything, use pause(2).
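
Something like this sketch (beware the classic race: if the signal
can arrive between testing the flag and calling pause(2) you sleep
forever; sigpause(2) on BSD or POSIX sigsuspend(2) closes that hole,
which I'm ignoring here):

    #include <stdio.h>
    #include <signal.h>
    #include <unistd.h>

    static volatile sig_atomic_t got;

    static void catcher(int sig) { got = 1; }

    int main(void)
    {
        signal(SIGUSR1, catcher);
        printf("waiting: kill -USR1 %ld\n", (long)getpid());
        while (!got)
            pause();            /* sleep until a handler runs */
        printf("got it\n");
        return 0;
    }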

If you have select(2), why not get the process to write you a message?
Down a pipe, socket, pty, whatever.  Using signals is just _such_ a bad idea.

Whatever you do, don't have the select(2) polling every small time t.


Boyd Roberts			boyd@prl.dec.com

``When the going gets weird, the weird turn pro...''

moss@cs.umass.edu (Eliot Moss) (06/18/91)

Why not use pipes? Sending one character down a pipe (or pty or whatever) can
be the signal; the receiver can read and toss the character. You can use
select to detect the arrival, and also arrange for SIGIO to be delivered if
you need rapid asynchronous handling of the notification.
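
A sketch of the synchronous round trip (one byte per "signal";
select(2) earns its keep when several descriptors are in play, and a
SIGIO variant would add fcntl(2) with FASYNC on top of this, but a
plain blocking read() already forces the fast switch):

    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>
    #include <sys/wait.h>

    int main(void)
    {
        int up[2], down[2];     /* up: child to parent; down: parent to child */
        char c = 'x';
        int i;

        if (pipe(up) < 0 || pipe(down) < 0) {
            perror("pipe");
            exit(1);
        }

        if (fork() == 0) {              /* child: echo every byte back */
            close(down[1]);             /* close the unused ends so that */
            close(up[0]);               /* EOF detection works below */
            while (read(down[0], &c, 1) == 1)
                write(up[1], &c, 1);
            exit(0);                    /* read() returned 0: parent is done */
        }

        close(down[0]);
        close(up[1]);
        for (i = 0; i < 100; i++) {     /* 100 round trips */
            write(down[1], &c, 1);
            read(up[0], &c, 1);
        }
        close(down[1]);                 /* EOF tells the child to quit */
        wait(NULL);
        return 0;
    }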
--

		J. Eliot B. Moss, Assistant Professor
		Department of Computer and Information Science
		Lederle Graduate Research Center
		University of Massachusetts
		Amherst, MA  01003
		(413) 545-4206, 545-1249 (fax); Moss@cs.umass.edu

richard@locus.com (Richard M. Mathews) (06/20/91)

This discussion of optimizing context switches reminds me of a problem
encountered in IX/370.  IX/370 (little or no relation to AIX/370) was
an ancient attempt to put Unix on a 370 by having Unix running under
SSS, which in turn may be running on VM.  By the time something like
an interrupt could percolate through all these operating systems, WWIII
could come and go.

Some hacks were put into SSS and IX/370 to get them to work together to
improve performance.  For example, after a fork(), SSS apparently would
be told to immediately run the child.  After an exit(), SSS apparently
would be told to immediately run the parent.  This sped up a benchmark
in which a parent repeatedly does fork()/wait() and each child immediately
exits.  The problem was that nothing else would run.  Start up such a
benchmark, set to run forever, alongside a compile of a standard
"hello world" program: I actually left a machine in that state
overnight, and the tiny compile never finished.

Moral: be careful about making broad assumptions about optimizations for
your scheduler.

Disclaimer: Opinions are my own, not LCC's or IBM's.  Facts are likely
to be distorted by a brain that long ago turned into tapioca pudding.

Richard M. Mathews			D efend
richard@locus.com			 E stonian-Latvian-Lithuanian
lcc!richard@seas.ucla.edu		  I ndependence
...!{uunet|ucla-se|turnkey}!lcc!richard