[comp.unix.aux] ELM, job control and signals

jim@jagubox.gsfc.nasa.gov (Jim Jagielski) (04/15/91)

Well, I've never really used ELM with job control before but I tried it out to
see if ELM would misbehave. When I did the ^Z ELM told me:

Stopped: Use "fg" to return to ELM.

ksh job control then responded with:

[1] + Stopped (signal)        elm

After about 5 minutes I brought elm back to the front... it said:

Back in ELM. (you might need to explicitly request a redraw)

Everything worked fine...

Anyway, I'm running ELM 2.3 PatchLevel 11 with NO real changes/additions to
the source (i.e., all I did was run Configure)...

Maybe:

	1. You're running an older version
	2. I didn't leave it in the back long enough
	3. My version was compiled under 2.0... maybe something happened
	   from 2.0 to 2.0.1 to change the way it compiles... I'll have to
	   try that.
--
===========================================================================
#include <std/disclaimer.h>
                                 =:^)
           Jim Jagielski                    NASA/GSFC, Code 711.1
     jim@jagubox.gsfc.nasa.gov               Greenbelt, MD 20771

 "I object to all this sex on the television. I mean, I keep falling off!"

alexis@panix.uucp (Alexis Rosen) (04/16/91)

In article <4886@dftsrv.gsfc.nasa.gov> jim@jagubox.gsfc.nasa.gov (Jim Jagielski) writes:
>Well, I've never really used ELM with job control before but I tried it out to
>see if ELM would misbehave. When I did the ^Z ELM told me:
>Stopped: Use "fg" to return to ELM.
>
>ksh job control then responded with:
>[1] + Stopped (signal)        elm
>
>After about 5 minutes I brought elm back to the front... it said:
>Back in ELM. (you might need to explicitly request a redraw)
>
>Maybe:
>
>	1. You're running an older version
>	2. I didn't leave it in the back long enough
>	3. My version was compiled under 2.0... maybe something happened
>	   from 2.0 to 2.0.1 to change the way it compiles... I'll have to
>	   try that.

And the winner is... door number 2.

In fact, _most_ of the time job control works fine. Only after relatively
long periods of time away from ELM does it die with "Alarm Clock".

Richard Todd wrote in an earlier article that set42sig() would do the trick.
I thought of that the first time I hit this problem, and I'll probably try
it soon out of sheer frustration, but I don't think it will work. This is
because, in my experience, programs that can't deal with job control die
immediately on return to foreground, always. For example, rn wouldn't tolerate
job control until I patched it. ELM, however, explicitly recognizes and traps
^Z so it can print its little message before suspending.

Based on the message it prints and the fact that it only dies after lengthy
suspensions, I'm guessing that perhaps it's setting up some sort of signal
so that it doesn't get too stale? If so then the bug is in how fast it's
expiring, not that it's expiring at all. Any ideas?

---
Alexis Rosen
Owner/Sysadmin, PANIX Public Access Unix, NY
{cmcl2,apple}!panix!alexis

syd@DSI.COM (Syd Weinstein) (04/16/91)

alexis@panix.uucp (Alexis Rosen) writes:
>>	2. I didn't leave it in the back long enough
>And the winner is... door number 2.
>Based on the message it prints and the fact that it only dies after lengthy
>suspensions, I'm guessing that perhaps it's setting up some sort of signal
>so that it doesn't get too stale? If so then the bug is in how fast it's
>expiring, not that it's expiring at all. Any ideas?
Elm sets an alarm (SIGALRM) every timeout seconds so it can refresh the
screen if new messages have come in.  A/UX apparently doesn't like it if a
bg'd task takes an alarm clock interrupt.  Thus when it wakes up it gets the
error.  This is not true of other UNIX systems that support job control.
They just hold the alarm wakeup until the task is resumed.

Looks like a problem with A/UX to me.
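
For readers who have not seen this pattern, here is a minimal sketch of a
periodic alarm-driven mailbox check. It is not taken from the Elm source;
the names start_mail_timer, mail_check_pending and mail_check_due are
made up for illustration, and it uses POSIX sigaction() rather than
whatever Elm 2.3 actually calls.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>
#include <unistd.h>

/* Set by the handler; the main loop polls it and clears it.
 * (Hypothetical name, not from the Elm source.) */
static volatile sig_atomic_t mail_check_due = 0;

/* The handler does the minimum: note that the timer fired. */
static void on_alarm(int signo)
{
    (void)signo;
    mail_check_due = 1;
}

/* Install the handler and schedule the first SIGALRM.
 * Returns 0 on success, -1 if the handler could not be installed. */
int start_mail_timer(unsigned int timeout_secs)
{
    struct sigaction sa;

    assert(timeout_secs > 0);   /* alarm(0) would cancel, not schedule */
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGALRM, &sa, NULL) != 0)
        return -1;
    alarm(timeout_secs);
    return 0;
}

/* Called from the main loop: returns 1 and re-arms the timer if a
 * mailbox check is due, 0 otherwise. */
int mail_check_pending(unsigned int timeout_secs)
{
    if (!mail_check_due)
        return 0;
    mail_check_due = 0;
    alarm(timeout_secs);
    return 1;
}
```

A main loop would call start_mail_timer() once and then test
mail_check_pending() each time through, rescanning the mailbox when it
returns 1. If the process is stopped with ^Z when the alarm expires, what
happens to that SIGALRM is up to the kernel, which is the point at issue
here.
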
-- 
=====================================================================
Sydney S. Weinstein, CDP, CCP                   Elm Coordinator
Datacomp Systems, Inc.                          Voice: (215) 947-9900
syd@DSI.COM or dsinc!syd                        FAX:   (215) 938-0235

coolidge@cs.uiuc.edu (John Coolidge) (04/17/91)

syd@DSI.COM (Syd Weinstein) writes:
>Elm uses a signal every timeout seconds to refresh the screen if new messages
>came in.  A/UX apparently doesn't like it if a bg'd task takes an alarm
>clock interrupt.  Thus when it wakes up it gets the error.  This is not
>true of other UNIX OS's that support job control.  They just stack the
>alarm wakeup until after resumption of the task.

>Looks like a problem with A/UX to me.

Here's what I think is going on: A/UX, by default, uses SYSV style
signals (the handler is reset to the default action every time the signal
is delivered). This means the first timeout signal is handled and the rest
get the default action, which for SIGALRM is to kill the process with
'alarm clock'. Note that elm doesn't seem to do the right thing with this
even in the absence of job control; I've had elm on A/UX die with 'alarm
clock' if I just leave it running for a while. With job control, things
just get worse. Even if you do the right SYSV thing and reinstall the
signal handler, a signal that arrives while you're TSTP'd is queued and
doesn't reach the handler until you're CONT'd. By that point the
mishandling described above seems likely to kick in, and everything fails.
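
The reset-on-delivery behaviour can be reproduced on a current POSIX system
with a small sketch like the one below. It does not use the A/UX signal()
call itself; it assumes that sigaction() with SA_RESETHAND | SA_NODEFER is
a fair stand-in for SYSV semantics. All function and variable names are
hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

static volatile sig_atomic_t alarms_seen = 0;
static volatile sig_atomic_t rearm_in_handler = 0;

static void sysv_style_handler(int signo);

/* Install a one-shot handler: with SA_RESETHAND the disposition goes
 * back to SIG_DFL as soon as the handler is entered. */
static int install_one_shot(void)
{
    struct sigaction sa;

    sa.sa_handler = sysv_style_handler;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESETHAND | SA_NODEFER;
    return sigaction(SIGALRM, &sa, NULL);
}

static void sysv_style_handler(int signo)
{
    (void)signo;
    alarms_seen++;
    if (rearm_in_handler)
        install_one_shot();     /* the classic SYSV re-install idiom */
}

/* rearm = 0: handler fires once, then SIGALRM is fatal again.
 * rearm = 1: handler re-installs itself on every delivery. */
int install_sysv_style(int rearm)
{
    assert(rearm == 0 || rearm == 1);
    rearm_in_handler = rearm;
    return install_one_shot();
}

/* Returns 1 if SIGALRM currently has the default (fatal) action. */
int alarm_is_default(void)
{
    struct sigaction cur;

    if (sigaction(SIGALRM, NULL, &cur) != 0)
        return -1;
    return cur.sa_handler == SIG_DFL;
}
```

After install_sysv_style(0) and one SIGALRM, alarm_is_default() reports 1,
so a second alarm would terminate the process. With install_sysv_style(1)
the handler survives, but there is still a short window on entry to the
handler where the default action is in force.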

If, however, you tell A/UX to use BSD signal semantics by calling
set42sig() as the first thing in main() the problem seems to go away (it
did for me; I can leave elm in the background for a day and then resume
it with no ill effects). Somehow elm is becoming confused and not
handling SYSV signal semantics properly...
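
As a rough sketch of the BSD-style alternative: the code below installs a
handler that stays installed across deliveries, which is the behaviour
set42sig() is meant to select on A/UX. The set42sig() declaration is an
assumption (I have not checked the exact prototype), HAVE_SET42SIG is a
hypothetical build macro, and the other names are made up. On other POSIX
systems, sigaction() without SA_RESETHAND gives the persistent handler by
itself.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stddef.h>

#ifdef HAVE_SET42SIG            /* hypothetical macro, not from elm */
extern int set42sig();          /* A/UX call; prototype is assumed */
#endif

static volatile sig_atomic_t alarms_seen = 0;

static void on_alarm(int signo)
{
    (void)signo;
    alarms_seen++;
}

/* Meant to be the first call in main(). Does nothing where
 * set42sig() is not available. */
int use_bsd_signals(void)
{
#ifdef HAVE_SET42SIG
    (void)set42sig();
#endif
    return 0;
}

/* Install a handler that is NOT reset when the signal is delivered. */
int install_persistent_alarm_handler(void)
{
    struct sigaction sa;

    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = SA_RESTART;   /* BSD style: restart interrupted calls */
    return sigaction(SIGALRM, &sa, NULL);
}

/* Returns 1 if on_alarm is still the SIGALRM handler. */
int alarm_handler_still_installed(void)
{
    struct sigaction cur;
    int rc = sigaction(SIGALRM, NULL, &cur);

    assert(rc == 0);            /* querying a valid signal should not fail */
    (void)rc;
    return cur.sa_handler == on_alarm;
}
```

With this setup any number of SIGALRMs can arrive, including one held back
while the job was stopped, and each one reaches on_alarm() instead of
killing the process.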

--John

--------------------------------------------------------------------------
John L. Coolidge     Internet:coolidge@cs.uiuc.edu   UUCP:uiucdcs!coolidge
Of course I don't speak for the U of I (or anyone else except myself)
Copyright 1991 John L. Coolidge. Copying allowed if (and only if) attributed.
You may redistribute this article if and only if your recipients may as well.