[comp.unix.questions] Killing the correct process

marwood@ncs.dnd.ca (Gordon Marwood) (02/03/90)

The following is a code fragment from get21.sh, which is part of the
autoftp software available from simtel20:



# Create two sub-shell processes, one for FTP, one for time-out.
# Each sub shell has two parts. If the first part succeeds, a KILL
# action will be taken to abort either the TIME-OUT or the FTP process.

  { ftp "$RemoteHost" <ftp.get.$$ &&
    { ps -x > tmp0$$; grep sleep tmp0$$ > tmp$$; 
      pd=`$_pid < tmp$$`; kill -9 $pd ; 
    }
  } & {
    sleep $ALARM &&
    { ps -x > tmp0$$; grep ftp tmp0$$ > tmp$$; 
      pd=`$_pid < tmp$$`; kill -9 $pd ; 
    }
  }


The problem with this is that "grep sleep" does not necessarily select the
"sleep" which belongs to the autoftp process, if there are other "sleep"s
running at the same time (from some other unrelated process).
Similarly, "grep ftp" is ambiguous if other "ftp"s are running
concurrently.  At the moment I have put in a workaround which looks for
the "sleep" with the correct delay time, and an ftp using the numeric
address rather than the hostname.  However, this is not a very elegant
way of doing things and not 100% foolproof.  I would appreciate any
assistance that anyone can offer to do these "kill"s more specifically.

Gordon Marwood
marwood@ncs.dnd.ca
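
One way to sidestep the ambiguity described above is to let the shell hand over each PID through `$!` at the moment the process is started, so that no `ps | grep` is needed. The following is only a sketch and not part of autoftp: the function name run_with_alarm and all its variable names are made up for illustration. It also assumes the guarded command does not need the caller's standard input, since a background job in a non-interactive sh usually reads from /dev/null.

```shell
#!/bin/sh
# Sketch only: run_with_alarm and its variable names are hypothetical.
# Usage: run_with_alarm SECONDS COMMAND [ARGS...]
run_with_alarm() {
    alarm=$1
    shift

    "$@" &                          # the command to guard (ftp in the original)
    cmd_pid=$!                      # exact PID, no ps | grep needed

    (
        # If the main shell cancels this watchdog, take the sleep down too.
        trap 'kill "$sleep_pid" 2>/dev/null; exit 0' TERM
        sleep "$alarm" &
        sleep_pid=$!
        wait "$sleep_pid"
        kill "$cmd_pid" 2>/dev/null # timer expired: SIGTERM to this command only
    ) >/dev/null 2>&1 &
    watchdog_pid=$!

    wait "$cmd_pid"                 # returns the command's own exit status
    status=$?
    kill "$watchdog_pid" 2>/dev/null
    return "$status"
}
```

Something like `run_with_alarm "$ALARM" sh -c 'ftp "$0" <ftp.get.input' "$RemoteHost"` would be one way to call it (the input file name here is a placeholder). Because only recorded PIDs are signalled, unrelated "sleep" and "ftp" processes are never candidates.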

maart@cs.vu.nl (Maarten Litmaath) (02/06/90)

In article <22332@adm.BRL.MIL>,
	marwood@ncs.dnd.ca (Gordon Marwood) wants to timeout ftp.

How about using the following general purpose script?  If that's out of the
question, the script might still give you a hint how to solve your problem.
----------8<----------8<----------8<----------8<----------8<----------
#!/bin/sh
# @(#)timeout 4.1 90/01/10 Maarten Litmaath

prog=`basename $0`
verbose=0

case $1 in
-v)
	verbose=1
	shift
esac

expr $# \< 2 \| 0"$1" : '.*[^0-9]' > /dev/null && {
	echo "Usage: $prog [-v] <timeout in seconds> <command>" >&2
	exit 2
}

timeout=$1
shift

exec 3>&1 4>&2 2> /dev/null

pid=`
	sh -c '
		(sleep '$timeout' > /dev/null & echo $!; exec >&-;
		wait; kill -9 $$) & exec "$@" >&3 3>&- 2>&4 4>&-
	' $prog "$@"
`

kill -9 $pid && exit 0
test $verbose = 1 && echo TIMEOUT >&4
exit 1
--
  The meek get the earth, Henry the moon, the rest of us have other plans.  |
  Maarten Litmaath @ VU Amsterdam:  maart@cs.vu.nl,  uunet!mcsun!botter!maart

gwc@root.co.uk (Geoff Clare) (02/09/90)

In article <5312@star.cs.vu.nl> maart@cs.vu.nl (Maarten Litmaath) writes:
>In article <22332@adm.BRL.MIL>,
>	marwood@ncs.dnd.ca (Gordon Marwood) wants to timeout ftp.
>
>How about using the following general purpose script?  If that's out of the
>question, the script might still give you a hint how to solve your problem.

[ extremely complicated script deleted ]

There is a much simpler way than Maarten's.

The basic strategy is:

	(sleep $time; kill $$) & exec "$@"

Here is my "timeout" command, using this method with added frills.
An important feature which Maarten's script lacks, is that mine kills
the process with SIGTERM, allowing it to clean up.  It only goes for
SIGKILL if the SIGTERM doesn't do the job.  Maarten dives straight
for SIGKILL, which could leave a mess.

---------------------- cut here ----------------------
:
# execute a command with timeout
# Geoff Clare <gwc@root.co.uk>, Feb 1990

USAGE="usage: $0 [-v] [seconds] command args ..."

SIGKILL=9	# may be system dependent

time=10		# default 10 seconds
verbose=n
for i in 1 2
do
	case $1 in
	-v)
		verbose=y
		shift
		;;
	[0-9]*)
		time=$1
		shift
		;;
	""|-*)
		echo >&2 "$USAGE"
		exit 1
		;;
	esac
done

pid=$$
(
	sleep "$time"
	# use SIGTERM first to allow process to clean up
	kill $pid >/dev/null 2>&1
	rc=$?
	sleep 2

	# if process hasn't died yet, use SIGKILL
	kill -$SIGKILL $pid >/dev/null 2>&1

	case "$rc$verbose" in
	0y) echo >&2 "
TIMED OUT \"$*\""
		;;
	esac
) &

exec "$@"
---------------------- cut here ----------------------

-- 
Geoff Clare, UniSoft Limited, Saunderson House, Hayne Street, London EC1A 9HH
gwc@root.co.uk  (Dumb mailers: ...!uunet!root.co.uk!gwc)  Tel: +44-1-315-6600
                                         (from 6th May 1990): +44-71-315-6600
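
The one-line strategy quoted in the post above can be tried in isolation with a small wrapper. This is a sketch under stated assumptions, not Geoff's script: the name timeout_exec is hypothetical, and an inner `sh -c` is used so that `$$` refers to a process that can safely be replaced by `exec`.

```shell
#!/bin/sh
# Sketch only: timeout_exec is a hypothetical name, not the posted script.
# Usage: timeout_exec SECONDS COMMAND [ARGS...]
timeout_exec() {
    sh -c '
        limit=$1
        shift
        # In the subshell, $$ is still the PID of this sh -c process,
        # which the exec below hands over to the command.
        ( sleep "$limit"; kill $$ ) >/dev/null 2>&1 &
        exec "$@"
    ' timeout_exec "$@"
}
```

Since the command takes over the PID through `exec`, its exit status reaches the caller unchanged. The cost is that the background `sleep` keeps running when the command finishes early, and its later `kill` is then aimed at a PID that no longer belongs to the command.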

maart@cs.vu.nl (Maarten Litmaath) (02/10/90)

In article <1212@root44.co.uk>,
	gwc@root.co.uk (Geoff Clare) writes:
)In article <5312@star.cs.vu.nl> maart@cs.vu.nl (Maarten Litmaath) writes:
)>In article <22332@adm.BRL.MIL>,
)>	marwood@ncs.dnd.ca (Gordon Marwood) wants to timeout ftp.
)>
)>How about using the following general purpose script?  If that's out of the
)>question, the script might still give you a hint how to solve your problem.
)
)[ extremely complicated script deleted ]
             ^^^^^^^^^^^
There were reasons for that, you know!

)There is a much simpler way than Maarten's.
)
)The basic strategy is:
)
)	(sleep $time; kill $$) & exec "$@"

Here's an example using your timeout command:

	$ timeout -v 0 cat
	Terminated
	$ 
	TIMED OUT "cat"

Sic!  The verbose message appears ASYNCHRONOUSLY!  That's not what I want!
To get a synchronous message I had to have a *child* execute the command
supplied, while the *parent* reports the status.  In fact your strategy is
followed between the backquotes in my script.
Another bug in your script is shown by the following example:

	$ timeout 333 date
	Sat Feb 10 02:56:10 MET 1990
	$ sps
	Ty User     Proc# Command
[stuff deleted]
	p1.maart     1561 sh
	p1. |        1562 sleep 333
[stuff deleted]
	$ 

You leave useless processes hanging around!

)Here is my "timeout" command, using this method with added frills.
)An important feature which Maarten's script lacks, is that mine kills
)the process with SIGTERM, allowing it to clean up.  It only goes for
)SIGKILL if the SIGTERM doesn't do the job.  [...]

How do you determine that the job has finished cleaning up?

Turning back to my script - the only problem with it: the given command
might have created children which keep hanging around after their parent
has died.  This problem could be solved if there were a variant of
`kill(2)' to destroy a complete `process tree'; such a tree is NOT the
same as a process *group*.
Killing the process group (kill -9 -$pid) doesn't work if your shell is
sh, since it doesn't put a `job' into its own process group; you would
destroy unrelated processes as well.
--
  The meek get the earth, Henry the moon, the rest of us have other plans.  |
  Maarten Litmaath @ VU Amsterdam:  maart@cs.vu.nl,  uunet!mcsun!botter!maart

rock%warp@Sun.COM (Bill Petro (SunOS Marketing)) (02/13/90)

I don't know if this is appropriate, but here is a handy little tool I
use:


------------------- Cut here, valuable coupon, collect 'em all --------
#!/bin/sh
#
# killjobs
#
# Do our best to locate the PIDs to kill from the given
# process names.
#

MYNAME=`basename $0`
Ask=yes
case $1 in
"")
	echo "Usage: $MYNAME process_name ..."
	exit 1 ;;
-n)
	Ask=no; shift ;;
esac

PSAX=`ps ax`
for pname
do
	line= PID=
	PID=`echo "$PSAX" |
		grep -w $pname |
		awk '!/grep|'$MYNAME'/ {print $1}'`

	for pid in $PID
	do
		if [ $Ask = yes ]
		then
			line=`echo "$PSAX" | awk '$1 == '$pid' {print}'`
			echo -n "Kill ($line)? "
			read reply
			case $reply in
			[Yy]*)	kill -9 $pid
				;;
			esac
		else
			kill -9 $pid
		fi
	done
done
------------ end ----------------------------------------------
     Bill Petro  {decwrl,hplabs,ucbvax}!sun!Eng!rock
"UNIX for the sake of the kingdom of heaven"  Matthew 19:12

gwc@root.co.uk (Geoff Clare) (02/14/90)

In article <5352@star.cs.vu.nl> maart@cs.vu.nl (Maarten Litmaath) writes:
>In article <1212@root44.co.uk>,
>	gwc@root.co.uk (Geoff Clare) writes:
>
>)There is a much simpler way than Maarten's.
>)
>)The basic strategy is:
>)
>)	(sleep $time; kill $$) & exec "$@"
>
>Here's an example using your timeout command:
>
>	$ timeout -v 0 cat
>	Terminated
>	$ 
>	TIMED OUT "cat"
>
>Sic!  The verbose message appears ASYNCHRONOUSLY!  That's not what I want!

There is a big difference in the purpose of the '-v' option between our
scripts.  Yours provides the only message you get to say that the command
timed out.  With mine the shell has already informed me of that
(synchronously) with the "Terminated" message seen above.  The '-v' would
not normally be used in this simple case.  It is there to provide extra
information about *which* command was timed out, in the event of several
parallel "timeout" commands.

Also, the delay is a direct result of allowing the process time to
clean up.  Using a straight 'kill -9' as your script did would mean
the verbose message would appear before the prompt.  The very slight
delay in providing the additional information is a small price to pay
for allowing the timed out process to tidy up.

>To get a synchronous message I had to have a *child* execute the command
>supplied, while the *parent* reports the status.

But this has a very serious drawback - you lose the exit status of the
executed command.  The command could die horribly with no error message,
and you would not know about it!  All your script tells you is whether
the command completed within the timeout period or not.

>Another bug in your script is shown by the following example:
>
>[stuff deleted]
>
>You leave useless processes hanging around!

Leaving a harmless sleep command behind will not usually cause any
problems.  It will get zapped when the user logs out, if it hasn't
completed by then.  There is no way round this if you want to have
the parent process exec the command.  This slight drawback is greatly
outweighed by the advantage of getting the exit status passed back
correctly.

>)An important feature which Maarten's script lacks, is that mine kills
>)the process with SIGTERM, allowing it to clean up.  It only goes for
>)SIGKILL if the SIGTERM doesn't do the job.  [...]
>
>How do you determine that the job has finished cleaning up?

The script allows 2 seconds.  If this might not be enough it could be
passed as an option to "timeout".  Most commands only need to remove a
few temporary files and maybe kill some children, which doesn't take
very long.  A chance to clean up in a short time is much better than no
chance at all.

>Turning back to my script - the only problem with it: the given command
>might have created children which keep hanging around after their parent
>has died.

A well designed command will kill its children as part of the clean up
procedure when it receives a SIGTERM.  That is why it's important to use
SIGTERM first rather than going straight for SIGKILL.
-- 
Geoff Clare, UniSoft Limited, Saunderson House, Hayne Street, London EC1A 9HH
gwc@root.co.uk  (Dumb mailers: ...!uunet!root.co.uk!gwc)  Tel: +44-1-315-6600
                                         (from 6th May 1990): +44-71-315-6600

bph@buengc.BU.EDU (Blair P. Houghton) (02/16/90)

In article <1212@root44.co.uk> gwc@root.co.uk (Geoff Clare) writes:
>In article <5312@star.cs.vu.nl> maart@cs.vu.nl (Maarten Litmaath) writes:
>>In article <22332@adm.BRL.MIL> marwood@ncs.dnd.ca (Gordon Marwood)
>>>[wants to timeout ftp.]
>>
>>How about using the following general purpose script?  If that's out of the
>>question, the script might still give you a hint how to solve your problem.
>
>[ extremely complicated script deleted ]
>
>There is a much simpler way than Maarten's.

Edited for brevity...

Called with "foo (time in seconds) (command and arguments)"

>#! /bin/sh
>pid=$$
>(
>	sleep "$time"
>	# use SIGTERM first to allow process to clean up
>	kill $pid >/dev/null 2>&1

At this point, and I'll admit it's a rare possibility, but
not an impossibility, especially on multiprocessor machines,
what happens if the $command (see below) has already exited
and some other process (possibly on another processor) has 
begun with the same pid?  Or won't that happen while this
backgrounded process is tying up the process-group number?

>	sleep 2
>	# if process hasn't died yet, use SIGKILL
>	kill -$SIGKILL $pid >/dev/null 2>&1
>) &
>
>exec "$command"

I'm trying to implement exactly this control structure in C, now,
and one of my least favorite problems is that of getting
the status of a process I may not own without having to do

	system("ps -l##### > /tmp/foo"); /* ##### is the pid */

(or fork-and-exec, etc...  Actually, the system() call
wouldn't be much of a diseconomy, because it only has to
execute the ps(1) once or twice after what may be several
hours of sleep()'ing.)

I mean, just how the heck does ps(1) do it?  Everything I've seen
implies it goes through /dev/mem one byte at a time.

				--Blair
				  "Oh, lovely..."

maart@cs.vu.nl (Maarten Litmaath) (02/17/90)

In article <1221@root44.co.uk>,
	gwc@root.co.uk (Geoff Clare) writes:
)...  The '-v' would
)not normally be used in this simple case.  It is there to provide extra
)information about *which* command was timed out, in the event of several
)parallel "timeout" commands.

I still want the verbose info to appear synchronously (see below).

)Also, the delay is a direct result of allowing the process time to
)clean up.  Using a straight 'kill -9' as your script did would mean
)the verbose message would appear before the prompt.  The very slight
)delay in providing the additional information is a small price to pay
)for allowing the timed out process to tidy up.

In your script the delay is always 2 seconds; in the latest version of my
script it's a parameter.

)>To get a synchronous message I had to have a *child* execute the command
)>supplied, while the *parent* reports the status.
)
)But this has a very serious drawback - you lose the exit status of the
)executed command.  The command could die horribly with no error message,
)and you would not know about it!  All your script tells you is whether
)the command completed within the timeout period or not.

Right; fixed in the current version (easy).

)>Another bug in your script is shown by the following example:
)>
)>[stuff deleted]
)>
)>You leave useless processes hanging around!
)
)Leaving a harmless sleep command behind will not usually cause any
)problems.  It will get zapped when the user logs out, if it hasn't
)completed by then.  [...]

What if the command wasn't started from a terminal?  Not nice.
Unnecessary too.
Another plus of timeout 5.0: the signal is now a parameter too.
Now it's your turn again, Geoff! :-)
--------------------cut here--------------------
#!/bin/sh
# @(#)timeout 5.0 90/02/17 Maarten Litmaath

prog=`basename $0`
verbose=0
SIG=-KILL
sigopt=0
sleep=:
timeout=10
usage="Usage: $prog [-v] [-signal] [timeout] [+delay] [--] <command>"

while :
do
	case $1 in
	--)
		shift
		break
		;;
	-v)
		verbose=1
		;;
	-*)
		SIG=$1
		sigopt=1
		;;
	+*)
		EXPR='..\(.*\)'
		delay=`expr x"$1" : "$EXPR"`
		sleep="kill -0 \$\$ && sleep $delay && kill -KILL \$\$"
		case $sigopt in
		0)
			SIG=-TERM
		esac
		;;
	[0-9]*)
		timeout=$1
		;;
	*)
		break
	esac
	shift
done

case $# in
0)
	echo "$usage" >&2
	exit 2
esac

exec 3>&1

pid=`
	sh -c '
		(sleep '$timeout' > /dev/null & echo $!; exec >&-; wait;
		kill '$SIG' $$ && '"$sleep"') 2> /dev/null & exec "$@" >&3 3>&-
	' $prog "$@"
`
status=$?

kill -9 $pid 2> /dev/null || {
	test $verbose = 1 && echo "TIMEOUT: $*" >&2
}
exit $status
--------------------cut here--------------------
--
  "Ever since the discovery of domain addresses in the French cave paintings
  ..."  (Richard Sexton)        |  maart@cs.vu.nl,  uunet!mcsun!botter!maart

maart@cs.vu.nl (Maarten Litmaath) (02/20/90)

In article <5382@buengc.BU.EDU>,
	bph@buengc.BU.EDU (Blair P. Houghton) writes:
)...
)>	kill $pid >/dev/null 2>&1
)
)At this point, and I'll admit it's a rare possibility, but
)not an impossibility, especially on multiprocessor machines,
)what happens if the $command (see below) has already exited
)and some other process (possibly on another processor) has 
)begun with the same pid?  [...]

Possible if your `$command' has been running for a *long* time and/or new
processes have come and gone like crazy in the meantime...  Normally it
takes a few days for the pid to wrap around; conventionally MAXPID is
30,000.  This value may have to be raised.

)... one of my least favorite problems is that of getting
)the status of a process I may not own without having to do
)
)	system("ps -l##### > /tmp/foo"); /* ##### is the pid */
)...

How about `fp = popen("ps -l#####", "r")'?
To check if a process is alive use `kill(pid, 0)'.
--
  "Ever since the discovery of domain addresses in the French cave paintings
  [...]"  (Richard Sexton)      |  maart@cs.vu.nl,  uunet!mcsun!botter!maart
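
The `kill(pid, 0)` check mentioned above has a direct shell counterpart in `kill -0`. A minimal sketch follows; the helper name is_alive is an assumption made for this example.

```shell
#!/bin/sh
# Sketch only: is_alive is a hypothetical helper name.
# Signal 0 performs the existence and permission checks but delivers nothing.
is_alive() {
    kill -0 "$1" 2>/dev/null
}
```

A failure does not always mean the process is gone: for a caller without root privileges, `kill -0` also fails when the process exists but belongs to another user. It says nothing about whether the PID still names the same program as before.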

bph@buengc.BU.EDU (Blair P. Houghton) (02/21/90)

In article <5615@star.cs.vu.nl> maart@cs.vu.nl (Maarten Litmaath) writes:
>In article <5382@buengc.BU.EDU> bph@buengc.BU.EDU (Blair P. Houghton) writes:
>>what happens if the command has already exited
>>and some other process (possibly on another processor) has 
>>begun with the same pid?
>
>Possible if your `$command' has been running for a *long* time and/or new
>processes have come and gone like crazy in the meantime...  Normally it
>takes a few days for the pid to wrap around;

I'm talking about timing-out someone's login, possibly 12-20 hours
after the killer-process has been started.  Plenty of time.

>conventionally MAXPID is 30,000.  This value may have to be raised.

It's suspiciously low.  I can live with it, though, as cave-sysadmins
have had to do since the dawn of Unix.

I've got another idea, though, which is rather specific to this
problem.  Basically, I'm writing Yet Another Access Scheduler,
implementing it as a shell-wrapper to be placed as a user's
default-shell in /etc/passwd.  There's a dichotomy where the
decision is to be made whether to fork-and-exec the process-killer
(which then sleep()'s until it's time to kick the user off) and
exec the shell, or to fork-and-exec the shell and exec the killer.

So, if I fork the shell, it becomes the killer's child, and the killer
will get SIGCLD if the shell is exited.  If I do it the other way,
I run the risk of zapping some other guy.

I'll have to balance that fact against the three reasons I've
switched this flow-of-control three times already, but it's
somewhat stronger than the others, which involved convenience of I/O...

(BTW, the killer has to run suid-root, so the user can't kill it first,
so respecting interrupts is out.)

>>	system("ps -l##### > /tmp/foo"); /* ##### is the pid */

>How about `fp = popen("ps -l#####", "r")'?

Alas, popen() ain't ANSI, but who am I kidding?  Good idea.

				--Blair
				  "I wonder if Stephen King has
				   this sort of plotting trouble
				   with his killers..."

maart@cs.vu.nl (Maarten Litmaath) (02/22/90)

In article <5405@buengc.BU.EDU>,
	bph@buengc.BU.EDU (Blair P. Houghton) writes:
)...
)I'm talking about timing-out someone's login, possibly 12-20 hours
)after the killer-process has been started.  Plenty of time.

I think you need a new system call like `killtree()' or `killsession()' to
solve your problem completely:

-	you cannot kill all his processes, as you must leave background
	processes and processes on other terminals alone
-	process groups aren't the answer either

Therefore there's a time window between determining which processes have to
be killed, and actually killing them.  During this interval the user may have
created new processes.
--
  "Ever since the discovery of domain addresses in the French cave paintings
  [...]"  (Richard Sexton)      |  maart@cs.vu.nl,  uunet!mcsun!botter!maart

bph@buengc.BU.EDU (Blair P. Houghton) (02/22/90)

In article <5636@star.cs.vu.nl> maart@cs.vu.nl (Maarten Litmaath) writes:
>In article <5405@buengc.BU.EDU>,
>	bph@buengc.BU.EDU (Blair P. Houghton) writes:
>)...
>)I'm talking about timing-out someone's login, possibly 12-20 hours
>)after the killer-process has been started.  Plenty of time.
>
>I think you need a new system call like `killtree()' or `killsession()' to
>solve your problem completely:

Sending a kill -HUP to a login-shell in the console window
of a uVAX workstation Running Ultrix clobbers all of a
login's processes and cycles the X server.

I'm going to have to implement something to duplicate the
essential portions of the globalized .logout procedure
(we've got some buggy CAD sw that refuses to die unless
you've closed it carefully or kill -KILL'ed it.  Actually,
it's not so much buggy as it's poorly designed by some
less-than-proficient people at a UCalifornian school
that's rather well known for its software but must be eating
dirt due to these guys but that shall remain nameless...)

>-	you cannot kill all his processes, as you must leave background
>	processes and processes on other terminals alone

In _this_ case, I can justifiably kill all his processes,
and remote logins and backgrounded jobs are verboten as
well.  The scheduling is designed for lab purposes, and
excluding someone from the scheduling restriction is the
default.  Anyone with the right to run as a normal user
will get it.  It's only the great masses of students who
only need one piece of CAD software and a little email who
get scheduled.  The scheduler was designed to control lab
access, rather than to massage terminal usage.  Use of the
CPU is the important thing.

I may yet go looking for all processes with the user's pid
in them and "kill -f" them (I'll have to write an implementation
of the -f flag, though...:)

>-	process groups aren't the answer either

And it seems from the documentation that they would be... :-(

>Therefore there's a time window between determining which processes have to
>be killed, and actually killing them.  During this interval the user may have
>created new processes.

It's a matter of getting the right one and making sure it's
the same user.  Right now I'm checking to see that the pid
is running, that it's owned by the correct uid, on the same
tty, and has the same name (e.g., "-sh", "-csh", "-ksh",
etc.) as when the killer-program was started (i.e., at
login time.)

It's still a crap-shoot, but it's a crap-shoot with
probability of failure that has an *upper* bound of
1/30,000.  That is one in 30,000 times when the same
person has logged out and logged back in on the same
workstation on the same day.  Such a situation is not
too common in a big lab with tons of students and lots
of workstations.

So I'm reduced from perfection to sufficiency and a
balancing of the relative costs.  I can handle one in
30,000 users' getting angry.  I can't handle the eleven
professorial harangues per day when students can't get
into the lab on time...

				--Blair
				  "If you can't do it right, at
				   least be successful at it."

maart@cs.vu.nl (Maarten Litmaath) (02/23/90)

In article <5410@buengc.BU.EDU>,
	bph@buengc.BU.EDU (Blair P. Houghton) writes:
)...
)Sending a kill -HUP to a login-shell in the console window
)of a uVAX workstation Running Ultrix clobbers all of a
)login's processes and cycles the X server.

Ever heard of `signal(SIGHUP, SIG_IGN)'?

)...
)>Therefore there's a time window between determining which processes have to
)>be killed, and actually killing them.  During this interval the user may have
)>created new processes.
)
)It's a matter of getting the right one and making sure it's
)the same user.  Right now I'm checking to see that the pid
)is running, that it's owned by the correct uid, on the same
)tty, and has the same name (e.g., "-sh", "-csh", "-ksh",
)etc.) as when the killer-program was started (i.e., at
)login time.)

It seems one can easily bypass your little scheme:

	execl("/bin/sh", "Will Blair find me?", (char *) 0);

Furthermore: race conditions!

)It's still a crap-shoot, but it's a crap-shoot with
)probability of failure that has an *upper* bound of
)1/30,000.  That is one in 30,000 times when the same
)person has logged out and logged back in on the same
)workstation on the same day.  Such a situation is not
)too common in a big lab with tons of students and lots
)of workstations.  [...]

I was talking of events with much higher probability.
--
  "Ever since the discovery of domain addresses in the French cave paintings
  [...]"  (Richard Sexton)      |  maart@cs.vu.nl,  uunet!mcsun!botter!maart

gwc@root.co.uk (Geoff Clare) (02/23/90)

In article <5448@star.cs.vu.nl> maart@cs.vu.nl (Maarten Litmaath) writes:

>I still want the verbose info to appear synchronously (see below).

I'll let you into a secret.  The original version never had a "verbose"
message at all - I never felt the need for it.  I added it because
I was posting my script as an alternative to yours, but using a better
strategy, so I thought it ought to have the same facilities.

As I said before, normally the verbose message isn't needed - the
shell will report the termination of the process.

However, if you really don't like the delay in the message, I offer
the following change which causes it to be printed straight away
(although still after the prompt) in cases where the command dies
immediately on the first kill.

Change:
	sleep 2

	# if process hasn't died yet, use SIGKILL
	kill -$SIGKILL $pid >/dev/null 2>&1
to:
	if kill -0 $pid >/dev/null 2>&1
	then
		sleep 2

		# if process hasn't died yet, use SIGKILL
		kill -$SIGKILL $pid >/dev/null 2>&1
	fi
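
The same SIGTERM-first, SIGKILL-later idea can be written as a standalone helper that polls once per second instead of sleeping for the whole grace period. This is only a sketch: the name term_then_kill and its second argument are assumptions for illustration, and the default of 2 seconds is taken from the script under discussion.

```shell
#!/bin/sh
# Sketch only: term_then_kill is a hypothetical helper name.
# Usage: term_then_kill PID [GRACE_SECONDS]
term_then_kill() {
    target=$1
    grace=${2:-2}                       # seconds to wait before SIGKILL

    kill "$target" 2>/dev/null || return 0          # SIGTERM; done if already gone
    while [ "$grace" -gt 0 ]; do
        kill -0 "$target" 2>/dev/null || return 0   # exited during the grace period
        sleep 1
        grace=$((grace - 1))
    done
    kill -KILL "$target" 2>/dev/null    # still there: no more chances
    return 0
}
```

One caveat: if the target is a child of the calling shell and has not been waited for yet, `kill -0` may keep succeeding on the dead process, so the helper can run through the full grace period. The final SIGKILL is harmless in that case.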

>In your script the delay is always 2 seconds; in the latest version of my
>script it's a parameter.

This isn't very useful - how do you predict how long the command will
need to clean up?  Two seconds is plenty for most commands.

>)>To get a synchronous message I had to have a *child* execute the command
>)>supplied, while the *parent* reports the status.
>)
>)But this has a very serious drawback - you lose the exit status of the
>)executed command.  The command could die horribly with no error message,
>)and you would not know about it!  All your script tells you is whether
>)the command completed within the timeout period or not.
>
>Right; fixed in the current version (easy).

So now you can tell when something went wrong, but you still aren't
getting the full picture.  If the command is terminated by a signal
your script will instead exit with a non-zero exit code (usually
128 + signal number).
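
To make the "128 + signal number" convention concrete, here is a small sketch that turns a status taken from `$?` back into a description. The function name describe_status is hypothetical, and the decoding is only a convention: a command is free to exit with a value above 128 on its own, and some shells encode signals differently.

```shell
#!/bin/sh
# Sketch only: describe_status is a hypothetical helper name.
# Usage: describe_status STATUS    (STATUS as found in $?)
describe_status() {
    if [ "$1" -gt 128 ]; then
        # Common shell convention: 128 + signal number.
        echo "killed by signal $(( $1 - 128 ))"
    else
        echo "exited with status $1"
    fi
}
```

For a command stopped by SIGTERM (signal 15), the calling shell would typically see status 143, which this function reports as signal 15.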

Another big advantage in having the parent execute the command is that
it is then a normal foreground process, so you can use the INTR and QUIT
keys in the normal way.  If the user interrupts your script, he gets
a prompt back and may think he has killed the command, whereas in fact
it's still running in the background.

Face it Maarten, having the child execute the command is a total no-hoper.

>)Leaving a harmless sleep command behind will not usually cause any
>)problems.  It will get zapped when the user logs out, if it hasn't
>)completed by then.  [...]
>
>What if the command wasn't started from a terminal?  Not nice.
>Unnecessary too.

The script was designed for casual use from a terminal.  If I ever
wanted to put it to more serious use, I could get rid of the leftover
sleep by adding another background process to monitor the other two.
Or better still, I would use a C program.

>Another plus of timeout 5.0: the signal is now a parameter too.

Another unnecessary frill.  SIGTERM is the right signal to use - that's
why it's the default for the "kill" command.

>Now it's your turn again, Geoff! :-)

I'm not going to post my script again, because I think we've wasted enough
net.bandwidth on this already.  If anybody has saved a copy, they can
apply the change I suggested above if they think it's worth it.  (I don't
think it is - I doubt if I will ever use the '-v' option).
-- 
Geoff Clare, UniSoft Limited, Saunderson House, Hayne Street, London EC1A 9HH
gwc@root.co.uk  (Dumb mailers: ...!uunet!root.co.uk!gwc)  Tel: +44-1-315-6600
                                         (from 6th May 1990): +44-71-315-6600

maart@cs.vu.nl (Maarten Litmaath) (02/24/90)

In article <1381@root44.co.uk>,
	gwc@root.co.uk (Geoff Clare) writes:
)...
)This isn't very useful - how do you predict how long the command will
)need to clean up?  Two seconds is plenty for most commands.

For *most* commands (according to you, anyway) - why not let the user
specify the delay?  It's no trouble at all and leads to higher generality.
You shouldn't say too quickly: "The user doesn't need it."
That's the approach which leads to things like:

	$ set a b c d e f g h i j
	$ echo $10	# echo parameter 10
	a0		# oops!

"The user doesn't need more than 9 arguments."

)...
)So now you can tell when something went wrong, but you still aren't
)getting the full picture.  If the command is terminated by a signal
)your script will instead exit with a non-zero exit code (usually
)128 + signal number).

You're right again!  I've posted another script to alt.sources, which does
things your way (at last! :-), but having a few extras too.

)Another big advantage in having the parent execute the command is that
)it is then a normal foreground process, so you can use the INTR and QUIT
)keys in the normal way.  If the user interrupts your script, he gets
)a prompt back and may think he has killed the command, whereas in fact
)it's still running in the background.

Only the sleep (and its wait()ing parent) keep running, *just* like in
your approach!

)...
)The script was designed for casual use from a terminal.

Yours, not mine.

)If I ever
)wanted to put it to more serious use, I could get rid of the leftover
)sleep by adding another background process to monitor the other two.
)Or better still, I would use a C program.

One thing that's clear from our discussion: sh isn't powerful enough! :-(

)>Another plus of timeout 5.0: the signal is now a parameter too.
)
)Another unnecessary frill.  SIGTERM is the right signal to use - that's
)why it's the default for the "kill" command.

Again I don't agree; first there's the generality, then there's the fact
that SIGHUP is used to signal exceptions too, and lastly both SIGALRM and
SIGXCPU seem normal to send on a *timeout*.
--
  "Belfast: a sentimental journey to the Dark Ages - Crusades & Witchburning
  - Europe's Lebanon - Book Now!" | maart@cs.vu.nl,  uunet!mcsun!botter!maart

bph@buengc.BU.EDU (Blair P. Houghton) (02/25/90)

In article <5650@star.cs.vu.nl> maart@cs.vu.nl (Maarten Litmaath) writes:
>In article <5410@buengc.BU.EDU>,
>	bph@buengc.BU.EDU (Blair P. Houghton) writes:
>)...
>)Sending a kill -HUP to a login-shell in the console window
>)of a uVAX workstation Running Ultrix clobbers all of a
>)login's processes and cycles the X server.
>
>Ever heard of `signal(SIGHUP, SIG_IGN)'?

I use HUP because it gives the honest processes a chance to
use `signal(SIGHUP, somethinguseful)'.

If you want, you can try this:

	stream_type fp;
	char line[WAYBIG];

	/* Already got user's uid via whatever means. */
	fp = popen("/bin/ps axl","r");
	while ( fgets(line,sizeof(line),fp) != (char *)NULL )
	    if ( uid == parse_for_uid(line) )
		kill( parse_for_pid(line), SIGKILL );
	pclose(fp);

SIG_IGN that!

(BTW, most of the time in that loop is spent waiting for
the `ps axl' to get around to its first line of output,
so it's more economical than one would think, at first.)

Of course, this is overkill when the only people I plan to
force the schedule on are almost uniformly without any
knowledge of unix, much less esoterica like execl's.  I'll
know well anyone who knows what a SIG_IGN is, and I'll
trust them not to bust my stuff, or I won't let them log in
at all.

It's obvious that unless you're trying to implement it as a
convenience rather than a security measure you're going to
have to rewrite the kernel, or add a killsession() call,
or getcherself a Real Operating System (hah!)...

>Furthermore: race conditions!

Which never elaborate themselves...

>)It's still a crap-shoot, but it's a crap-shoot with
>)probability of failure that has an *upper* bound of
>)1/30,000.  That is one in 30,000 times when the same
>)person has logged out and logged back in on the same
>)workstation on the same day.  Such a situation is not
>)too common in a big lab with tons of students and lots
>)of workstations.  [...]
>
>I was talking of events with much higher probability.

Wot?  You can assign a PID to your own process?  I'd like
to see that...

				--Blair
				  "Blair -v"

maart@cs.vu.nl (Maarten Litmaath) (03/01/90)

In article <5414@buengc.BU.EDU>,
	bph@buengc.BU.EDU (Blair P. Houghton) writes:
)...
)	fp = popen("/bin/ps axl","r");
)	while ( fgets(line,sizeof(line),fp) != (char *)NULL )
)	    if ( uid == parse_for_uid(line) )
)		kill( parse_for_pid(line), SIGKILL );
)	pclose(fp);
)
)SIG_IGN that!

Between the `ps' and the kill() the user might have created new processes.
Race conditions, babe.

)...
)>I was talking of events with much higher probability.
)
)Wot?  You can assign a PID to your own process?  I'd like
)to see that...

See above.  Talking about assigning your own pid: a friend of mine (Peter
Valkenburg, valke@psy.vu.nl) once wrote a program called `snatch_pid';
you fed it the pid you wanted to grab, then waited an hour or so (ca. 30,000
fork()s on a VAX).  The use: if some game initializes its pseudo random
generator with `srand(getpid())'...

)				--Blair
)				  "Blair -v"

				  "Maarten -SEGV 666 :667 +1 /vmunix"
--
  "Belfast: a sentimental journey to the Dark Ages - Crusades & Witchburning
  - Europe's Lebanon - Book Now!" | maart@cs.vu.nl,  uunet!mcsun!botter!maart

gwc@root.co.uk (Geoff Clare) (03/01/90)

In article <5669@star.cs.vu.nl> maart@cs.vu.nl (Maarten Litmaath) writes:
>
>You're right again!  I've posted another script to alt.sources, which does
>things your way (at last! :-), but having a few extras too.

Glad to hear you've seen the light (at last :-).

Your new script is the same as mine with one worthwhile addition and a
few rather less useful (IMHO) ones.  Thanks for saving me the effort of
implementing my suggested method for tidying up the leftover sleep.

>)>Another plus of timeout 5.0: the signal is now a parameter too.
>)
>)Another unnecessary frill.  SIGTERM is the right signal to use - that's
>)why it's the default for the "kill" command.
>
>Again I don't agree; first there's the generality, then there's the fact
>that SIGHUP is used to signal exceptions too, and lastly both SIGALRM and
>SIGXCPU seem normal to send on a *timeout*.

Sorry, none of the three signals you mention is right for this purpose.

SIGHUP:  you might want to do a "nohup timeout somecommand ... &"

SIGALRM: is not for "timing out" a process, it's for use by a process, e.g.
	 for timing out a system call or for sleeping.  If the process is
	 using SIGALRM, all your "time out" will do is wake it up early.

SIGXCPU: is for limiting resource usage, and in any case is non-standard.

The phrase "time out" when applied to a process really means "terminate
before normal completion".  When you want to *TERM*inate a process you use
SIG*TERM*.  Need I say more?
-- 
Geoff Clare, UniSoft Limited, Saunderson House, Hayne Street, London EC1A 9HH
gwc@root.co.uk  (Dumb mailers: ...!uunet!root.co.uk!gwc)  Tel: +44-1-315-6600
                                         (from 6th May 1990): +44-71-315-6600

maart@cs.vu.nl (Maarten Litmaath) (03/02/90)

In article <1813@root44.co.uk>,
	gwc@root.co.uk (Geoff Clare) writes:
)...
)Glad to hear you've seen the light (at last :-).

My original script explored the (portable/V7) sh boundaries.
I wanted to distinguish between someone *else* killing the job and *timeout*
killing it; this was the purpose of the `-v' option (I know, there's a race
condition); as `timeout -v' already told me the job had timed out, I didn't
want the shell's `Killed' message; therefore I diddled with file descriptor
2; as I wanted a synchronous message I had to invoke an `sh -c' to do the
real work; to kill a leftover sleep I had to use the backquote construct.

)Your new script is the same as mine with one worthwhile addition and a
)few rather less useful (IMHO) ones.  Thanks for saving me the effort of
)implementing my suggested method for tidying up the leftover sleep.

There's still a better way, something like:

	for t in $timeout $delay
	do
		while test $t -gt $interval
		do
			sleep $interval
			kill -0 $$ || exit
			t=`expr $t - $interval`
		done
		sleep $t
		kill $SIG $$ && kill -0 $$ || exit
		SIG=-KILL
	done
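
A minimal sketch of the same polling idea as a self-contained POSIX sh
function. It is not the timeout 5.0 script itself: the name
run_with_timeout, the argument order and the fixed one-second interval are
assumptions made for illustration. The watchdog checks once a second
whether the job is still there, sends SIGTERM when the timeout expires, and
falls back to SIGKILL after a grace period.

```shell
#!/bin/sh
# run_with_timeout TIMEOUT DELAY COMMAND [ARG...]
# Hypothetical helper: TIMEOUT and DELAY are whole seconds.
run_with_timeout() {
    rwt_timeout=$1
    rwt_delay=$2
    shift 2

    # Start the job in the background and remember its pid.
    "$@" &
    rwt_job=$!

    # Watchdog: first pass sends TERM after TIMEOUT seconds, second pass
    # sends KILL after a further DELAY seconds.
    (
        sig=TERM
        for t in "$rwt_timeout" "$rwt_delay"
        do
            while [ "$t" -gt 0 ]
            do
                sleep 1
                # Stop as soon as the job has gone away by itself.
                kill -0 "$rwt_job" 2>/dev/null || exit 0
                t=$((t - 1))
            done
            kill -s "$sig" "$rwt_job" 2>/dev/null || exit 0
            sig=KILL
        done
    ) >/dev/null 2>&1 &
    rwt_dog=$!

    # Collect the job's exit status (128 + signal number if it was killed).
    rwt_status=0
    wait "$rwt_job" || rwt_status=$?

    # The job is done; at most one "sleep 1" is left over from the watchdog.
    kill "$rwt_dog" 2>/dev/null || :
    wait "$rwt_dog" 2>/dev/null || :
    return "$rwt_status"
}
```

For example, `run_with_timeout 10 2 ftp somehost' would give the ftp ten
seconds, then a SIGTERM, then a SIGKILL two seconds later if it is still
around.  The pid-reuse race discussed earlier in the thread is narrowed to
the polling interval but not removed.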

)...
)SIGHUP:  you might want to do a "nohup timeout somecommand ... &"
              ^^^^^
	      Indeed.

)SIGALRM: is not for "timing out" a process, it's for use by a process, e.g.
)	 for timing out a system call or for sleeping.  If the process is
)	 using SIGALRM, all your "time out" will do is wake it up early.

In general you're right; however, is it inconceivable that the process has
been especially configured to clean up on reception of a SIGALRM?

)SIGXCPU: is for limiting resource usage, and in any case is non-standard.

So what?  From `man init' on SunOS 4.0.3c:

	init catches the hangup signal (SIGHUP) and interprets it to
	mean  that  the  file /etc/ttytab should be read again.

"Boo hiss!  SIGHUP is for signaling a hangup on a terminal line!"

)The phrase "time out" when applied to a process really means "terminate
)before normal completion".  When you want to *TERM*inate a process you use
)SIG*TERM*.  Need I say more?

	"No.  You need to say less."
	-- Richard Sexton, richard@gryphon.COM

Couldn't resist! :-)
The phrase "time out" when applied to a process really means "kill
before normal completion".  When you want to *KILL* a process you use
SIG*KILL*.  Sic!
--
  "Belfast: a sentimental journey to the Dark Ages - Crusades & Witchburning
  - Europe's Lebanon - Book Now!" | maart@cs.vu.nl,  uunet!mcsun!botter!maart

bph@buengc.BU.EDU (Blair P. Houghton) (03/03/90)

In article <5724@star.cs.vu.nl> maart@cs.vu.nl (Maarten Litmaath) writes:
>So what?  From `man init' on SunOS 4.0.3c:
>
>	init catches the hangup signal (SIGHUP) and interprets it to
>	mean  that  the  file /etc/ttytab should be read again.
>
>"Boo hiss!  SIGHUP is for signaling a hangup on a terminal line!"

Yeah.  It's crufty all right, but when was the last time
init made a phone call?

				--Blair
				  "Holee-- ALASKA!?"

alex@impch.imp.com (Alex Hanselmann) (03/12/90)

In article <5669@star.cs.vu.nl> maart@cs.vu.nl (Maarten Litmaath) writes:

>You shouldn't say too quickly: "The user doesn't need it."
>That's the approach which leads to things like:
>
>	$ set a b c d e f g h i j
>	$ echo $10	# echo parameter 10
>	a0		# oops!
>
>"The user doesn't need more than 9 arguments."

if you have a ksh or csh you can type:

$ echo ${10}

Alex
+-------------------------------------------------------------------------+
| Alex Hanselmann, Laengistr 15, 4133 Pratteln,   EMAIL: alex@imp.com     |
| ( UNIX && C ) makes it possible - ImproWare +41-61-82171-19 / 44 (FAX)  |
+-------------------------------------------------------------------------+
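
A short sketch of the difference, assuming a POSIX-style sh or ksh (an old
V7 sh accepts neither `${10}' nor, as far as I know, a count on `shift').
The function name show_tenth is made up for the example.

```shell
#!/bin/sh
# show_tenth ARG...: hypothetical helper, call it with at least ten arguments.
show_tenth() {
    echo "$10"      # parsed as $1 followed by a literal 0
    echo "${10}"    # braces select the tenth parameter
    shift 9         # drop the first nine parameters
    echo "$1"       # the old tenth parameter is now $1
}

show_tenth a b c d e f g h i j
# prints:
#   a0
#   j
#   j
```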

gwc@root.co.uk (Geoff Clare) (03/12/90)

In article <5724@star.cs.vu.nl> maart@cs.vu.nl (Maarten Litmaath) writes:
|
|)SIGXCPU: is for limiting resource usage, and in any case is non-standard.
|
|So what?  From `man init' on SunOS 4.0.3c:
|
|	init catches the hangup signal (SIGHUP) and interprets it to
|	mean  that  the  file /etc/ttytab should be read again.
|
|"Boo hiss!  SIGHUP is for signaling a hangup on a terminal line!"

Of course a program can choose to use any signal for its own purposes.
But that's not really relevant to the point under discussion, which was
what signal should be used for terminating processes in general.  The
correct signal for this job is SIGTERM, because any well designed program
will clean up and exit ASAP when it receives a SIGTERM.

|)The phrase "time out" when applied to a process really means "terminate
|)before normal completion".  When you want to *TERM*inate a process you use
|)SIG*TERM*.  Need I say more?
|
|The phrase "time out" when applied to a process really means "kill
|before normal completion".  When you want to *KILL* a process you use
|SIG*KILL*.  Sic!

Looks like we're going round in circles here.  This was one of my
original objections to your old method.  SIGKILL should only be used as
a last resort.  Going straight for SIGKILL does not allow the process
to clean up.
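
To make the difference concrete, here is a minimal sh sketch, assuming a
POSIX shell.  The name demo_signal and the scratch file name are invented
for the example.  A background worker creates a scratch file and traps
SIGTERM to remove it; the function sends the worker the requested signal
and reports whether the file survived.

```shell
#!/bin/sh
# demo_signal SIGNAME: hypothetical demo, SIGNAME is e.g. TERM or KILL.
demo_signal() {
    demo_file=${TMPDIR:-/tmp}/sigdemo.$$
    rm -f "$demo_file"

    # Worker: install the cleanup handler first, then create the file.
    (
        trap 'rm -f "$demo_file"; exit 0' TERM
        : > "$demo_file"
        while :; do sleep 1; done
    ) >/dev/null 2>&1 &
    demo_pid=$!

    # Wait until the worker is set up (the file appears after the trap).
    while [ ! -f "$demo_file" ]; do sleep 1; done

    kill -s "$1" "$demo_pid"
    wait "$demo_pid" 2>/dev/null || :   # the worker's exit status is not needed

    if [ -f "$demo_file" ]
    then
        echo "left behind"
        rm -f "$demo_file"
    else
        echo "cleaned up"
    fi
}
```

`demo_signal TERM' reports "cleaned up" because the handler gets to run;
`demo_signal KILL' reports "left behind" because SIGKILL cannot be caught.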

    "... we came in?  <Pink Floyd - The Wall>  Isn't this where ..."

I think we're probably talking to ourselves here, Maarten.  Everyone
else put this subject in their KILL file ages ago.
-- 
Geoff Clare, UniSoft Limited, Saunderson House, Hayne Street, London EC1A 9HH
gwc@root.co.uk  (Dumb mailers: ...!uunet!root.co.uk!gwc)  Tel: +44-1-315-6600
                                         (from 6th May 1990): +44-71-315-6600