[comp.unix.internals] suspension of long process

wgsiemel@praxis.cs.ruu.nl (Willem Siemelink) (10/03/90)

I have got a process that takes days to complete.  However, the System
Administration does not want me to run it in daytime.  So now I am looking for
a way to stop a process and later continue it.  We have HP UX 7.0 running on
the workstations here.
I can do this by hand by typing ^Z on the running process followed by 'bg' and
'fg' but that is only when I'm on the keyboard at the very moment.  Obviously
that isn't good enough.  I've had a suggestion using 'kill' but I couldn't
figure it out.  ('kill -3 <pid>' gives a core dump, but I can't get it started
again.)

Any (clear) suggestion would be much appreciated.  If responses are posted
here or if you mail me at <wgsiemel@praxis.cs.ruu.nl> I'll be able to do
something with them.  (I mean to say that I am not going to track responses in
different groups).  If I happened to break some local courtesy I'm sorry, I
didn't mean to.

Have a day, Willem.
--
The good thing about death is that it is preceded by life.   <me>

gt0178a@prism.gatech.EDU (Jim Burns) (10/04/90)

in article <3940@ruuinf.cs.ruu.nl>, wgsiemel@praxis.cs.ruu.nl (Willem Siemelink) says:

> I have got a process that takes days to complete.  However, the System
> Administration does not want me to run it in daytime.  So now I am looking for

How about running it in a script that does something like this:

user-process &
kill -STOP $!                          # suspend the job just started
echo kill -CONT $! | at 1900           # resume job at 7pm
echo kill -STOP $! | at 0900           # suspend next morning
                                       # repeat variations of above 2 lines
                                       # for as many days as estimated need
echo $! > $HOME/.batch-job             # record pid in case still running
                                       # after above at's exhausted

Really long jobs could be handled similarly with cron(1). Make sure you read
the at(1) man page for the admin files that must be set up to allow you to run
at jobs, and for other gotchas.
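
A cron-based variant might look like the sketch below. It assumes the job's
pid was saved in $HOME/.batch-job as in the script above, and that 09:00 to
19:00 on weekdays counts as daytime; the file name batch-job.cron is made up
for illustration. The script only writes the two crontab entries to a file.
Installing them is left commented out because 'crontab file' replaces your
whole existing table.

```shell
#!/bin/sh
# Sketch only: build crontab entries that stop the job at 09:00 and
# resume it at 19:00, Monday to Friday.
# Assumptions: the pid is stored in $HOME/.batch-job, and
# batch-job.cron is a hypothetical file name.
pidfile=$HOME/.batch-job
cronfile=${TMPDIR:-/tmp}/batch-job.cron

# Fields: minute hour day-of-month month day-of-week command
cat > "$cronfile" <<EOF
0 9 * * 1-5 kill -STOP \`cat $pidfile\`
0 19 * * 1-5 kill -CONT \`cat $pidfile\`
EOF

cat "$cronfile"
# To install (this REPLACES your current crontab):
#   crontab "$cronfile"
```

With these entries the job runs every night and straight through the
weekend, from Friday 19:00 until Monday 09:00.
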
-- 
BURNS,JIM
Georgia Institute of Technology, Box 30178, Atlanta Georgia, 30332
uucp:	  ...!{decvax,hplabs,ncar,purdue,rutgers}!gatech!prism!gt0178a
Internet: gt0178a@prism.gatech.edu

dan@kfw.COM (Dan Mick) (10/04/90)

In article <3940@ruuinf.cs.ruu.nl> wgsiemel@praxis.cs.ruu.nl (Willem Siemelink) writes:
>I can do this by hand by typing ^Z on the running process followed by 'bg' and
>'fg' but that is only when I'm on the keyboard at the very moment.  Obviously
>that isn't good enough.  I've had a suggestion using 'kill' but I couldn't
>figure it out.  ('kill -3 <pid> gives a core-dump but I can't get it started
>again.)

The fact that you can ^Z and bg/fg means you must be on a BSD-derived 
system with job control; you can send the same signals to the process with

kill -TSTP <pid>  

to stop the process, and 

kill -CONT <pid>

to resume it.
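
A quick way to check this without a terminal is sketched below. The ticking
subshell merely stands in for the long-running job, and the log file name is
made up. The sketch uses STOP rather than TSTP for two reasons: SIGTSTP can be
caught or ignored by the program, and a stop caused by SIGTSTP is discarded
when the target's process group is orphaned, which can be the case for a job
left running after logout. SIGSTOP has neither restriction.

```shell
#!/bin/sh
# Sketch: stop and resume a background job by pid, no terminal needed.
# The ticking subshell stands in for the real long-running program;
# the log file name is hypothetical.
log=${TMPDIR:-/tmp}/ticks.$$
: > "$log"

( while :; do echo tick; sleep 1; done ) >> "$log" &
pid=$!

# count: number of ticks written so far
count() { wc -l < "$log" | tr -d ' '; }

sleep 3
kill -STOP "$pid"            # suspend; cannot be caught or ignored
sleep 1                      # let the stop take effect
stopped_at=`count`
sleep 3
still=`count`                # same as stopped_at: no progress was made

kill -CONT "$pid"            # resume
sleep 3
resumed=`count`              # larger again: the job is ticking

kill "$pid"                  # clean up the stand-in job
rm -f "$log"
echo "stopped at $stopped_at, still $still, after resume $resumed"
```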

wayne@teemc.UUCP (Michael R. Wayne) (10/04/90)

In article <3940@ruuinf.cs.ruu.nl> wgsiemel@praxis.cs.ruu.nl (Willem Siemelink) writes:
>I have got a process that takes days to complete.  However, the System
>Administration does not want me to run it in daytime.  So now I am looking for
>a way to stop a process and later continue it.  We have HP UX 7.0 running on
>the workstations here.
>I can do this by hand by typing ^Z on the running process followed by 'bg' and
>'fg' but that is only when I'm on the keyboard at the very moment.  Obviously
>that isn't good enough.  I've had a suggestion using 'kill' but I couldn't
>figure it out.  ('kill -3 <pid> gives a core-dump but I can't get it started
>again.)

	Here follows a shar file containing a program that does exactly what
you want without relying on BSD job control at all.  A couple of
commands run from cron will allow you to start and stop your process on
command.  You might wish to remove the printf's once you understand what
is going on (or you might not).  Alternatively, the commands can be placed
in your .login and .logout to run jobs only when you are not logged in.
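
For jobs that are themselves shell scripts, the same cooperative idea can be
approximated with trap. This is only a sketch of the technique, not a
translation of the C program in the archive: the job checks a flag between
units of work instead of calling pause() inside the handler, and the log file
name and step count are invented for the example.

```shell
#!/bin/sh
# Sketch: a job that suspends itself on USR2 and carries on after USR1.
log=${TMPDIR:-/tmp}/coop.$$
: > "$log"

(
    paused=0
    trap 'paused=1; echo SUSPENDED >> "$log"' USR2
    trap 'paused=0; echo RESUMED >> "$log"' USR1
    i=0
    while [ "$i" -lt 60 ]; do
        # Idle here while suspended; traps run between commands.
        while [ "$paused" -eq 1 ]; do sleep 1; done
        i=`expr $i + 1`
        echo "step $i" >> "$log"     # one unit of "work"
        sleep 1
    done
) &
pid=$!

# steps: number of work units logged so far
steps() { grep -c '^step' "$log"; }

sleep 3
kill -USR2 "$pid"; sleep 2     # ask the job to suspend itself
s1=`steps`; sleep 2; s2=`steps`
kill -USR1 "$pid"; sleep 3     # ask it to carry on
s3=`steps`

kill "$pid"                    # end the example job
rm -f "$log"
echo "while suspended: $s1 -> $s2; after resume: $s3"
```

Because the shell runs a trap only after the current command finishes, the
job may take up to a second (one sleep) to react to either signal.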

/\/\ \/\/


#! /bin/sh
# This is a shell archive.  Remove anything before this line, then unpack
# it by saving it into a file and typing "sh file".  To overwrite existing
# files, type "sh file -c".  You can also feed this as standard input via
# unshar, or by typing "sh <file", e.g..  If this archive is complete, you
# will see the following message at the end:
#		"End of shell archive."
# Contents:  signal_tst.c
# Wrapped by wayne@teemc.tmc.mi.org on Thu Oct  4 10:25:48 1990
PATH=/bin:/usr/bin:/usr/ucb ; export PATH
if test -f signal_tst.c -a "${1}" != "-c" ; then 
  echo shar: Will not over-write existing file \"signal_tst.c\"
else
echo shar: Extracting \"signal_tst.c\" \(1525 characters\)
sed "s/^X//" >signal_tst.c <<'END_OF_signal_tst.c'
X#include <stdio.h>
X#include <sys/signal.h>
X#include <time.h>
X
Xint resume(), suspend();
Xlong t;
X
Xchar *ctime();
Xlong time();
X
X/*
X * Simple program to demonstrate use of user signals to suspend
X * and resume a job which tends to run for a long time.
X *
X * Note that this really isn't a true "checkpoint" as the program
X * status is not preserved, execution is merely interrupted.
X *
X * To use:
X *	compile w/ cc
X *	run the executable in the background
X *	% kill -USR2 will suspend execution
X *	% kill -USR1 will resume execution
X */
Xmain()
X{
X	/*
X	 * Grab the signals
X	 */
X	if (((int) signal(SIGUSR1, resume) == -1) ||
X		((int) signal(SIGUSR2, suspend) == -1) ||
X		((int) signal(SIGHUP, SIG_IGN) == -1)) {
X		printf("can't catch signals.\n");
X		exit(-1);
X		}
X	/*
X	 * Normally we want processes that run for a long time to be
X	 * very, very nice.
X	 */
X	if (nice(19) == -1) {
X		time(&t);
X		printf("%.24s - can't be nice.  continuing...\n", ctime(&t));
X		}
X	(void) time(&t);
X	printf("%.24s - STARTED\n", ctime(&t));
X	/*
X	 * Your code that runs a long time goes here
X	 */
X	for(;;)
X		;	/* Loop forever for example purposes	*/
X}
X
Xresume(i)
Xint i;
X{
X	if ((int) signal(SIGUSR1, resume) == -1) {
X		printf("can't recatch signal USR1 .\n");
X		exit(-1);
X		}
X	(void) time(&t);
X	printf("%.24s - RESUMED.\n", ctime(&t));
X	return(0);
X}
X
Xsuspend(i)
Xint i;
X{
X	if ((int) signal(SIGUSR2, suspend) == -1) {
X		printf("can't recatch signal USR2.\n");
X		exit(-1);
X		}
X	(void) time(&t);
X	printf("%.24s - SUSPENDED.\n", ctime(&t));
X	pause();
X	return(0);
X}
END_OF_signal_tst.c
if test 1525 -ne `wc -c <signal_tst.c`; then
    echo shar: \"signal_tst.c\" unpacked with wrong size!
fi
# end of overwriting check
fi
echo shar: End of shell archive.
exit 0
-- 
Michael R. Wayne      ---     TMC & Associates      --- wayne@teemc.tmc.mi.org
         Operator of the only 240 Horsepower UNIX machine in Michigan 

decot@hpisod2.HP.COM (Dave Decot) (10/05/90)

The signal SIGSTOP can be sent to the process (or process group) from
some other process at any time in the future, and the process (group)
will be suspended.  To get it going again later, that other process can
send the SIGCONT signal to the process (group).

You can have the job itself save the process (group) ID by having it
open some file and write its process ID there (or process group ID, from
calling getpgid()), and picking it up later.

From a shell script or cron job (see crontab(1)), this would be:

    kill -24 `cat /tmp/foopid`		# stop the foo job
    kill -26 `cat /tmp/foopid`		# resume the foo job
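
Spelled out a little more, the pid-file approach might look like the sketch
below. It uses the symbolic names STOP and CONT, since signal numbers differ
between UNIX flavors, and it checks with 'kill -0' that the recorded process
still exists before signalling it. The pid file name follows /tmp/foopid
above with a suffix added to avoid collisions, 'sleep 60' stands in for the
real job, and the function name signal_job is invented for the example.

```shell
#!/bin/sh
pidfile=${TMPDIR:-/tmp}/foopid.$$

# Start the job; the wrapper writes its own pid, then exec keeps that
# pid for the real program.
sh -c 'echo $$ > "$0"; exec sleep 60' "$pidfile" &
jobpid=$!
sleep 1                       # give the wrapper time to write the file

# signal_job NAME: send signal NAME to the recorded pid if it is alive.
signal_job() {
    pid=`cat "$pidfile"` || return 1
    if kill -0 "$pid" 2>/dev/null; then
        kill -"$1" "$pid"
    else
        echo "process $pid is gone; removing stale $pidfile" >&2
        rm -f "$pidfile"
        return 1
    fi
}

signal_job STOP; stop_status=$?      # what the morning cron entry would do
signal_job CONT; cont_status=$?      # what the evening cron entry would do

kill "$jobpid"                       # end the stand-in job
wait "$jobpid" 2>/dev/null           # reap it so the pid really is gone
signal_job STOP 2>/dev/null; gone_status=$?   # now detects the stale pid
echo "stop=$stop_status cont=$cont_status gone=$gone_status"
```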

Dave Decot
(This is not necessarily HP's opinion and no warranty is expressed or
implied.)

decot@hpisod2.HP.COM (Dave Decot) (10/05/90)

You and your "System Administration" should also find out about nice(1),
since this is probably much more appropriate to your situation than stopping
and starting your process.  There should be no problem with running your
process during the day if it is set to run at a lower priority.

To use nice(1) to run a program "prog" that takes arguments "arg1 arg2 ...",
do:

   nice -5 prog arg1 arg2 ...

This runs prog at a priority likely to be lower than that of most other
processes.  It will only run when there is nothing more "important" to do.

Dave

tmh@prosun.first.gmd.de (Thomas Hoberg) (10/13/90)

I have thought a bit about that problem, too, some time ago. I never
implemented a solution, but I had the following ideas:

Programs like TeX, GNU Emacs and a Scheme interpreter I know use mechanisms
to either dump themselves (after having loaded some libraries) in a form that
can be restarted later, or rely on some tool to turn a core dump back into an
executable program. For the first variant, your program should install a
signal handler for some user-defined signal that makes it dump itself; for the
second, send a SIGQUIT to the program to get a core dump, then manually
'undump' it with that utility you hopefully found in the TeX or Emacs
distribution, to get an executable that can be restarted. I guess this is a
bit messy, and I would like to see a facility in UNIX to force running
processes out to disk, because it might generally make automatic powerfail
restarts possible.

  ;-) Tom

PS. Tell me, if you got it to work!

----
Thomas M. Hoberg                                   tmh@prosun.first.gmd.de
GMD Berlin, Hardenbergplatz 2, 1000 Berlin 12, Germany   +49-30-254 99-160

tmh@prosun.first.gmd.de (Thomas Hoberg) (10/13/90)

Oops, rereading the original posting, I found that you merely wanted to
stop during the daytime to reduce the load. Workable solutions have been
presented, so I won't add any. Still, a facility of the kind I described would
be nice, in order to be able to shut down the system (for maintenance or
otherwise) and restart it without getting hate mail from users whose batch
monsters got trashed in the process...
----
Thomas M. Hoberg                                   tmh@prosun.first.gmd.de
GMD Berlin, Hardenbergplatz 2, 1000 Berlin 12, Germany   +49-30-254 99-160

jkimble@bally.Bally.COM (The Programmer Guy) (10/16/90)

If your flavor of UNIX allows you to vary the "nice" values plus/minus,
why not just bump the priority way, way, way down during the day and
then bring it back up to normal during the evening?

This would allow the process to keep chugging along in the day time
without getting too much in the way, and it might be able to grab
some "free" time during non-active hours (like lunch time).

This seems a lot less messy than starting/stopping;  is there any
badness associated with dynamic changes to the NICE values that
I've overlooked?
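
A sketch of that idea follows, with 'sleep 60' standing in for the real job.
It assumes the 'renice priority -p pid' form; renice syntax varies between
systems (POSIX specifies 'renice -n increment'), so check renice(1) locally.
One known catch: on most systems an ordinary user may raise a process's nice
value but not lower it again, so the evening step of bringing the priority
back up will typically fail unless root does it.

```shell
#!/bin/sh
sleep 60 &
pid=$!

# Daytime: push the job to a very low priority (a high nice value).
renice 19 -p "$pid" >/dev/null
down_status=$?

# Evening: try to restore normal priority.  Without root privileges
# this normally fails with a permission error.
if renice 0 -p "$pid" >/dev/null 2>&1; then
    up_result=restored
else
    up_result=denied
fi

echo "lowering: exit $down_status; raising again: $up_result"
kill "$pid"                   # end the stand-in job
```

If the restore step is denied, the two renice calls would have to run from
root's crontab rather than your own.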

-- 
--Jim Kimble,						jkimble@bally.bally.com
Consulting for Bally Systems				  uunet!bally!jkimble

"ALPO is 99 cents a can.  That's over SEVEN dog dollars!!"