[comp.sys.sgi] Prioritizing jobs to be killed on gridlock; was: Re: Swap questions

shenkin@cunixf.cc.columbia.edu (Peter S. Shenkin) (12/03/90)

In article <1539@contex.UUCP> james@contex.UUCP (James McQueston) writes:
>In article <1990Nov28.163415.14317@odin.corp.sgi.com>, jmb@patton.wpd.sgi.com (Doctor Software) writes:
>> This seems to make the lesson be that you can run quite fine without
>> swap until you actually need to use it - but then watch out. There's no
>> guarantee of which process will actually be killed in this case; it's
>> just whoever has the page that can't go out.
>> -- Jim Barton
>>    Silicon Graphics Computer Systems
>>    jmb@sgi.com
>
>This is what we thought, as it is mentioned in the "bug fixes" section of
>the release notes for 3.3.  We tested it, and were upset that we could not
>determine which process got killed when swap runs out...
>...It is very important that the users be able to determine
>which process(es) get killed in order to solve the deadlock.
>
>Example: your server is used to run a simulation that takes hours or days to
>compute, and you have tuned the size of your finite-element mesh to just
>barely fit within the capabilities of that machine.  N hours later, someone
>else innocently runs some unimportant program on the server and causes page
>deadlock.  The O.S. blindly decides which process to kill and ... pow!  Chance
>determines that the simulation gets killed and you lose N hours of work.
>Too bad that the other user was just checking his mail.
>
>Suggestion: there should be some prioritization of the importance of processes
>when determining which one(s) should be killed to avoid deadlock.  Perhaps
>the process's priority or "nice" value could be used.  Anything is better
>than nothing.  Users MUST have some way of guiding the OS in this decision.

My 2 cents:  DON'T use "nice", for God's sake!  Big jobs that will run for
days or longer are going to have the lowest priorities, and will then
be killed first.  This is exactly the reverse of what you want!  And if
you start killing jobs with high priorities, you're going to kill first the
processes that keep the system running.
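
To make the point concrete, here is a small Python sketch.  The job names
and nice values are made up for illustration; they assume the usual
convention that a larger nice value means a lower scheduling priority.

```python
# Hypothetical jobs as (name, nice value) pairs; larger nice = lower priority.
jobs = [
    ("kernel-daemon", -10),  # assumed: system process at high priority
    ("mail-reader", 0),      # assumed: ordinary interactive job
    ("simulation", 19),      # assumed: long job, niced down to be polite
]


def first_killed(jobs, kill_low_priority_first):
    """Return the name of the job a nice-based policy would kill first."""
    def nice_of(job):
        return job[1]

    pick = max if kill_low_priority_first else min
    return pick(jobs, key=nice_of)[0]


print(first_killed(jobs, kill_low_priority_first=True))
print(first_killed(jobs, kill_low_priority_first=False))
```

Under these assumptions the first policy picks the simulation and the second
picks the system daemon, so neither ordering by nice value gives what you want.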

Second point:  programmers should write programs that run long jobs so that
they write out intermediate files from time to time (like every hour of CPU
time?) that can be used to re-initialize and re-start the program from that
point.  This insures against various crashes and mishaps.  This was standard
practice in the '70s on mainframes, when you paid for your time and storage,
and computer centers would typically limit reimbursement due to crashes to
some small fraction of the resources that a large job would use.  Since then, 
we've become spoiled by the reliability and power of small, local computers.
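
A minimal sketch of such a checkpoint/restart loop, in Python.  The function
names, the JSON file format, and the toy "work" done per step are all
assumptions for illustration.  It checkpoints every so many steps; a real
program could instead compare time.process_time() readings to checkpoint
roughly every hour of CPU time.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write state to a temp file, then rename it over the old checkpoint.

    If the job is killed mid-write, the previous checkpoint is still intact.
    """
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(state, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp, path)  # replaces the old file in one step
    except BaseException:
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise


def load_checkpoint(path, default):
    """Return the saved state, or default if no checkpoint exists yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return default


def run(path, total_steps, checkpoint_every):
    """Resume from the last checkpoint (if any) and run to total_steps."""
    state = load_checkpoint(path, {"step": 0, "total": 0})
    while state["step"] < total_steps:
        state["total"] += state["step"]  # stand-in for the real computation
        state["step"] += 1
        if state["step"] % checkpoint_every == 0:
            save_checkpoint(path, state)
    save_checkpoint(path, state)  # final state, so a rerun is a no-op
    return state
```

If the job dies between checkpoints, calling run() again with the same path
loses at most checkpoint_every steps of work rather than the whole run.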

Third point:  for a quick fix, SGI should provide (and document, please! :-) )
a kernel configuration option that reinstates the old behavior of limiting 
active space to the swap space.  This would presumably be used only by sites 
that have large amounts of swap space, or which run many very long jobs.

Fourth point:  In the longer term, certainly some user control over kill
priorities is needed.  Suppose we had three levels:  (1) processes started
up at boot-time, that are necessary to keep the system alive.  (2) jobs
which the user specifies are to be exempt from being killed, except in
great emergency.  (3) "normal" jobs.  In case of gridlock, class (3) jobs
are killed until gridlock is relieved;  if all class (3) jobs are killed
and gridlock is still not relieved, then the system has no choice but to
start on class (2) jobs.  A syntax similar to the "nice" syntax could be
used to put a job into class (2):  "nokill jobname".  A post-hoc nokill
operator could also be defined:  "renokill PID#", in analogy to "renice."
This, like renice, could be restricted to superuser.  The site should
have control over who can invoke "nokill."  If only the superuser can, then
the following procedure might be typical.  User starts job, and requests
sysadm to renokill it.  Sysadm complies, or not, as he/she sees fit, 
depending on how many jobs are already in this category.  Or perhaps
a heuristic could be used which would allow the system to accept or
deny a "nokill" request.
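
The three-level scheme could be sketched like this in Python.  This is only
an illustration of the selection order, not anything IRIX does: the Job
fields, the "pages" estimate of what a kill would free, and killing jobs in
list order within a class are all assumptions.

```python
from dataclasses import dataclass
from enum import IntEnum


class KillClass(IntEnum):
    SYSTEM = 1  # started at boot time, needed to keep the system alive
    NOKILL = 2  # marked exempt by "nokill"/"renokill"; killed only in emergency
    NORMAL = 3  # everything else


@dataclass
class Job:
    pid: int
    name: str
    kill_class: KillClass
    pages: int  # hypothetical: pages released if this job is killed


def choose_victims(jobs, pages_needed):
    """Pick jobs to kill: all needed class (3) jobs first, then class (2).

    Class (1) jobs are never chosen.  Order within a class is simply list
    order here; the scheme above does not say how to rank jobs in a class.
    """
    victims = []
    freed = 0
    for cls in (KillClass.NORMAL, KillClass.NOKILL):
        for job in jobs:
            if job.kill_class != cls:
                continue
            if freed >= pages_needed:
                return victims
            victims.append(job)
            freed += job.pages
    return victims


jobs = [
    Job(1, "init", KillClass.SYSTEM, 10),
    Job(2, "simulation", KillClass.NOKILL, 500),
    Job(3, "mail", KillClass.NORMAL, 20),
    Job(4, "editor", KillClass.NORMAL, 30),
]
print([j.name for j in choose_victims(jobs, 40)])
```

With these made-up numbers, a shortfall of 40 pages costs only the two normal
jobs; the simulation is touched only if killing every class (3) job is not
enough.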

Finally, another way to go would be to implement a batch queue, as Convex
has done.

	-P.
************************f*u*cn*rd*ths*u*cn*gt*a*gd*jb**************************
Peter S. Shenkin, Department of Chemistry, Barnard College, New York, NY  10027
(212)854-1418  shenkin@cunixf.cc.columbia.edu(Internet)  shenkin@cunixf(Bitnet)
***"In scenic New York... where the third world is only a subway ride away."***

slamont@network.ucsd.edu (Steve Lamont) (12/03/90)

In article <1990Dec2.181344.20040@cunixf.cc.columbia.edu> shenkin@cunixf.cc.columbia.edu (Peter S. Shenkin) writes:
>Finally, another way to go would be to implement a batch queue, as Convex
>has done.
>
>	-P.

Small point of information.  The batch queue system (NQS) was developed, to the
best of my knowledge, at NASA Ames Research Center Numerical Aerodynamics
Simulation Facility for their Crays.  Convex has taken that system, which is
pretty nifty, by the way, and packaged it up for use on their systems.  NQS is
also available on Crays from Cray Research.  There is also a PD version, I
believe, that you can get from either NASA or COSMIC -- I don't recall which.

							spl (the p stands for
							putting the record
							straight, I think)
-- 
Steve Lamont, SciViGuy -- 1882p@cc.nps.navy.mil -- a guest on network.ucsd.edu
NPS Confuser Center / Code 51 / Naval Postgraduate School / Monterey, CA 93943
What is truth and what is fable, where is Ruth and where is Mabel?
                       - Director/producer John Amiel, heard on NPR

mike@BRL.MIL (Mike Muuss) (12/05/90)

> From: Steve Lamont <network.ucsd.edu!slamont@ucsd.edu>
>In article <1990Dec2.181344.20040@cunixf.cc.columbia.edu> shenkin@cunixf.cc.columbia.edu (Peter S. Shenkin) writes:
>>Finally, another way to go would be to implement a batch queue, as Convex
>>has done.

Steve mentions NQS. It was inspired by BRL's MDQS, which also runs fine
on SGIs. (Batch jobs, printer spooling, etc).  You can get it via
anonymous ftp from host FTP.BRL.MIL, file "arch/mdqs.tar.Z".

	Best,
	 -Mike