shenkin@cunixf.cc.columbia.edu (Peter S. Shenkin) (12/03/90)
In article <1539@contex.UUCP> james@contex.UUCP (James McQueston) writes:
>In article <1990Nov28.163415.14317@odin.corp.sgi.com>, jmb@patton.wpd.sgi.com (Doctor Software) writes:
>> This seems to make the lesson be that you can run quite fine without
>> swap until you actually need to use it - but then watch out. There's no
>> guarantee of which process will actually be killed in this case; it's
>> just whichever one has the page that can't go out.
>> -- Jim Barton
>> Silicon Graphics Computer Systems
>> jmb@sgi.com
>
>This is what we thought, as it is mentioned in the "bug fixes" section of
>the release notes for 3.3. We tested it, and were upset that we could not
>determine which process got killed when swap ran out...
>...It is very important that the users be able to determine
>which process(es) get killed in order to resolve the deadlock.
>
>Example: your server is used to run a simulation that takes hours or days to
>compute, and you have tuned the size of your finite-element mesh to just
>barely fit within the capabilities of that machine. N hours later, someone
>else innocently runs some unimportant program on the server and causes page
>deadlock. The O.S. blindly decides which process to kill and ... pow! Chance
>determines that the simulation gets killed and you lose N hours of work.
>Too bad that the other user was just checking his mail.
>
>Suggestion: there should be some prioritization of the importance of processes
>when determining which one(s) should be killed to avoid deadlock. Perhaps
>the process's priority or "nice" value could be used. Anything is better
>than nothing. Users MUST have some way of guiding the OS in this decision.

My 2 cents: DON'T use "nice", for God's sake! Big jobs that will run for
days or longer are going to have the lowest priorities, and will then be
killed first. This is exactly the reverse of what you want! And if you start
killing jobs with high priorities, you're going to kill first the processes
that keep the system running.
Second point: programmers should write programs that run long jobs so that
they write out intermediate files from time to time (say, every hour of CPU
time?) that can be used to re-initialize and re-start the program from that
point. This insures against various crashes and mishaps. It was standard
practice in the '70s on mainframes, when you paid for your time and storage,
and computer centers would typically limit reimbursement for crashes to some
small fraction of the resources that a large job would use. Since then,
we've become spoiled by the reliability and power of small, local computers.

Third point: for a quick fix, SGI should provide (and document, please! :-) )
a kernel configuration option that reinstates the old behavior of limiting
active space to the swap space. This would presumably be used only by sites
that have large amounts of swap space, or which run many very long jobs.

Fourth point: in the longer term, certainly some user control over kill
priorities is needed. Suppose we had three classes: (1) processes started up
at boot time, which are necessary to keep the system alive; (2) jobs which
the user specifies are to be exempt from being killed, except in great
emergency; (3) "normal" jobs. In case of gridlock, class (3) jobs are killed
until gridlock is relieved; if all class (3) jobs are killed and gridlock is
still not relieved, then the system has no choice but to start on class (2)
jobs.

A syntax similar to the "nice" syntax could be used to put a job into class
(2): "nokill jobname". A post-hoc nokill operator could also be defined:
"renokill PID#", in analogy to "renice". This, like renice, could be
restricted to the superuser. The site should have control over who can
invoke "nokill". If only the superuser can, then the following procedure
might be typical: a user starts a job and asks the sysadmin to renokill it;
the sysadmin complies, or not, as he/she sees fit, depending on how many
jobs are already in that category.
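[Ed. note: a minimal sketch of the three-class kill policy proposed above.
Everything here is invented for illustration -- the class constants, the
process records, and the pick_victim function are not any real IRIX (or
other) interface:]

```python
# Illustrative sketch of the proposed three-class kill policy.
# All names are hypothetical; this is not a real kernel interface.

SYSTEM, NOKILL, NORMAL = 1, 2, 3  # kill classes, per the proposal


def pick_victim(procs):
    """Choose which process to kill when swap is exhausted.

    Class (3) "normal" jobs are sacrificed first; class (2)
    "nokill" jobs only in a great emergency; class (1) system
    processes never.  Within a class, prefer the largest
    resident set, so each kill frees as much memory as possible.
    """
    for cls in (NORMAL, NOKILL):
        candidates = [p for p in procs if p["cls"] == cls]
        if candidates:
            return max(candidates, key=lambda p: p["rss"])
    return None  # only system processes remain; nothing safe to kill


procs = [
    {"pid": 1,  "cls": SYSTEM, "rss": 4},    # keeps the system alive
    {"pid": 42, "cls": NOKILL, "rss": 900},  # the N-hour simulation
    {"pid": 77, "cls": NORMAL, "rss": 12},   # someone checking mail
]
```

[With these records, pick_victim(procs) returns the mail reader (pid 77)
even though the simulation's memory is far larger -- exactly the outcome the
quoted article asks for.]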
Or perhaps a heuristic could be used which would allow the system to accept
or deny a "nokill" request.

Finally, another way to go would be to implement a batch queue, as Convex
has done.

	-P.

************************f*u*cn*rd*ths*u*cn*gt*a*gd*jb**************************
Peter S. Shenkin, Department of Chemistry, Barnard College, New York, NY 10027
(212)854-1418  shenkin@cunixf.cc.columbia.edu(Internet)  shenkin@cunixf(Bitnet)
***"In scenic New York... where the third world is only a subway ride away."***
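[Ed. note: the checkpoint-and-restart discipline from the "second point"
above might be sketched as follows. The state layout, checkpoint file name,
and step counts are all invented for illustration:]

```python
# Sketch of periodic checkpointing for a long-running job.
# State layout and file name are hypothetical.
import json
import os
import tempfile

CKPT = "sim.ckpt"


def save_checkpoint(path, state):
    # Write to a temporary file and rename it into place, so a crash
    # mid-write never leaves a truncated checkpoint behind.
    fd, tmp = tempfile.mkstemp(dir=os.path.dirname(path) or ".")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.replace(tmp, path)


def load_checkpoint(path):
    try:
        with open(path) as f:
            return json.load(f)  # resume from the last checkpoint
    except FileNotFoundError:
        return {"step": 0, "total": 0.0}  # fresh start


def run(n_steps, ckpt_every):
    state = load_checkpoint(CKPT)
    while state["step"] < n_steps:
        state["total"] += state["step"]  # stand-in for real work
        state["step"] += 1
        if state["step"] % ckpt_every == 0:
            save_checkpoint(CKPT, state)
    return state
```

[If the process is killed N hours in, the next invocation of run() resumes
from the last checkpoint instead of from step 0, losing at most one
checkpoint interval of work.]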
slamont@network.ucsd.edu (Steve Lamont) (12/03/90)
In article <1990Dec2.181344.20040@cunixf.cc.columbia.edu> shenkin@cunixf.cc.columbia.edu (Peter S. Shenkin) writes:
>Finally, another way to go would be to implement a batch queue, as Convex
>has done.
>
>	-P.

Small point of information. The batch queue system (NQS) was developed, to
the best of my knowledge, at the NASA Ames Research Center Numerical
Aerodynamic Simulation Facility for their Crays. Convex has taken that
system, which is pretty nifty, by the way, and packaged it up for use on
their systems. NQS is also available on Crays from Cray Research. There is
also a PD version, which I believe you can get from either NASA or COSMIC
-- I don't recall which.

							spl (the p stands for
							putting the record straight, I think)
--
Steve Lamont, SciViGuy -- 1882p@cc.nps.navy.mil -- a guest on network.ucsd.edu
NPS Confuser Center / Code 51 / Naval Postgraduate School / Monterey, CA 93943
What is truth and what is fable, where is Ruth and where is Mabel?
			- Director/producer John Amiel, heard on NPR
mike@BRL.MIL (Mike Muuss) (12/05/90)
> From: Steve Lamont <network.ucsd.edu!slamont@ucsd.edu>
>In article <1990Dec2.181344.20040@cunixf.cc.columbia.edu> shenkin@cunixf.cc.columbia.edu (Peter S. Shenkin) writes:
>>Finally, another way to go would be to implement a batch queue, as Convex
>>has done.

Steve mentions NQS. It was inspired by BRL's MDQS, which also runs fine on
SGIs (batch jobs, printer spooling, etc.). You can get it via anonymous ftp
from host FTP.BRL.MIL, file "arch/mdqs.tar.Z".

	Best,
	 -Mike