sheryl@seas.gwu.edu (Sheryl Coppenger) (05/29/91)
My original posting was:

>Historically (since before I worked here), there was a policy for
>users running large jobs which ran along these lines:
>	1) Run them in the background
>	2) Start them up with "nice" to lower priority
>	3) Only one such job per machine
>I have a user challenging that policy on the grounds that UNIX will
>take care of it automatically.  I am aware that some systems have
>that capability built in to the kernel, but I am not sure to what
>extent ours do or how efficient they are.  I have looked in the
>manuals for both of our systems (Sun and HP) and in the Nemeth book,
>but they are pretty sketchy.

I went on to ask if other sites had such a policy and if anyone had
information specific to the machines we use (SunOS 4.1/4.1.1, HP-UX
7.0).  I waited until after the holiday weekend to summarize, in case
other sites have short expires on their news articles.

Response was mostly from administrators and overwhelmingly FOR the
policy.  Details varied depending upon the type of machine, the
environment and other factors (including, perhaps, the temperament of
the administrator).  People who said the policy was unnecessary
seemed to be, like the user challenging the policy here, quoting the
general texts about what SHOULD happen in the UNIX operating system
(the Maurice Bach book or the BSD Daemon book).  The administrators
were more likely to quote O'Reilly & Associates' _System Performance
Tuning_.  I was read a statement over the phone along the lines of
"Users will tell you that 'nice' doesn't have any effect -- don't
believe them".

Some of those responding either had or were writing software to
automatically renice or to start and stop processes.  I have a copy
of one package and will try to get others and experiment here.  Some
replies contained assumptions about the type of programs being run,
and pointed out that programs which were I/O bound or doing a lot of
paging would not be affected by nicing.
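For what it's worth, the whole three-point policy fits on one command
line.  A sketch (the job and log file names here are hypothetical
stand-ins, not anything from our site):

```shell
# A stand-in for a user's long-running job (purely hypothetical):
cat > bigsim <<'EOF'
#!/bin/sh
echo "simulation done"
EOF
chmod +x bigsim

# The policy in one line: survive logout ("nohup"), lowest scheduling
# priority ("nice"), and backgrounded ("&").  Old Bourne-shell syntax
# is "nice -19", csh syntax is "nice +19"; "nice -n 19" is the
# portable spelling.
nohup nice -n 19 ./bigsim > bigsim.log 2>&1 &
wait    # in real use you'd just log out; this waits for the demo job
```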
Something along those lines may be what's happening when ksh or
finger processes run wild and take over the CPU.  Unfortunately, I
haven't had a chance to run experiments here.  Blair Houghton was
kind enough to do so and post the results here, but since they were
for Ultrix I doubt I will get the same results on our systems.

Some interesting statements taken out of context:

- "nohup" automatically nices jobs.  (True on SunOS but not HP-UX,
  and part of the problem is that users are running from the shell
  and not backgrounding jobs.)

- Kernels that do automatic renicing default to nice 4, which is
  insufficient.  On SunOS, a "nice +4" will still allow large jobs to
  interfere with nfsd and inhibit file server function.  (NFS
  interference was noticed here, and we often found large jobs on
  file servers because users called in to complain that they couldn't
  log in to a workstation or got NFS "not responding" errors.)

- SunOS won't renice processes automatically but HP-UX will.
  However, HP-UX will pop the priority back up after a time.  (I
  heard about the automatic renicing first in an HP context.  HP-UX
  handles realtime priorities, and I think you have the option of
  loading daemons as realtime processes in order to improve NFS, etc.
  Large processes have been less of a problem on our HPs, but our
  graphics users notice a difference when they're trying to run
  animations on an HP 9000/835 while jobs are running in the
  background.  We also have a problem with runaway ksh processes; the
  kernel never seems to detect those and lower their priority enough
  to allow interactive users to get their work done.)

- Ksh and csh do NOT change the priority of background jobs, but the
  Bourne shell will.  (We run ksh mostly, occasionally csh or bash.)

- Users should be required to use the "batch" command instead of the
  "nice" command, because "batch" lowers priority.  (I can find no
  evidence in the man pages that batch does this.  In the most recent
  version of the policy, we require the users to batch AND nice jobs.
  Batch schedules jobs according to system load, according to the
  manual; it also lets users run multiple jobs serially.)

Many thanks to all who responded (and to those who probably will
respond to this posting too).  Below I include edited copies of the
replies I received by mail.  If anyone didn't see the follow-up
postings, I will be glad to mail them a copy (I have 4, I think).

===============================================================================

From nick@fwi.uva.nl Thu May 16 05:18:23 1991

"renice" is only used by the scheduler to give each process a
priority.  The evaluation of priority takes into account recent CPU
usage, and is weighted heavily in favour of interactive processes
that require CPU time in short bursts.  Typically a `renice 19` on
big processes has little effect, since they will be constantly
paging, which is unaffected by nice values (it's in the bottom half
of the kernel, I believe).  A `renice -19` will, however, quite
possibly stop your system, by giving other processes virtually no CPU
time.

One of the best sources of info on this side of UN*X is the BSD
Daemon book by Leffler, McKusick and Karels.  If you don't have
access to the book I can find out the full title and ISBN number if
you need it.  The book is _Mega_, a sort of BSD bible.

From onward@freefall Thu May 16 09:49:33 1991

There is no definite answer to the question you have.  As much of it
is a matter of etiquette as it is OS specific; plus, it depends very
much on what the jobs do.  However, here are some points to think
about:

0. Unix does not take care of it automatically.  It only tries.

1. nice only modifies the scheduling priority, not the execution/cpu
   priority.  (Internals.)

2. Processes lose priority if they are constantly runnable (i.e. more
   or less cpu bound).  When they get an I/O interrupt, their
   priorities jump up so that they can complete their I/O call, but
   if they then hang on the cpu, the priority drops real quick again.
   (Job Type.)

3. Multiple large jobs do drag down a machine.  Statistically, this
   is NOT due to the cpu resource being exhausted, but due to the
   amount of paging involved with large processes.  (Job Size.)
   Suns do not do context switching too well when the number of
   runnables goes past 8 (or was it 16 -- there was much discussion
   about this in comp.arch 2 months ago).

3a. If the machine is diskless and has 8 MB of real memory, even one
    large job is noticeable if someone is also working on the
    workstation.

4. Policy suggestion: on HP 9000 s300 and s400 machines, don't worry
   too much if they are functioning as single-user workstations and
   not multiuser servers.  s800 machines were designed to be
   multiuser servers, except perhaps for the 815, so you may not want
   more than 3 or 4 large jobs running on one simultaneously.

5. Try to strike a balance between:
   a) online user response time
   b) turnaround time for large jobs
   c) machine resources (machines with lots of real memory tend to
      run large processes much better)
   d) machine dedication
   e) do your users really need to run that many large jobs, or are
      they just letting the computer do their thinking for them?
      Remember the old days when resources were REALLY expensive, and
      people tried out their models by hand before putting them
      through the system.

From kelly@remus.ee.byu.edu Thu May 16 10:43:23 1991

We have about 50 HP 9000/300s and we always ask users to start long
jobs niced as much as they can (19).  We have enough machines that
one can usually be found with not much running.  This does not hurt
the owner because, as you probably know, if nothing else is going on
the job will still get all of the cycles even if it is niced.
Interactive users pay the price if the job occupies a lot of memory,
as the process swaps in and out.  We have solved this problem for
most users with a program called real-nice.  It monitors keystrokes
about every minute and will not swap a large job in unless the
keyboard has been idle for more than a minute.
I have had no problems convincing users to use these tools, and the
users pretty much police each other.  Once in a while I get a
complaint, so we have written a renice for HP-UX and that solves all
of these complaints.

From chip@pender.ee.upenn.edu Thu May 16 12:31:51 1991

I administer a Sun 4/280.  During my tenure it has run everything
from 4.0 to 4.1.  It is used for CPU-intensive processes that last
from seconds to weeks.  It is also our mail and news server for the
department, so we have to keep interactive performance up.  After a
year of hand-nicing processes in various combinations, I have settled
on a policy and written a program to implement it.  This policy was
designed to meet the following criteria:

1) Interactive use should not be significantly degraded by system
   load.
2) Since people frequently run CPU-intensive processes in foreground,
   some other way must be used to distinguish interactive from
   non-interactive use.
3) Since people frequently will use screenlock rather than logout on
   their personal workstations, interactive processes may accumulate
   significant total CPU usage.
4) The implementation of this policy should not require users to do
   anything.
5) People running CPU-intensive jobs should each get an equal portion
   of the CPU.  Specifically, someone running two processes should
   not get twice as much CPU as someone running one process.

Here's the procedure I use now, with comments about what I'd like to
improve.  I am planning on rewriting this program over the summer so
that I can use it on all of the machines that I administer, and so
that I can distribute it to other sysadmins.

<some garbage was inserted here.  I'm not sure how much was lost>

The specific values were determined empirically.  I have a program
that runs every six minutes.  It creates a list of all processes that
have used more than 2.5 CPU minutes.  In my environment this excludes
two-week-old emacs sessions while catching most CPU-intensive
processes in the first 6 or 12 minutes.
This list is then sorted by user, and each process is niced according
to the total number of processes owned by that user; i.e., each
user's processes all run at the same nice value.  Remember that we
are ignoring all interactive processes, so they are not reniced.
Nice values are assigned according to the following table:

	Number of jobs    1   2   3   4   5   6   7
	"nice" value      5   9  12  14  15  16  17

In general, I have had great success with this system.  The users
prefer it to getting yelled at when they forget (or didn't know) to
renice their jobs.  The users who did renice their jobs like the fact
that they don't have to bother, and no one else can "cheat".  The
interactive users like the fact that system performance is pretty
stable.

Here are the things I'd like to improve:

1) These values "encourage" people to run one job at a time.  If two
   people are running one job each, two people are running two jobs
   each, and another person is running three jobs, the last person's
   jobs are effectively stopped.  A better scheme would be to renice
   the first job to 4 and all other jobs to some high nice value.
   When the first job finished, the next job would be reniced to 4,
   etc.  I am concerned, though, about the possibility of someone
   running a long, CPU-intensive pipeline of commands.  I haven't
   come up with a better way to handle this while still maintaining
   "fairness".  When I notice "stopped" jobs of this sort, I send a
   form letter to the user explaining that while running multiple
   jobs is not forbidden, they would run much faster if done
   sequentially.  This letter explains how to use "batch" to run jobs
   sequentially.  Most of my users were not specifically choosing to
   run in parallel, but were simply trying to run all three jobs
   overnight.

2) I'd like to add a test for the size of the jobs, so that one user
   cannot use up the entire virtual memory of the machine.  I am
   considering killing single jobs if they use more than 48 Meg, and
   multiple jobs if they total more than 32 Meg.
The reasoning is that while generally I don't want people using more
than 32 Meg, I understand that some jobs legitimately need more.  But
if you are running a huge job that requires over 32 Meg, you
shouldn't be running other jobs at the same time.

I realize that this information is somewhat disorganized.  Please
feel free to write me for further explanation or more information.

From kaul@ee.eng.ohio-state.edu Thu May 16 12:48:16 1991

Most BSD-derived systems will do that, but the reduction is
insignificant.  They will nice the long-running background job to
level 2 by default (SunOS is an example of this), but that still is
high enough to seriously interfere with a multi-user system.

>What are other system administrators doing about this issue?

We have a policy that varies with the type of machine.  On our
Sparc2s we allow more long-running jobs, but ask people to keep the
load below 8.  In general, our policy is that anything that's going
to take more than 1/2 hour should be niced to level 20, and a user
can run one job on a machine and no more than 2 long-term jobs on the
network at once.  Further, for most of our machines (SLCs, Sun3s) we
require that no more than 2 long-term jobs be running, but we allow
up to 8 on our Sparc2s.  Penalties for violating policy include a
warning the first time, a conference with the advisor and offender
the second time, and the death penalty the third time (with no appeal
possible).

>If there are good reasons for the policy, I want to be able to
>justify it as well as enforce it.

The reason is that we want people to be able to get work done.  We
have had grad students who submitted 8 jobs to one machine and
brought everybody else's work to a halt.  They didn't last long ;-)

-rich

ps.  One thing you'll notice is that "nice" has no effect on
I/O-limited programs.  That's a small trouble around here, though,
since much of our work is numerically intensive.
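The per-user sweep chip describes can be sketched in a few lines of
shell.  Only the job-count-to-nice table comes from his reply; the
input format and parsing below are assumptions for illustration (his
actual program wasn't posted), and the demo just prints what it would
do rather than really renicing anything:

```shell
# Sketch of a periodic per-user renicing sweep.  On a real system the
# "user pid" pairs would come from parsing ps output for processes
# over the CPU-time threshold; here the sweep reads fabricated pairs
# from stdin.

nice_for_count() {
    # chip@pender's table: 1..7+ jobs -> nice 5 9 12 14 15 16 17
    case $1 in
        1) echo 5  ;;
        2) echo 9  ;;
        3) echo 12 ;;
        4) echo 14 ;;
        5) echo 15 ;;
        6) echo 16 ;;
        *) echo 17 ;;
    esac
}

sweep() {
    # Group pids by user, then give all of a user's jobs one value.
    awk '{ count[$1]++; pids[$1] = pids[$1] " " $2 }
         END { for (u in count) print u, count[u], pids[u] }' |
    while read user njobs pids; do
        nice=$(nice_for_count "$njobs")
        echo "would run: renice $nice -p $pids   # $user, $njobs jobs"
        # renice $nice -p $pids    # <- uncomment to renice for real
    done
}

# Demo on fabricated data:
sweep <<'EOF'
alice 100
alice 101
bob 200
EOF
```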
From bernhold@qtp.ufl.edu Thu May 16 12:52:19 1991

We use a similar policy around here, on our network of Sun 3/50s and
4/380 file servers, with 4/490, FPS-500, and IBM RS/6000-530 compute
servers.  The number of jobs allowed depends on the machine, and is
subject to revision, since we are trying to find the best balance.
On the 3/50s we don't care how many jobs -- they are all on desktops
and allocated to individuals.  On the file servers, we currently
allow two at a time: one long-running and one of less than 1 hr.  On
the compute servers, the limits are somewhat higher, but we try to
avoid having too many jobs at once so that there is ample virtual
memory for the running jobs.

The Suns, running SunOS 4.1.1, perform much better with the jobs
niced.  Otherwise the jobs are competing against the nfsds on
basically an equal level, which impairs the file server function.
Running them "nice" shifts the balance towards the file server
capability -- basically the batch jobs run in the "holes".  SunOS's
scheduling algorithm doesn't seem to do this "automatically" -- at
least not to the extent we want.

The RS/6000 and FPS-500 are run as compute servers, so we're less
concerned about niceness on them, though on the FPS we are using
different levels of niceness to give priority to the group that paid
for the machine over those who get a free ride.

From appmag!curly!pa@hub.ucsb.edu Thu May 16 14:04:00 1991

Don't know about HP-UX.  Back at Carnegie Mellon, our BSD 4.[23]
systems would renice anything to +4 that had accumulated more than
5-10 minutes of CPU time.  And it wasn't enough.  Empirically, `nice
+8' (csh syntax) was better, i.e. it would preserve interactive
response.  The interactive users would get all the cycles they
needed, and the background jobs would compete for the rest.

That's for CPU cycles.  Now if a memory hog came in, the machine
could very well start to thrash or run out of paging space.  In this
case only, it would be important to limit the number of jobs.  This
was on a VAX 785.
I repeat, the system's handling was inadequate.  I had to
periodically post instructions on the local bboard, because too many
users didn't know how to lower their priority manually.

On AIX and DG/UX workstations (both SysV derivatives) I never noticed
any attempt by the system to change priorities.  If I want to keep my
interactive response, I have to nice jobs to 12.  If I had lots of
naive users, I would probably write a little renicing daemon...

From octela!octelb.octel.com!jfd@mips.com Thu May 16 14:25:28 1991

I run SunOS (mostly 4.0.3, but some 4.1.1) so I can't speak for
HP-UX.  I don't believe that SunOS "automagically" prioritizes jobs
for you (oh that it were true!).  I have users fire up multiple large
jobs that beat the heck out of the machine.  Nice'ing these makes a
world of difference, particularly for interactive response (my
servers are CPU & NFS servers, and handle 10-20 logins).  Multiple
large jobs really kill performance, quickly putting the machine into
thrashing mode (particularly compiles on the same spindle).  Of
course these are 3/480's; my 4/490 does a little better :)

The kernel may handle prioritizing things like NFS service vs. local
I/O, but not multiple user jobs.  Unless you count multi-tasking,
which means you can run multiple jobs and they should get near-equal
time (depending on a lot of factors).  But what you really want is to
prioritize interactive vs. long-term CPU jobs such that big compiles
don't affect Josephine User's rn session :)

But it all boils down to politics: what the local "policies" are,
what management can/will support, and how creative the users get at
submitting jobs.  My experience is that once you get management to
agree to specific policies, stick to them; once you allow an
exception you open the floodgates.  But it is a good idea to have the
policy specify exceptions, and when/how they are allowed.
So, at the end of the fiscal year, with deadlines looming, we can say
"Yes, you can run multiple jobs, but it requires xxx permission".
The neat trick is to get xxx to understand when to give permission.

Are you running sps/ps/vmstat to look at what the system is doing?
This might help "prove" the OS isn't scheduling intelligently.  I
also found "System Performance Tuning" (O'Reilly & Assoc.) useful.

From jmattson@UCSD.EDU Thu May 16 14:35:47 1991

Here we have about 40 Sun-3s and about 35 Sun-4s running 4.1.1, along
with a few other oddballs (HP, Vaxen, etc.).  We are responsible for
faculty, staff, and graduate student machines in offices and labs.
Faculty and staff are rarely a problem, but the graduate students are
working with insufficient computing resources, and there have been
several problems with people being inconsiderate of others.

On the primary graduate student machine (a Sun 4/370 w/32mb of
memory), we don't allow long-running jobs at all.  We run a daemon
that enforces nice 19 on all jobs with over 5 minutes accumulated CPU
time (with the exception of shells, editors, etc.).  Furthermore, if
system performance gets really bad for interactive use, we will look
for any long-running jobs that are in violation of the usage policy
and ask their owners to kill them.

We do provide one machine explicitly for long-running jobs (a Sun
4/280 with 56mb of memory and LOTS of swap).  The same daemon
enforces nice 4 on all jobs with over 5 minutes accumulated CPU time
here (same exceptions).

We also have problems with a graduate student lab of 12
Sparcstations.  People tend to leave long-running jobs on these
machines, which can really degrade interactive performance
(especially since these machines only have 16mb of memory).  Here,
the same daemon will STOP any job with over 5 minutes accumulated CPU
time if there is an active (idle less than 5 minutes) console user,
and if the job doesn't belong to the console user.
Stopped jobs will be continued when the console user logs off or goes
idle for more than 5 minutes.

We have found that the biggest problem with long-running jobs is not
their CPU usage, but their memory usage.  On a machine like a
Sparcstation with SCSI drives, paging is just too slow.  Once the
physical memory of these machines is exhausted, paging starts and
performance drops by an absolutely incredible amount.  (The machine
can be idle 75% of the time waiting on disk pages, even with several
jobs in the run queue.)  This is the primary reason for not allowing
long-running jobs when there are interactive users on the machines.

SunOS 4.1.1 uses the same priority scheme that 4.2 BSD used.  Jobs
with the best priority are scheduled in round-robin fashion.  Every
second, priorities are recalculated, so that jobs which have not
obtained much CPU "recently" will get better priorities.  This
ensures that no one starves.  The nice value is used in the priority
calculation, to reduce the demand that a particular job makes on the
CPU.  However, even a very nice job will get some CPU every now and
then -- even on a heavily loaded system.  The problem is that if
there are high demands on physical memory, the nice job will probably
have lost all of its pages while waiting, and it will immediately
page fault when it gets scheduled to run.  With enough big jobs
running in the background, your machine will start to thrash.
Nothing in SunOS checks for this or attempts to do anything to
alleviate it.

From chs!danq@jetson.UUCP Thu May 16 15:14:29 1991

Well, experiment will probably quickly convince you that, for
instance, running multiple troff jobs at once will be slower than
running the same jobs sequentially.  Unix will attempt to be fair
about running several large jobs, in the sense that it will attempt
to give them all equal parts of the cpu over relatively short periods
of time (seconds).
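The once-a-second recalculation jmattson describes is the 4.2/4.3BSD
scheme from the Leffler/McKusick/Karels book mentioned earlier:
roughly, user priority = PUSER + (recent cpu)/4 + 2*(nice), with the
recent-cpu estimate decayed each second by a load-dependent factor,
so nothing starves but nice costs a constant handicap.  A toy
back-of-the-envelope run (the constants follow the published 4.3BSD
design; the tick count and load average are invented for the
illustration):

```shell
# Toy model: two equally CPU-hungry jobs, one at nice 0 and one at
# nice 19, under the 4.3BSD once-a-second recalculation.  Bigger
# priority number = runs later.  PUSER and the decay factor follow
# the published design; 100 ticks/second and load = 2 are made up.
report=$(awk 'BEGIN {
    PUSER = 50
    load  = 2
    decay = (2 * load) / (2 * load + 1)
    cpu = 0                         # recent-CPU estimate (same for both)
    for (sec = 1; sec <= 10; sec++) {
        cpu = decay * cpu + 100     # accumulate, then decay, each second
        pri0  = PUSER + cpu / 4 + 2 * 0     # the nice  0 job
        pri19 = PUSER + cpu / 4 + 2 * 19    # the nice 19 job
        printf "t=%2d  pri(nice 0)=%3d  pri(nice 19)=%3d\n",
               sec, pri0, pri19
    }
}')
echo "$report"
```

The niced job trails by a constant 2*19 = 38 priority points, so it
still runs whenever nothing better is runnable -- which matches the
observation elsewhere in this summary that niced jobs soak up idle
time (e.g. overnight) without starving outright.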
Because of the time spent context switching between jobs, and (if
they're large enough) the swapping resulting from using more than the
available physical memory, several large jobs run at once will take
longer than the same jobs run in sequence.  The scheduler may have
some bias toward interactive jobs built in, but it is quite easy for
large jobs on Suns to make life miserable for the interactive user.
Nicing these jobs will help.  Running only one at a time will help.
Running large jobs at odd hours (using at) will help.  The scheduler
does not have the smarts to do these things itself.

I don't think any internal knowledge about Unix is necessary; the
effects of running large jobs are immediately evident in slower
response time.  If you don't see slower response time, then it's
probably not worth worrying about.  If you do, experiment with
renicing the job(s) in question.  The "top" command (available from
the Sun archives at Rice) is helpful in telling which jobs are
actually eating up the cpu.  You might want to try running that.

From chris@suntan.ncsl.nist.gov Thu May 16 15:28:52 1991

Someone already posted about specific systems that automatically
renice a cpu-bound process.  Most don't, however.  It's a good
solution for those who don't follow the policies you've outlined.

I would omit the third policy, though.  If a process is niced, I
haven't seen any significant performance degradation whether there is
one of them or five.  That is, processes sitting in the ready-to-run
queue (but not running due to low priority) have little effect on
system performance on the SunOS systems I've worked with.  You should
perform the same experiment on your own system.  Yes, the load
average WILL go up (all that tells you is the # of processes *ready*
to run, not actually running), but interactive response should be
more than adequate.

However, I've taken this problem and cut it off at the head.  All our
users run "tcsh", which executes /etc/Login if it exists.
All workstations have this file; my personal Sun 386i workstation,
running SunOS 4.0.2, is called "suntan":

	# tcsh file exec-ed by all users before ~/.cshrc
	#
	if ( $HOST == suntan && $USER != chris && $USER != root ) then
		/etc/renice +20 $$ >& /dev/null
		echo "System response may seem a bit sluggish..."
	endif

	# Stan is a special case.  On *all* systems he gets niced.
	#
	if ( $USER == stan ) /etc/renice +15 $$ >& /dev/null

A little confusing, but basically if it's not me (Chris) or root
logging into my system, their login shell gets reniced severely and
all their subprocesses inherit the login shell's nice level.  On
other machines they don't get reniced at all, since I don't use
them.  :-)

This has the unfortunate side effect that although their
cpu-intensive processes don't interfere with me, all their processes
run at the same priority.  Thus, if they launch something into the
background, a current editing session will run at the same (low)
priority.  Now that I think about it, I should nice them in their
login shell to 15 so they can nice their background jobs to 20 should
they desire.

Surprisingly, this setup works REALLY well!  Most users don't even
notice the subtle login message when they get reniced.  You might
want to run your shell's executable through "strings" to see if it
executes any files prior to the user's home .login/.cshrc/.profile.

From revell@uunet.uu.net Thu May 16 18:46:03 1991

I think your user may have been talking about "nohup"'ed jobs.  Nohup
increments the nice value by 5.  I don't know of any systems that
alter the priority just because the job is moved to the background.

From @jhereg.osa.com:nightowl!det@tcnet.uucp Fri May 17 04:57:21 1991

What shell(s) are your users using?  Ksh and csh do not change the
priority of jobs in the background, while sh will automatically
"nice" the job by four.
Here are the results of the command "sleep 300 &" under the shells
ksh, sh, and csh, respectively:

 F S   UID   PID  PPID  C PRI NI   ADDR SZ    WCHAN TTY     TIME COMD
10 S  1001 13787 13784  0  39 20 4026b4 11 e0000000 ttyF01  0:00 sleep
10 S  1001 13792     1  0  39 24 4026b4 11 e0000000 ttyF01  0:00 sleep
10 S  1001 13814 13813  0  39 20 4026b4 11 e0000000 ttyF01  0:00 sleep
                              ^^

From cantin@nrccsb3.di.nrc.ca Fri May 17 18:02:44 1991

I insist my users use "batch" instead of nice.  I forbid them to use
& because it penalizes interactive users too much.  "batch" runs jobs
at a lower priority AND sends any generated output to the user via
mail.  You can also limit the number of "batch" jobs running on the
system by modifying /usr/spool/cron/queuedefs.  This way, users can
submit many jobs for execution, but if the maximum limit is reached,
those jobs will simply be queued to be run as others complete.

From mcorrigan@UCSD.EDU Fri May 17 23:32:34 1991

>	1) Run them in the background

Yes,

>	2) Start them up with "nice" to lower priority

Yes.

>	3) Only one such job on a machine

Depends on how *big*, since 2 mediums could make one big.

>I have a user challenging that policy on the grounds that UNIX
>will take care of it automatically.  I am aware that some systems

Not so.  Some UNIXes (BSD) will automatically renice a job to a nice
of 4 after a certain number of minutes, but HP-UX does not do this.
It is true that the UNIX scheduling algorithm lowers the priority of
a job based on how much time it has gotten recently, but then the
priority pops back up if it got little for a while.  The algorithms
are sophisticated, with at least 2 regimes of scheduling, but all of
them are intended to maintain interactive response at an acceptable
level, or to give a fair share at all times to ALL jobs.  When a job
lasts for 50 hours, it just doesn't make sense to allow it to get a
fair share with the interactive users.
If you lower the priority as far as it can go (nice == 19), then I
find that the job may get no time for part of the day, but whenever
the system is idle the job gets right back in there for 100% of the
cpu (like from midnight to 9 am).  For canned software packages I do
the renicing myself by writing a C program that sits in the path in
place of the package: it nices itself and then calls the real package
with all the same args, so the package runs at low priority from the
start.

From bach!chuckp@ncr-mpd.ftcollinsco.NCR.COM Mon May 20 19:22:32 1991

See batch(1).  It's part of SVr[34] and included on Suns.  Don't know
about the others.

From cks@hawkwind.utcs.toronto.edu Tue May 21 22:58:08 1991

Blair Houghton has already posted some nice numbers on this subject
(and some formulae).  My local experience has been that a nice value
of between 10 and 15 will keep the interactive users from feeling the
extra load, even if the niced jobs are thrashing around on the disk a
fair bit.  I've run parallel kernel builds at +10 to +19 on not too
studly Vaxen and only had the people monitoring the load notice (load
averages of around 20+).  However, even one or two processes grinding
away at nice 4 (the default 'renice' value on the few kernels that do
this to processes) will be easily noticed by the users.

You might want to see if you can get some sort of job batching
system; there are a number of nice ones floating around.  The better
ones do things like stop the running job(s) once the load average
climbs too high, or stop the running job(s) when N people are logged
on, and so on.  Better yet, you get the source, so you can put in
custom hacks if necessary to adapt them to local conventions.

-- 
Sheryl Coppenger	SEAS Computing Facility Staff	sheryl@seas.gwu.edu
			The George Washington University (202) 994-6853