[comp.unix.internals] "Nice" or not to "nice" large jobs

sheryl@seas.gwu.edu (Sheryl Coppenger) (05/16/91)

Historically (since before I worked here), there was a policy for
users running large jobs which ran along these lines:

	1)  Run them in the background

	2)  Start them up with "nice" to lower priority

	3)  Only one such job on a machine

I have a user challenging that policy on the grounds that UNIX
will take care of it automatically.  I am aware that some systems
have that capability built into the kernel, but I am not sure
to what extent ours do or how efficient they are.  I have looked
in the manuals for both of our systems (Sun and HP) and in the Nemeth
book, but they are pretty sketchy.  In my previous job I supported
realtime systems on HP 9000/800-series machines, and I am fairly sure
what happens under realtime scheduling, but that doesn't help in this
case.

What are other system administrators doing about this issue?  Can
any internals experts point me to something definitive about my
particular OSs?  We have

	HP 9000s - 300, 400 and 800 series, running HP-UX 7.0
	Sun 3s running SunOS 4.1 and Sun 4s running SunOS 4.1.1

I'm looking for the specific, not the general.

If there are good reasons for the policy, I want to be able to
justify it as well as enforce it.

Thanks in advance

--

  Sheryl L. Coppenger        
  sheryl@seas.gwu.edu           
  (202) 994-6853          

system@alchemy.chem.utoronto.ca (System Admin (Mike Peterson)) (05/16/91)

In article <3197@sparko.gwu.edu> sheryl@seas.gwu.edu (Sheryl Coppenger) writes:
>
>Historically (since before I worked here), there was a policy for
>users running large jobs which ran along these lines:
>	1)  Run them in the background
>	2)  Start them up with "nice" to lower priority
>	3)  Only one such job on a machine
>I have a user challenging that policy on the grounds that UNIX
>will take care of it automatically.

Our policy for nice/renice is:

0  - "short" processes/jobs (compilations, or up to 5 minutes cpu time).
1  - "medium" processes/jobs (up to 1 hour cpu time).
2  - "long" processes/jobs (all other jobs).

The main intention is to get "background" number crunching jobs out of
the way of foreground/interactive users.  While it is true that a
cpu-bound process will get less cpu than an I/O-bound process at the
same "niceness", we want the separation to be bigger: short jobs should
be able to hog the cpu relative to medium/long jobs, and medium jobs
relative to long jobs (which may run for days).  We use only niceness
0, 1 and 2 because those are the only values that are really different
on our central system (Apollo Domain/OS) - we would use a wider spread
of nice values (e.g. 5, 10, 20) if they worked properly.  We allow more
than one job of each type to be run, though we would like to get a
batch queueing system working to control the number and sequencing of
some of the long jobs (some of them step all over each other in terms
of disk I/O, so the system is much more productive if only one of them
is running at a time).

Mike.
-- 
Mike Peterson, System Administrator, U/Toronto Department of Chemistry
E-mail: system@alchemy.chem.utoronto.ca
Tel: (416) 978-7094                  Fax: (416) 978-8775

phil@eecs.nwu.edu (William LeFebvre) (05/16/91)

In article <3197@sparko.gwu.edu>, sheryl@seas.gwu.edu (Sheryl Coppenger) writes:

|> I have a user challenging that policy on the grounds that UNIX
|> will take care of it automatically.

Tell the user s/he is wrong.

The VAX BSD system (4.1 and 4.2, not sure about 4.3) would automatically
renice to 10 any process that accumulated more than 10 minutes of CPU
time.  Note that the default increment for the "nice" command is only 4,
so it was in a user's best interest to nice a long job explicitly rather
than wait for the kernel to impose the stiffer penalty.

SunOS does NOT do this.  There are times when I wish it did.

Furthermore, if there is a CPU intensive or paging intensive or
(I think even) an i/o intensive process running, interactive performance
is *noticeably* degraded.  Get about 2 or 3 of them running and you
have a system that crawls for the interactive users.  But, renice all
three of those hog processes to even 4 and interactive use is once 
again snappy.

So even on the VAX you had to put up with about 20 minutes of "hell"
before the system would renice the offending process.

Your policy is a good one.  Stay with it.  Users who are going to 
run long-term jobs should take the responsibility on themselves to
ensure that the process does not interfere with smooth interactive
activity.

		William LeFebvre
		Computing Facilities Manager and Analyst
		Department of Electrical Engineering and Computer Science
		Northwestern University
		<phil@eecs.nwu.edu>

rc@chaucer.bellcore.com (25815-Richard Casto(I895)p130) (05/16/91)

In article <3197@sparko.gwu.edu>, sheryl@seas.gwu.edu (Sheryl Coppenger)
writes:
> 
> Historically (since before I worked here), there was a policy for
> users running large jobs which ran along these lines:
> 
> 	1)  Run them in the background
> 
> 	2)  Start them up with "nice" to lower priority

I am not sure if this is quite what you are looking for, but ksh
has a user-settable option to automatically "nice" background jobs
(set -o bgnice).

jfh@rpp386.cactus.org (John F Haugh II) (05/17/91)

In article <1991May16.145927.9815@casbah.acns.nwu.edu> phil@eecs.nwu.edu (William LeFebvre) writes:
>Tell the user s/he is wrong.
>
>The VAX BSD system (4.1 and 4.2, not sure about 4.3) would automatically
>renice to 10 any process that accumulated more than 10 minutes of CPU
>time.  Note that the default increment for the "nice" command is only 4,
>so it was in a user's best interest to nice a long job explicitly rather
>than wait for the kernel to impose the stiffer penalty.

I checked the source for this some time back and noticed that it has a
bug: processes with non-standard nice values never get reniced.  If you
start off with a niceness of 1, after 10 minutes you are still at 1; if
you start off with a niceness of 0, you get reniced.

It would make better sense if, after 10 minutes, the code set your nice
value to 10 whenever it was less than 10 (but not less than 0, that is).
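
Something along these lines is what I mean - a user-space model of the
rule, not a patch to the actual kernel source (the field and constant
names in the kernel differ):

/*
 * Model of the suggested autonice rule, not the kernel code:
 * after ten minutes of user time, push any nice value in 0..9
 * up to 10; leave negative (privileged) and already-higher
 * values alone.
 */
#include <stdio.h>

static int autonice(int nice, long utime_sec)
{
	if (utime_sec > 10 * 60 && nice >= 0 && nice < 10)
		return 10;
	return nice;
}

int main(void)
{
	printf("%d\n", autonice(0, 601));   /* 10, as now                  */
	printf("%d\n", autonice(1, 601));   /* 10, unlike the current code */
	printf("%d\n", autonice(-5, 601));  /* -5, root's nice untouched   */
	printf("%d\n", autonice(15, 601));  /* 15, already nicer than 10   */
	return 0;
}
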
-- 
John F. Haugh II        | Distribution to  | UUCP: ...!cs.utexas.edu!rpp386!jfh
Ma Bell: (512) 255-8251 | GEnie PROHIBITED :-) |  Domain: jfh@rpp386.cactus.org
"If liberals interpreted the 2nd Amendment the same way they interpret the
 rest of the Constitution, gun ownership would be mandatory."

bhoughto@hopi.intel.com (Blair P. Houghton) (05/18/91)

In article <1991May16.140622.29266@alchemy.chem.utoronto.ca> system@alchemy.chem.utoronto.ca (System Admin (Mike Peterson)) writes:
>Our policy for nice/renice is:
>
>0  - "short" processes/jobs (compilations, or up to 5 minutes cpu time).
>1  - "medium" processes/jobs (up to 1 hour cpu time).
>2  - "long" processes/jobs (all other jobs).

That's all but affectless.

(The numbered equations below are from Chapter 5 of _The
Design and Implementation of the 4.3BSD UNIX(R) Operating
System_, by S. J. Leffler, et al, Addison-Wesley, 1989; there
are different schemes for other systems, but the principle
is usually verisimilar).

The p_nice (set by nice(1) or renice(8)) of a process
affects the priority (in 4.3BSD) as follows:

	p_usrpri = PUSER + (p_cpu/4) + 2*p_nice               (Eq. 5.1)

Since p_usrpri is usually on the order of 50-80 for
non-sleeping processes, you're directly twiddling only 0-4
units, or 0-8% of the priority.  The effects of I/O,
paging and swapping will easily swamp that.

p_nice has the secondary effect, however, of retarding the
decay of p_cpu over time:  p_cpu is incremented at each
clock-tick (1/60th or 1/100th of a second, depending on
the implementation) and decremented once each second
according to

	p_cpu = p_cpu * ( (2*load) / (2*load + 1) ) + p_nice  (Eq. 5.2)

where load is the average length of the run queue over
the past minute.

In the case of two competing, cpu-hogging processes, load
remains at approximately 2.0, meaning that (5.2)
becomes approximately

	p_cpu = 0.8 * p_cpu + p_nice

In order to have p_nice cancel just half of the decay (which at that
load strips 0.2 * p_cpu each second), it has to be larger than
0.1 * p_cpu.  Since PUSER is often 50, and p_nice is often 0, a
p_usrpri of 50-80 corresponds to a p_cpu of 0-120, which for maximum
coverage means you want a p_nice of 12 or more; anything lower is not
going to affect scheduling more than half the time.  Most importantly,
since 0-120 is practically the entire range of 0-127 of which p_cpu is
capable, you're going to see the effects of differences in p_nice of
+-1 only when running jobs for a very, very long time.  Certainly a
process that does _any_ iterative I/O is going to swamp the nice value
of a competing process that does no I/O until the end.

You've got a good scheme, but instead of { 0, 1, 2 } you 
should probably use { 4, 8, 12 }, and leave 0 for processes
with insignificant run-times (shell input, e.g.).
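
To put rough numbers on that, here is a throwaway calculator (plain
user-space C, not kernel source; PUSER = 50 and a steady load of 2.0
are assumptions carried over from above).  It evaluates (5.1) at a
middling p_cpu, and iterates (5.2) to its fixed point, i.e. the p_cpu
that a given p_nice sustains all by itself, even in a process that
never gets to run:

/*
 * Quick calculator for (5.1) and (5.2) -- user-space arithmetic only.
 * Assumes PUSER = 50 and a steady load of 2.0 (two competing cpu hogs).
 */
#include <stdio.h>

#define PUSER	50.0

int main(void)
{
	double load = 2.0;
	double decay = (2 * load) / (2 * load + 1);	/* 0.8 at load 2 */
	int nices[] = { 0, 1, 2, 4, 8, 12 };
	int i;

	for (i = 0; i < (int)(sizeof nices / sizeof nices[0]); i++) {
		int p_nice = nices[i];

		/* (5.1) at a middling p_cpu of 60: the direct effect */
		double p_usrpri = PUSER + 60.0 / 4 + 2 * p_nice;

		/* fixed point of (5.2): the p_cpu that p_nice alone
		 * sustains, even if the process never runs */
		double p_cpu = 0.0;
		while (p_cpu * decay + p_nice - p_cpu > 0.001)
			p_cpu = p_cpu * decay + p_nice;

		printf("p_nice %2d: p_usrpri %5.1f at p_cpu 60, "
		    "p_cpu floor %5.1f\n", p_nice, p_usrpri, p_cpu);
	}
	return 0;
}

The p_cpu-floor column is the point of the { 4, 8, 12 } suggestion: at
p_nice 12 the decay can never pull p_cpu below about 60, so the nice
value keeps biting even against a hog whose p_cpu is pinned near 127.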

Much more effective, however, is to have all long jobs run
at nice value 0 when there are no users on the system and
let them thrash it out in the middle of the night.

				--Blair
				  "Your mileage may vary."

bhoughto@hopi.intel.com (Blair P. Houghton) (05/19/91)

In article <4281@inews.intel.com> bhoughto@hopi.intel.com (Blair P. Houghton) writes:
>In article <1991May16.140622.29266@alchemy.chem.utoronto.ca> system@alchemy.chem.utoronto.ca (System Admin (Mike Peterson)) writes:
>>Our policy for nice/renice is:
>>
>>0  - "short" processes/jobs (compilations, or up to 5 minutes cpu time).
>>1  - "medium" processes/jobs (up to 1 hour cpu time).
>>2  - "long" processes/jobs (all other jobs).
>
>That's all but affectless.

I flame myself.

Computer usage is a chaotic system, and as such deserves
a more empirical method.

Check out these stats, collected over a 10-minute real-time run of 3
simultaneous, identical processes (3 fork/exec's of the same executable,
which was essentially a tight loop testing and incrementing a couple of
integers) on a relatively quiescent machine (a VAX420 (a DECstation 3100)
running Ultrix with a load of about 0.08):

    nice    cpu user-mode time (out of 600 sec)

    0       212.39 sec      35.4 %
    1       199.07          33.2
    2       185.34          30.9

    total   596.80          99.47 %

thus, each step in nice value gives about a 2.3
percentage-point split in cpu-time share.

When the processes are perturbed by another process that
simulates a user interface (extraneous load about 1.0 with
occasional i/o, intermittent computation, and light shell
activity):

    nice    cpu user-mode time (out of 600 sec)

    0       204.51 sec      34.1 %
    1       190.29          31.7
    2       179.59          29.9

    total   574.39 sec      95.7 %

this shows little change in the splits.

If we multiply the nice values by 4:

    nice    cpu user-mode time (out of 600 sec)

    0       285.40 sec      47.6 %
    4       181.63          30.3
    8       129.87          21.7

    total   596.90 sec      99.4 %

this simple increase widens the splits to roughly 17 and 9
percentage points respectively.

Using the larger-grained and offset nice values I suggested
for the cpu-intensive processes:

no interference:

    nice    cpu user-mode time (out of 600 sec)

    4       306.26 sec      51.0 %
    8       169.23          28.2
    12      121.31          20.2

    total   596.80 sec      99.5 %

random, light i/o in another process:

    nice    cpu user-mode time (out of 600 sec)

    4       307.65 sec      51.3 %
    8       158.50          26.4
    12      107.60          17.9

    total   573.75 sec      95.6 %

The effect on the lower-priority processes is much
more noticeable, to the point that using nice becomes
actual management of the time the processes use.

As a finale, I present 10 processes fighting over an
otherwise quiet (load < 0.1) cpu:

    nice    cpu user-mode time (out of 600 sec)

    0       354.09 sec      59.0 %
    2       120.83          20.1
    4       56.52           9.42
    6       38.15           6.36
    8       21.03           3.51
    10      5.07            0.845
    12      1.75            0.292
    14      0.38            0.063
    16      0.32            0.053
    18      0.20            0.033

    total   598.34 sec      99.72 %

when graphed, this is a fairly smooth curve with half
of the processes taking over 100 times their optimal
run-time to complete...
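
For the record, the cpu-burner behind all of the above was nothing
fancy; a reconstruction (not the actual source) looks about like this:

/*
 * Reconstruction of the sort of cpu-burner used above: a tight loop
 * that only tests and increments a couple of integers, so it never
 * touches i/o.  It runs until killed externally at the end of the
 * 600-second measurement window.
 */
int main(void)
{
	unsigned long i = 0, j = 0;

	for (;;) {
		i++;
		if (i == 1000000UL) {	/* arbitrary rollover */
			i = 0;
			j++;
		}
	}
	/* not reached */
	return 0;
}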

				--Blair
				  "Empires are for empiricists."

bill@twg.bc.ca (Bill Irwin) (05/22/91)

phil@eecs.nwu.edu (William LeFebvre) writes:

: Furthermore, if there is a CPU intensive or paging intensive or
: (I think even) an i/o intensive process running, interactive performance
: is *noticeably* degraded.  Get about 2 or 3 of them running and you
: have a system that crawls for the interactive users.  But, renice all
: three of those hog processes to even 4 and interactive use is once
: again snappy.

: So even on the VAX you had to put up with about 20 minutes of "hell"
: before the system would renice the offending process.

I have often wanted to be able to "nice" a running process.  Is
there any way to do this under SCO XENIX 2.3.3?

:               <phil@eecs.nwu.edu>
-- 
Bill Irwin    -       The Westrheim Group     -    Vancouver, BC, Canada
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
uunet!van-bc!twg!bill     (604) 431-9600 (voice) |     Your Computer  
bill@twg.bc.ca            (604) 430-4329 (fax)   |    Systems Partner

system@alchemy.chem.utoronto.ca (System Admin (Mike Peterson)) (05/22/91)

>In article <4281@inews.intel.com> bhoughto@hopi.intel.com (Blair P. Houghton) writes:
>>In article <1991May16.140622.29266@alchemy.chem.utoronto.ca> system@alchemy.chem.utoronto.ca (System Admin (Mike Peterson)) writes:
>>>Our policy for nice/renice is:
>>>0  - "short" processes/jobs (compilations, or up to 5 minutes cpu time).
>>>1  - "medium" processes/jobs (up to 1 hour cpu time).
>>>2  - "long" processes/jobs (all other jobs).
>>
>>That's all but affectless.

The main problem with your discussion of BSD scheduling is that Apollo
Domain/OS does not use BSD scheduling - as I said in my posting, renice
values of 2 to 20 inclusive map onto the same Domain/OS priority range,
so they are all exactly equivalent.  Renice '1' is sufficiently
different that such a process will get almost zero cpu time if a renice
'0' process is running, and a renice '2'-'20' process will get almost no
time if a '0' or '1' process is running (assuming everything is
cpu-bound).

I would love to have a wider choice of renice values that are really
different (I don't much care what they are - 4, 8, 12, 16, 20 or
5, 10, 15, 20 would be fine with me), and I don't much mind if jobs at
higher renice values get no cpu time at all while other jobs are running
(though O/S designer types might worry about that, and therefore let
long jobs at least do something every once in a while).

Mike.
-- 
Mike Peterson, System Administrator, U/Toronto Department of Chemistry
E-mail: system@alchemy.chem.utoronto.ca
Tel: (416) 978-7094                  Fax: (416) 978-8775