david@jpl-devvax.JPL.NASA.GOV (David E. Smyth) (02/16/90)
Can somebody please post the actual scheduling algorithm used by SunOS 4.0.3? And for System V, just so we stay compatible.

Why? There is some difference of opinion here about how effective "nice" is in controlling how much CPU time various processes get. The SunOS documents are completely vague in the several places where process priorities are discussed, such as nice(1), getpriority(2), and so on. In one place there is a statement that processes which use more CPU are given a lower effective priority by the scheduler, so that interactive processes are more responsive. Elsewhere there is a comment that a priority of 19 means the process will never run unless no other process has anything to do. The actual algorithm should clear up this confusion. I'm very surprised it's not in the documentation.
chris@mimsy.umd.edu (Chris Torek) (02/16/90)
In article <7085@jpl-devvax.JPL.NASA.GOV> david@jpl-devvax.JPL.NASA.GOV
(David E. Smyth) writes:
>Can somebody please post the actual scheduling algorithm used by SunOS
>4.0.3? And for System V, just so we stay compatible.

Although these probably differ from the BSD algorithm (perhaps even
significantly), the BSD scheduling algorithm is relevant as well.
(Besides, it is the only one I am familiar with :-) .)

In any case, it is not possible to post this algorithm as such, because
it consists of several hundred lines scattered throughout a number of
different files. It is hard to analyse, harder to predict, and though
it has been tweaked and tuned until it seems to work fairly well, it
probably ought not to be called an `algorithm' at all.

>Why? There is some difference of opinion here about how effective
>"nice" is in controlling how much various processes execute.

Very little. :-)

The short version of the BSD scheduler:

A. CPU usage. The priority depends on CPU usage. Every second (modulo
a special case for processes that are asleep, since many are and this
saves time), p->p_cpu is updated as follows:

	/* NB: all these MIN and MAX calls are mainly to ensure
	   variables stay in `char' ranges */
	p->p_cpu = MAX(MIN((filter(avenrun[0]) * p->p_cpu) + p->p_nice,
	    255), 0);

`filter' is a magic formula defined as follows:

	#define filter(loadav) ((2 * (loadav)) / (2 * (loadav) + 1))

The following comment is relevant:

 * We wish to decay away 90% of p_cpu in (5 * loadavg) seconds.
 * That is, the system wants to compute a value of decay such
 * that the following for loop:
 * 	for (i = 0; i < (5 * loadavg); i++)
 * 		p_cpu *= decay;
 * will compute
 * 	p_cpu *= 0.1;
 * for all values of loadavg.
 *
 * Mathematically this loop can be expressed by saying:
 * 	decay ** (5 * loadavg) ~= .1
 *
 * The system computes decay as:
 * 	decay = (2 * loadavg) / (2 * loadavg + 1)
 *
 * We wish to prove that the system's computation of decay
 * will always fulfill the equation:
 * 	decay ** (5 * loadavg) ~= .1
 *
 * If we compute b as:
 * 	b = 2 * loadavg
 * then
 * 	decay = b / (b + 1)
 *
 * We now need to prove two things:
 *	1) Given factor ** (5 * loadavg) ~= .1, prove factor == b/(b+1)
 *	2) Given (b/(b+1)) ** power ~= .1, prove power == (5 * loadavg)
 *
 * Facts:
 *	For x close to zero, exp(x) =~ 1 + x, since
 *		exp(x) = 0! + x**1/1! + x**2/2! + ... ;
 *	therefore exp(-1/b) =~ 1 - (1/b) = (b-1)/b.
 *	For x close to zero, ln(1+x) =~ x, since
 *		ln(1+x) = x - x**2/2 + x**3/3 - ...   for -1 < x < 1;
 *	therefore ln(b/(b+1)) = ln(1 - 1/(b+1)) =~ -1/(b+1).
 *	ln(.1) =~ -2.30
 *
 * Proof of (1):
 *	Solve (factor)**(power) =~ .1 given power (5*loadav):
 *	    solving for factor,
 *	    ln(factor) =~ (-2.30/5*loadav), or
 *	    factor =~ exp(-1/((5/2.30)*loadav)) =~ exp(-1/(2*loadav)) =
 *	    exp(-1/b) =~ (b-1)/b =~ b/(b+1).  QED
 *
 * Proof of (2):
 *	Solve (factor)**(power) =~ .1 given factor == (b/(b+1)):
 *	    solving for power,
 *	    power * ln(b/(b+1)) =~ -2.30, or
 *	    power =~ 2.3 * (b + 1) = 4.6*loadav + 2.3 =~ 5*loadav.  QED
 *
 * Actual power values for the implemented algorithm are as follows:
 *	loadav:	1	2	3	4
 *	power:	5.68	10.32	14.94	19.55

B. The CPU usage factor grows whenever the clock ticks while a process
is running. The following comment comes from kern_clock.c:

 * We adjust the priority of the current process.
 * The priority of a process gets worse as it accumulates
 * CPU time.
 * The cpu usage estimator (p_cpu) is increased here
 * and the formula for computing priorities (in kern_synch.c)
 * will compute a different value each time p_cpu increases
 * by 4.  The cpu usage estimator ramps up quite quickly when
 * the process is running (linearly), and decays away exponentially,
 * at a rate which is proportionally slower when the system is
 * busy.  The basic principle is that the system will 90% forget
 * that a process used a lot of CPU time in 5*loadav seconds.
 * This causes the system to favor processes which haven't run
 * much recently, and to round-robin among other processes.

This code, in essence, says

	p->p_cpu = MIN(p->p_cpu + 1, 255);

C. Priority. After adjusting p->p_cpu (in both A and B), each process
is assigned a `user priority'. This is calculated as

	p->p_usrpri = MIN(127,
	    (p->p_cpu / 4) + PUSER + 2 * p->p_nice +
	    (p->p_rssize > p->p_maxrss && freemem < desfree ? 2*4 : 0));

The process p is scheduled to run as soon as the kernel returns to
user code if p has a better p_usrpri priority than the `current'
process.

D. All kernel-mode processes have a better p_pri (< PZERO) than any
user priority (>= PZERO), and the kernel runs them until they leave
kernel mode (i.e., are ready to return to user code); only then will
it choose the process with the best (lowest) p_usrpri.

One notable problem with the above scheme is that no process is
charged for CPU time if the clock does not tick while the process is
running. Conversely, a process is charged for an entire clock tick if
it happens to be the one running when the clock ticks. Since processes
are rescheduled on clock ticks, this is often not serious; however, it
is possible for a process to arrange to run for 9/10ths of a clock
tick, immediately after each clock tick, and hence get 90% of the CPU
while being charged for 0% of it. (Hint: see select(2) and
setitimer(2).)
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@cs.umd.edu	Path:	uunet!mimsy!chris