speck@vlsi.caltech.edu (Don Speck) (02/18/86)
When I have to kill infinite-looping processes, I usually find them at nice 0, while much more deserving processes are at nice 4, getting nowhere. This is brought about by the formula that the 4.2bsd softclock() routine uses to determine when to "nice" a process (names simplified for clarity):

	if (uid != 0 && nice == NZERO && utime > 10*60)
		nice = NZERO+4;

Only processes doing "real work" are penalized. Runaway sendmail's and syslog's go unbridled, since they are system overhead; likewise page-thrashers and processes looping on EOF go un-niced, since these use only system time, no user time (or nearly so). Processes already at nice 1 are left alone, so the malicious user can get an advantage over the "real work" jobs at nice 4 (and magnify that advantage simply by starting 20 infinite-loop jobs at nice 20, as one user demonstrated; the load average was so high that the nice 4 job got no runtime).

No doubt the intention of this code was to improve keyboard response, but the side-effects are awful: no one can get any heavy-duty computing done. Thus it seems that the code ought to be removed, but if nothing gets nice'd, response gets pretty slow. One possibility would be to make the test unbiased:

	if (utime+stime > 10*60 && nice >= NZERO && nice < NZERO+4)
		nice = NZERO+4;

but now we're even more likely to zap long-lived login shells and editors; the cure is no better than the bug.

What we really want is to favor processes that someone is waiting for, and disfavor "overnight" types of processes. The problem is, how do you tell which is which? This is what nice(1) is for, but we've tried waving a carrot (steeply declining cpu charges for nice'd processes) at the users, and they still don't use nice.

Despite the lack of mind-reading hardware, the system managers can often tell when a job is not being waited for: when the owner hasn't typed anything for a long time, we renice(8) his running processes unmercifully. Complaints, if any, relate more to perceived uneven application of this policy than to the policy itself. Unfortunately, this type of scheduling is crude and often too-little-too-late.

I would prefer that the scheduler itself knew something about interactive processes, and gave them a higher percentage of the cpu. Interactive processes are distinguished by a very bursty pattern of cpu usage: compute, I/O wait, compute, I/O wait, ...; i.e., frequent waiting for slow events, while processes that we want to disfavor do not wait for slow events.

My proposal is a modification to kern_synch.c such that when a kernel process calls sleep() with pri > PZERO (waiting on a slow event), the associated user process is given a short-term boost, probably by forgiving some of p_cpu (recent cpu usage). Shells waking up after a command exits would get a boost. Processes waking up from a read() on a tty or pipe would get a boost if they did, indeed, have to wait; processes doing single-character reads would probably find typeahead waiting for them, not have to go to sleep, and not get the boost. Disk I/O and paging would NOT get a boost (they are not slow events).

Does anyone see a way that this could be abused? I am reminded of VMS, which gives a short-term priority boost for doing I/O; doing single-character writes to a tty would make one's priority soar, so much that no one else could get any cpu. That wouldn't happen here, since tty output usually just stuffs another character in the output buffer, with no sleep required, hence no priority boost granted. Are there any other ways that this could go wrong?
Will I cause lots of context switches from rapidly varying priorities? Will pipe readers and writers start preempting each other in a battle of priorities?

Comment, analysis, suggestions, flames to:

	Don Speck	seismo!cit-vax!speck   or   speck@vlsi.caltech.edu
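For concreteness, here is one rough sketch of what the proposed kern_synch.c change might look like. It is illustrative only, not 4.2BSD source: the struct layout, the priority formula, and the "forgive half of p_cpu" amount are assumptions made up for this example; only sleep()'s priority convention, PZERO, and p_cpu come from the proposal above.

    /*
     * Illustrative sketch only -- not 4.2BSD source.  When a process that
     * had to block on a slow event (sleep priority numerically greater
     * than PZERO: tty read, pipe read, waiting for a child) is made
     * runnable again, forgive part of its accumulated recent-cpu figure
     * so that its user priority improves for a short while.
     */
    #define PZERO   25              /* sleeps above this are "slow" events */
    #define PUSER   50              /* base user priority */

    struct proc {
            int     p_pri;          /* current scheduling priority */
            int     p_cpu;          /* recent cpu usage, decayed elsewhere */
            int     p_nice;         /* nice value */
    };

    /* recompute user priority from recent cpu usage (formula is invented,
       but in the spirit of setpri(): more p_cpu means a worse priority) */
    static int
    setpri(struct proc *p)
    {
            p->p_pri = PUSER + p->p_cpu / 4 + 2 * p->p_nice;
            return p->p_pri;
    }

    /*
     * Hook the wakeup path could call for a process whose sleep() priority
     * was worse (numerically greater) than PZERO, i.e. one that really did
     * wait on a slow event.  Disk I/O and paging sleep at priorities better
     * than PZERO and so would never reach this code.
     */
    static void
    slow_event_boost(struct proc *p, int sleeppri)
    {
            if (sleeppri > PZERO) {
                    p->p_cpu /= 2;          /* forgive some recent cpu usage */
                    setpri(p);              /* short-term boost takes effect */
            }
    }

How much of p_cpu to forgive, and how quickly the effect should be allowed to decay, are exactly the tuning questions the rest of this thread worries about.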
mike@BRL.ARPA (Mike Muuss) (02/18/86)
A long time ago, for V6, I did a pretty snazzy scheduler that incorporated short-term and medium-term behavior into its assessment of priorities. The result was (and still is) PDP-11/70 systems with nearly instant (<200ms) response for "simple" interactions.

These days, I am firmly convinced that the proper way to implement this would be with a user-mode daemon that collected medium-term and long-term behavior patterns of processes, folded in issues such as time of day, number of people on, "demo in progress" flags, etc, and issued commands to the kernel adjusting various parameters that would control the kernel's short-term scheduling behavior, plus also providing per-process control (much like NICE currently, but not user-changeable, and separate). Creation of the necessary kernel mechanisms is nearly trivial. The user-mode "Scheduling Advisor" could want to grow into a full Expert System, but would probably be 90% satisfying if it just used a half a dozen heuristics, most of which I believe I can articulate.

BRL currently has a low-level effort to implement this type of mechanism, with two of our GITs (Gurus-in-training) working the problem. If you would like to contribute ideas, etc, please send them to <Jcst@BRL> (Joint CS Team). If we achieve any kind of success, we will notify the Unix-Wizards list, so please don't just send "tell me when you are done" messages.

	Best,
	 -Mike Muuss
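A minimal user-mode sketch of the kind of advisor daemon described above, purely for illustration: collect_procs() and demo_in_progress() are hypothetical placeholders for whatever process-table scan and site policy a real implementation would use, the thresholds are invented, and only setpriority(2) is an existing interface. The kernel-parameter commands Mike mentions do not exist in stock 4.2BSD and are not shown.

    #include <sys/types.h>
    #include <sys/time.h>
    #include <sys/resource.h>
    #include <unistd.h>

    struct procinfo {
            int     pid;
            int     uid;
            long    recent_cpu;     /* cpu seconds over the last interval */
            long    owner_idle;     /* seconds since the owner last typed */
    };

    /* placeholders: fill tab[] from the process table; check site policy */
    extern int collect_procs(struct procinfo *tab, int max);
    extern int demo_in_progress(void);

    int
    main(void)
    {
            struct procinfo tab[512];
            int i, n;

            for (;;) {
                    n = collect_procs(tab, 512);
                    for (i = 0; i < n; i++) {
                            /* heuristic: a heavy cpu user whose owner has
                               walked away gets pushed down; everyone else
                               is left alone */
                            if (tab[i].recent_cpu > 60 &&
                                tab[i].owner_idle > 30*60 &&
                                !demo_in_progress())
                                    setpriority(PRIO_PROCESS, tab[i].pid, 10);
                    }
                    sleep(60);      /* medium term: re-evaluate each minute */
            }
            /* NOTREACHED */
    }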
thomas@utah-gr.UUCP (Spencer W. Thomas) (02/19/86)
We have disabled this feature here in the past, since we have a tendency to start an emacs in the morning and use it all day. With subprocesses throwing output into windows as fast as they can, it's really easy to get more than 10 minutes of CPU in a day of work. All of a sudden your response slows to a crawl: "D**m it, my emacs just got auto-niced".

I also experimented with code that looked at the ratio between system & user time. This prevented auto-nicing of emacs; I'm really not sure how effective it was otherwise. It certainly wouldn't have slowed down a runaway sendmail (which seems to be your problem). The long-crunching jobs would eventually get niced, which they should be anyway. In fact, we ask people to run jobs that are going to take a LOOONNGGG time at nice 19. This can backfire if a process with a large RSS (bigger than the physical memory) is running -- you get a very nasty swapping/thrashing behaviour. Some constant about how often a process can be swapped is too small.
-- 
=Spencer   ({ihnp4,decvax}!utah-cs!thomas, thomas@utah-cs.ARPA)
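A sketch of the sort of ratio test described above, written as a variant of the softclock() check quoted at the start of the thread; the function wrapper, the 4:1 ratio, and the constants are invented for illustration, not taken from the Utah code:

    #define NZERO   20              /* origin of the nice scale in 4.2BSD */

    /*
     * Illustrative only: auto-nice a process when its user time both
     * exceeds the usual threshold and dominates its system time, so an
     * editor doing mostly tty and subprocess I/O escapes, while a long
     * compute job still gets pushed to nice 4.
     */
    int
    autonice(int uid, int nice, long utime, long stime)
    {
            if (uid != 0 && nice == NZERO &&
                utime > 10*60 &&            /* substantial user time ... */
                utime > 4*stime)            /* ... mostly user, not system */
                    nice = NZERO + 4;
            return nice;
    }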
sandel@milano.UUCP (02/19/86)
I have implemented auto-renicing as part of the "assassin" daemon which kills idle logins. The renicing actions are configurable as to threshold, amount of nice, etc. It is also configurable to re-nice procs only during certain times of the day and on certain days. If anyone is interested, please let me know. Currently, the assassin only runs on VAXen, but it shouldn't be too difficult to make it run on other machines.
-- 
Charles Sandel
  arpa: sandel@mcc.arpa
  uucp: *!ut-sally!im4u!milano!sandel  (or *!ut-sally!charles)
An endangered species: native Austinite
boyd@inset.UUCP (Boyd Roberts) (02/20/86)
In article <1014@brl-smoke.ARPA> speck@vlsi.caltech.edu writes:
>
>	When I have to kill infinite-looping processes, I usually find
>them at nice 0, while much more deserving processes are at nice 4,
>getting nowhere...
>
>	if (uid != 0 && nice == NZERO && utime > 10*60)
>		nice = NZERO+4;
>
>Only processes doing "real work" are penalized.  Runaway sendmail's
>and syslog's go unbridled, since they are system overhead; likewise
>page-thrashers and processes looping on EOF go un-niced, since these
>use only system time, no user time (or nearly so).  Processes already
>at nice 1 are left alone, so the malicious user can get an advantage
>over the "real work" jobs at nice 4 (and magnify that advantage simply
>by starting 20 infinite-loop jobs at nice 20, as one user demonstrated;
>the load average was so high that the nice 4 job got no runtime).

It's good to see the "hack it till it works" approach is alive and well. Given that the Berkeley paging code was written that way, it must be right.

We've also got some really keen heuristics:

	"real work" == large user time

So, from them we can make a few rash generalisations. With these we can code up some unworkable mess.

Some fairly viable work has been done on SHARE scheduling on UNIX in Australia. You should check it out. It actually uses some algorithms, and was even designed.

	"Don't diddle the code.  Choose a better algorithm."


Boyd Roberts

+++	+ ..!mcvax!ukc!inset!boyd  + boyd@inset.co.uk  + boyd@basser.oz +	+++

"Isn't that kind of severe?"
john@frog.UUCP (John Woods, Software) (02/20/86)
> 	When I have to kill infinite-looping processes, I usually find
> them at nice 0, while much more deserving processes are at nice 4,
> getting nowhere...
> 	What we really want is to favor processes that someone is
> waiting for, and disfavor "overnight" types of processes...
> 	I would prefer that the scheduler itself knew something
> about interactive processes, and gave them a higher percentage
> of the cpu....
> 	My proposal is a modification to kern_synch.c such that when
> a kernel process calls sleep() with pri > PZERO (waiting on a
> slow event), the associated user process is given a short-term
> boost, probably by forgiving some of p_cpu (recent cpu usage)....

Well, here is (roughly) the scheme UNOS uses. Processes have three numbers for priority: their ceiling, floor, and current values (makes sense). The following things happen to the current priority value: when you are stuck in the ready queue for longer than a certain time, it is incremented some amount; when you use up your time quantum*, it is decremented.

*(The time quantum is a function of the priority, with low priority numbers getting long quanta, thus allowing compute jobs to get lots of work done as long as no high priority jobs hit the ready queue, which will interrupt the low priority job, whether or not its quantum is up.)

Additionally, certain interactions cause priority boosts, and among them are blocked teletype read or write, pipe read or write (this boost is currently disabled, since it had some disagreeable results; try it if you want, but I think you won't like it), and a tunable credit for disk interaction (0 by default in our boot-time configuration file, for similar reasons).

In general, the boost for tty interaction works quite nicely, with the slightly disagreeable result that uucp tends to become quite a hog (not surprising, it is a hog anywhere it lives...); this could be eased by dropping its ceiling a few notches.
-- 
John Woods, Charles River Data Systems, Framingham MA, (617) 626-1101
...!decvax!frog!john, ...!mit-eddie!jfw, jfw%mit-ccc@MIT-XX.ARPA

This space dedicated to Challenger and her crew,
Francis R. Scobee, Michael J. Smith, Ellison S. Onizuka, Judith Resnik,
Ronald E. McNair, Gregory B. Jarvis, and Christa McAuliffe.

"...and slipped the surly bonds of Earth to touch the face of God."
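A condensed sketch of that arrangement, with every name and number invented for illustration (this is not UNOS source); higher numbers mean better priority here:

    struct sched {
            int     ceiling;        /* best priority this process may reach */
            int     floor;          /* worst priority this process may reach */
            int     cur;            /* current priority */
    };

    static void
    setcur(struct sched *p, int newpri)
    {
            if (newpri > p->ceiling)
                    newpri = p->ceiling;
            if (newpri < p->floor)
                    newpri = p->floor;
            p->cur = newpri;
    }

    /* compute jobs (low priority) get long quanta; a higher-priority
       arrival in the ready queue preempts them whether or not the
       quantum has expired */
    int
    quantum_ticks(struct sched *p)
    {
            return 10 + (100 - p->cur) / 2;     /* invented formula */
    }

    void
    waited_too_long(struct sched *p)    /* stuck in the ready queue */
    {
            setcur(p, p->cur + 2);
    }

    void
    quantum_expired(struct sched *p)    /* burned a whole time slice */
    {
            setcur(p, p->cur - 2);
    }

    void
    tty_interaction(struct sched *p)    /* blocked tty read/write completed */
    {
            setcur(p, p->cur + 5);
    }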
j@utah-cs.UUCP (J Lepreau) (02/26/86)
In the Denver Usenix proceedings is a paper by Jeffrey Straathof et al of the Univ of Maryland which describes a new scheduler for BSD Unix that they claim is both much superior to and much smaller than the 4.2 or 4.3 ones. I haven't yet read it, and didn't hear the talk. The address is Dept of Computer Science, Univ of Maryland, College Park, MD 20742.

Now why haven't we heard from Chris Torek about this?
henry@utzoo.UUCP (Henry Spencer) (02/26/86)
> Some fairly viable work has been done on SHARE scheduling on UNIX
> in Australia.  You should check it out.  It actually uses some
> algorithms, and was even designed.

Sounds very interesting. How about some >>references<<? To published papers, not unobtainable internal tech reports!
-- 
Henry Spencer @ U of Toronto Zoology
{allegra,ihnp4,linus,decvax}!utzoo!henry
chris@umcp-cs.UUCP (Chris Torek) (02/27/86)
In article <3697@utah-cs.UUCP> j@utah-cs.UUCP (Jay Lepreau) writes:
>In the Denver Usenix proceedings is a paper by Jeffrey Straathof et al
>of the Univ of Maryland which describes a new scheduler for BSD Unix
>[...]  Now why haven't we heard from Chris Torek about this?

Because I am not involved in the project. I would rather let Jeff speak for himself.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 1415)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@mimsy.umd.edu
jeff@umcp-cs.UUCP (Jeff Straathof) (02/27/86)
In article <3375@umcp-cs.UUCP> chris@umcp-cs.UUCP (Chris Torek) writes:
>...  I would rather let Jeff speak for himself.

Here I speak. Let me give a brief description of how Maryland's new UNIX scheduler works and then tell what I've done with it since the USENIX proceedings paper.

The new scheduler employs a multilevel feedback run queue with round-robin scheduling in each level. Quantum sizes vary with each level and begin when processes get control of the cpu. Processes get preempted by higher-priority processes leaving blocked states. The priority of a process directly determines the level of the run queue in which it should reside; the one with the highest priority gets serviced first.

The priority of a process is determined by what it does. Priority boosts are given for socket input, terminal input (specific to mode) and disk input, among other minor things. Priority bumps are given for quantum expirations. Each process has a priority maximum and priority minimum limiting the range of its priority; the values are inherited from its parent. The quantum sizes, boost amounts, and bump amount can be tuned per configuration. The priority maximum and priority minimum of any running process can be changed, thus providing external resource control.

The new scheduler has not been subjected to any rigorous performance testing. Its code is much cleaner than that of the previous scheduler, so its throughput should be better. It is obvious to users of a heavily loaded machine that the scheduler distinguishes between interactive and cpu-bound processes extremely well. As a matter of fact, I am writing this on a heavily loaded machine not running my scheduler and am getting pretty annoyed. The persons running their troffs of course don't have the same feelings.

Since the last USENIX conference, I've fixed up some of the code and prepared it for distribution. I even dug up our old 4.2 code and reinstalled the scheduler in that. To tune my scheduler and to find out the real performance differences between it and the standard scheduler, I have developed a pretty snazzy remote terminal emulator to run some benchmarking. A description of the emulator and the results of the testing will hopefully make it to the next USENIX conference.

The scheduler code will be available very soon for both 4.2 and beta 4.3 BSD. It's great for those who don't think the standard scheduler is what they need. It's a must for those doing their own scheduling work who want to avoid learning from scratch. Send me mail if you want more information.
-- 
Spoken: Jeff Straathof 	ARPA:	jeff@mimsy.umd.edu	Phone: +1-301-454-7690
CSNet:	jeff@umcp-cs 	UUCP:	seismo!umcp-cs!jeff
USPS: Comp Sci Dept, Lab for Parallel Comp, Univ of Md, College Park, MD 20742
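For readers who want something concrete, here is a heavily condensed sketch of the arrangement described above. The level count, tunable values, and function names are all invented for illustration, and the run-queue manipulation itself is omitted; this is not the Maryland code.

    #define NLEVEL  32                      /* run-queue levels */

    struct mproc {
            int     pri;            /* current level; NLEVEL-1 = most favored */
            int     pri_max;        /* inherited ceiling (external control) */
            int     pri_min;        /* inherited floor */
    };

    int     quantum[NLEVEL];        /* ticks per level; longer at low levels */
    int     tty_boost   = 4;        /* all of these stand in for the         */
    int     sock_boost  = 3;        /* per-configuration tunables in the     */
    int     disk_boost  = 2;        /* real scheduler                        */
    int     expire_bump = 1;

    static void
    setlevel(struct mproc *p, int pri)
    {
            if (pri > p->pri_max)
                    pri = p->pri_max;
            if (pri < p->pri_min)
                    pri = p->pri_min;
            p->pri = pri;
            /* move p to the tail of run queue 'pri'; queue code omitted.
               A process leaving a blocked state at a higher level than
               the one currently running preempts it. */
    }

    void
    input_boost(struct mproc *p, int boost) /* tty, socket, or disk input done */
    {
            setlevel(p, p->pri + boost);
    }

    void
    quantum_expire(struct mproc *p)         /* used up its whole quantum */
    {
            setlevel(p, p->pri - expire_bump);
    }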
guy@sun.uucp (Guy Harris) (03/01/86)
> Priority boosts are given for socket input, terminal input (specific to
> mode)...

Unfortunately, the terminal mode doesn't tell you as much as you might think. The distinction between screen editors and programs like "more" is *not* that the former run in RAW mode and the latter run in CBREAK mode. The bulk of "full-screen" programs run in CBREAK mode on systems running V7-descended terminal drivers; there is no way to distinguish between "vi" and "more" simply on the basis of the mode it puts the terminal into. (EMACS may put the terminal into RAW mode, but that's because it was the only way to get an 8-bit data path to use with terminals with META keys that turn the 8th bit on. If it merely wanted to disable XON/XOFF flow control, it could have set the start and stop characters to '\377'.)

It would also be interesting to see what effect treating programs running in RAW mode as screen editors would have on programs like "uucico" (which are the sort of programs RAW mode was intended for - programs which want a raw 8-bit data path for data transfer, not screen editors). Furthermore, the System V driver has a cleaner "ioctl" interface; you don't have all-inclusive mode bits like RAW and CBREAK, you have particular processing operations which you turn on and off. It's not clear how you'd distinguish between RAW and CBREAK mode in this context. (The scheduler work in that paper is not just of interest to 4.[23]BSD users; all UNIX variants since V7 use similar schedulers with similar deficiencies.)

It would be better to attempt to come up with a classification of processes as "extremely interactive", "very interactive", and "interactive" based on some measure independent of the details of the terminal driver's "ioctl"s, rather than an *ad hoc* classification based on the mode the author of the program the process is running chose to use. (The current scheme gives an incentive to rewrite EMACS to use CBREAK mode wherever possible, since you get a bigger priority boost when a read completes than you do in RAW mode.)

A first cut at this might be based on the real time, or process virtual time, interval between blocking reads from the terminal, with longer intervals giving rise to greater priority boosts. This seems to match the intent of the RAW/CBREAK/cooked split, as described in the paper: "extremely interactive" processes, which do only a little work between keystrokes (e.g. a screen editor updating the current line) get small boosts; "very interactive" processes, which do more work between keystrokes (e.g. "more", or "ed", or even "vi" or EMACS, when used as a file browser) getting larger boosts; and "interactive" processes, which do large amounts of work between keystrokes (e.g., a shell) getting still larger boosts. (Note that a process can provide a different sort of interactive response at different times, depending on what sort of work it's doing in response to keystrokes; a screen editor can get different priority boosts depending on whether the user is typing text in or doing an interactive global search and replace.)

This area of the new scheduler needs more work. The requirements weren't really stated; what was the rationale for the three-way classification of processes, and why give the most interactive processes the least boost?
(For instance, is it intended that an interactive process should always get placed at approximately the same priority relative to its "priority-max" when it wakes up from a terminal I/O wait, so that a process which does little work between terminal I/O waits needs a smaller boost than a process which does a fair bit of work, since it has had fewer quantum expirations and has had its priority lowered fewer times?) The choices sound reasonable, but I'm curious why those were the choices made.

Also, is there a danger of starvation with this scheduler? Could several people banging away at screen editors keep their processes' priorities at their "priority-max", with the risk that there could always be at least one of those processes eligible to run, thus locking out all other processes? On a large system with many users logged in, it seems possible that enough users could be typing at any given time that one might always be eligible to run. This might actually be what is desired, but if there are situations where it is not desired some mechanism (perhaps some form of "aging") will have to be provided to prevent it.
-- 
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.arpa	(yes, really)
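A tiny sketch of the "first cut" suggested above, with the interval cutoffs and boost values invented purely for illustration:

    /*
     * Illustrative only: scale the wakeup boost by how long (in process
     * virtual time) the process computed since its last blocking terminal
     * read; the cutoffs and boost amounts are arbitrary.
     */
    int
    tty_read_boost(long ticks_since_last_tty_read)
    {
            if (ticks_since_last_tty_read < 2)
                    return 1;       /* screen editor echoing a keystroke */
            else if (ticks_since_last_tty_read < 60)
                    return 3;       /* "more", file browsing */
            else
                    return 6;       /* a shell that just ran a command */
    }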
jeff@umcp-cs.UUCP (Jeff Straathof) (03/03/86)
In article <3303@sun.uucp> guy@sun.UUCP writes:
>... there is no way to distinguish between "vi"
>and "more" simply on the basis of the mode it puts the terminal into.

True that they both operate in CBREAK mode, but "vi" does a lot more input than "more" does; that's how the difference in priorities arises.

>(EMACS may put the terminal into RAW mode, but that's because ...

Good point about EMACS. Dividing interactive processes on the basis of mode was just a choice. With the current implementation of the scheduler, it is simple to disregard the mode by making the various boost values for terminal input equal.

>It would also be interesting to see what effect treating programs running in
>RAW mode as screen editors would have on programs like "uucico"...

"Uucico" can definitely be a problem. What will probably be required of it is a voluntary call to setminmax() to lower its priority. I haven't done it yet but probably will. Thanks to Chris Torek for originally pointing it out to me and thanks to Guy for pointing it out to everyone else.

>Furthermore, the System V driver has a cleaner "ioctl" interface; you don't
>have all-inclusive mode bits like RAW and CBREAK, you have particular
>processing operations which you turn on and off.  It's not clear how you'd
>distinguish between RAW and CBREAK mode in this context.

I have never worked with System V so don't know the answer. I might not know the answer even if I had worked with it. The key with this scheduler is that some of the boosting has been moved to the terminal driver. The implementation makes it easy to change the criteria for receiving terminal input boosts. I considered several different ones and chose just one to implement. I've got others in mind though.

>(The scheduler
>work in that paper is not just of interest to 4.[23]BSD users; all UNIX
>variants since V7 use similar schedulers with similar deficiencies.)

Glad to hear others agree.

>It would be better to attempt to come up with a classification of processes
>as "extremely interactive", "very interactive", and "interactive" based on
>some measure independent of the details of the terminal driver's "ioctl"s,
>rather than an *ad hoc* classification based on the mode the author of the
>program the process is running chose to use.

Perhaps it would, but remember that whatever the mode the author chooses, he gets no boost unless he does the input. That fact is more important than the ad-hoc mode classifications.

>(The current scheme gives an
>incentive to rewrite EMACS to use CBREAK mode wherever possible, since you
>get a bigger priority boost when a read completes than you do in RAW mode.)

Any full screen editor, regardless of the mode in which it operates, does so much input that it stays in the top levels of the run queue in the current implementation.

>A first cut at this might be based on the real time, or process virtual
>time, interval between blocking reads from the terminal, with longer
>intervals giving rise to greater priority boosts.  This seems to match the
>intent of the RAW/CBREAK/cooked split, as described in the paper...

Not really. A scheduler-buster could then compile the file between inputs, something we wanted to avoid.

>... "extremely
>interactive" processes, which do only a little work between keystrokes (e.g.
>a screen editor updating the current line) get small boosts; "very
>interactive" processes, which do more work between keystrokes (e.g. "more",
>or "ed", or even "vi" or EMACS, when used as a file browser) getting larger
>boosts; and "interactive" processes, which do large amounts of work between
>keystrokes (e.g., a shell) getting still larger boosts.  (Note that a
>process can provide a different sort of interactive response at different
>times, depending on what sort of work it's doing in response to keystrokes;
>a screen editor can get different priority boosts depending on whether the
>user is typing text in or doing an interactive global search and replace.)

Once an editor begins to do a global search-and-replace, it becomes less of an interactive process by my classifications. The other editor users should get quicker service if they are just doing inserts when you are doing your global replace. You should still get much quicker service than the remaining processes, because you started out in that top level and won't drop many levels before requiring input again. For all of this to work, the scheduler should be tuned properly.

Boosts for "extremely" interactive processes are small because they get so many of them. It probably doesn't matter what the amount is as long as it is bigger than 0. Boosts for the "very" interactive processes that do things like file browsing and news reading are larger because fewer of them occur and more cpu service is required to get the next batch (perhaps a screenful) of output prepared. If they didn't get the larger boost, they might fall into the lower levels with the compilers and not give the user the small response time for which we were looking. Boosts for interactive processes like shells are larger to give the command it might fork a chance to show its interactive nature. If the child, or the process itself, isn't interactive it will drop through the levels quickly. My intentions might be met by just giving an equal boost for any type of terminal input and letting the quantum sizes do the rest.

>This area of the new scheduler needs more work.  The requirements weren't
>really stated; what was the rationale for the three-way classification of
>processes, ...

I just wanted to try to properly assign priorities among the interactive processes themselves, as the paper stated. There are several different methods of doing this, each with tradeoffs between effectiveness and cost. I chose the 'mode method' because I thought it would be fairly effective and because it would incur very little cost.

>...and why give the most interactive processes the least boost?
>(For instance, is it intended that an interactive process should always get
>placed at approximately the same priority relative to its "priority-max"
>when it wakes up from a terminal I/O wait, so that a process which does
>little work between terminal I/O waits needs a smaller boost than a process
>which does a fair bit of work, since it has had fewer quantum expirations
>and has had its priority lowered fewer times?)  The choices sound
>reasonable, but I'm curious why those were the choices made.

Correct. Perhaps my explanation of the differing boost amounts should go here. Sorry about leaving all of it out of the paper.

>...is there a danger of starvation (by many editors) with this scheduler...

Definitely. We thought about it beforehand but didn't know the answer. We firmly believed the effort had enough potential merit to at least try out. The scheduler does maintain a small and stable response time for interactive processes regardless of system load (determined by use only). Obviously something has to give during peak hours, and it's the troffs and compilers that do. Though they won't get starved if the machine is powerful enough, they will if it isn't.

>...(having editors lock out others)... might actually be what is desired...

It was what was desired by us. The goals of the new scheduler placed interactive responsiveness as the most important. Not having that particular responsiveness during peak hours was a major motivation for the research. I suspect that users will learn about the best times to compile and about the best times to look their code over twice before compiling. As it stood before, the best time to compile was always now! That policy was getting quite aggravating.

>... but if there are situations where it is not desired some mechanism
>(perhaps some form of "aging") will have to be provided to prevent it.

If you want to avoid that situation, tune the scheduler that way. Make the priority boosts smaller for terminal input, make the quantum sizes at the top of the queue smaller, and make the quantum expiration bump larger. All of that might not work if you have so many editors running, but then your machine might not be powerful enough to handle the workload people are placing on it. Interactive responsiveness was our major goal, and avoiding a form of "aging" in BSD was what provoked this good discussion some time ago.

The design of the scheduler is quite simple. I imagine that it is discussed in most introductory OS books. The simplicity unveils cases like uucico that break the intentions, but the simplicity also provides opportunities for experimentation and success. Many ideas people have about scheduling can be tried out with simple modifications to this implementation. One I will try next is the "base+boost" method. Good tools for measuring the performance of such schedulers are a must, and they are what I am working on now (along with preparing everything for distribution, argh). Once I get everything done, the results will be made available.

Thanks to Guy Harris for his comments.
-- 
Spoken: Jeff Straathof 	ARPA:	jeff@mimsy.umd.edu	Phone: +1-301-454-7690
CSNet:	jeff@umcp-cs 	UUCP:	seismo!umcp-cs!jeff
USPS: Comp Sci Dept, Lab for Parallel Comp, Univ of Md, College Park, MD 20742
guy@sun.uucp (Guy Harris) (03/03/86)
> Some fairly viable work has been done on SHARE scheduling on UNIX
> in Australia.  You should check it out.  It actually uses some
> algorithms, and was even designed.

Given that I don't speak English or Australian, just whatever we speak here in the USofA (no, it's not American, considering Canadians don't speak exactly the same language either, and they're (North) Americans as well), what do you mean by "algorithms" here? Over here, we tend to think "algorithm" as meaning any procedure of the sort executed by a computer, whether it's well thought-out or specified or not. You may think of auto-nicing as a hack (I certainly do), but by my definition the procedure that implements it certainly qualifies as an algorithm....
-- 
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.arpa	(yes, really)
root@topaz.RUTGERS.EDU (Charles Root) (03/03/86)
You might be interested in our experience with our DEC-20's. The default DEC-20 scheduler sounds a lot like what you have done. In particular, it has heuristic boosts for terminal input, and other things designed to favor interactive jobs. Like many other universities, we have a heavily loaded timesharing system used for students. What we found was that under heavy load Emacs ran just great, but the first time you tried to compile, you might as well go out for dinner (multi-course, in your favorite New York restaurant). Since all of our students eventually did a compilation, the results were a disaster.

We finally ended up using the "class scheduler". This is a scheduler that keeps track of what average share of the CPU is going to each user. It gives boosts or penalties depending upon whether you are getting more or less than your fair share. Emacs ran a bit slower (though there was still some interactive boost), but people got their work done. Eventually, we ended up moving our two research machines to the same scheduler, because we started running into the same thing there, too.

I concluded that what most people really expect to get out of a timesharing system is 1/N of the machine, where N is the load average.
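A sketch of the fair-share idea behind such a "class scheduler": track each user's recent cpu consumption, compare it against 1/N of the machine, and fold the difference into the priorities of that user's processes. The decay and scaling constants here are invented; this is not the DEC-20 code.

    /*
     * Invented sketch of per-user fair-share adjustment.  A positive
     * return value means "boost this user's processes", negative means
     * "penalize them".
     */
    struct user_share {
            long    recent_cpu;     /* decayed cpu ticks charged to this user */
    };

    long
    share_adjust(struct user_share *u, long total_recent_cpu, int nusers)
    {
            long fair = total_recent_cpu / nusers;      /* 1/N of the machine */

            if (u->recent_cpu > fair)
                    return -((u->recent_cpu - fair) * 4 / (fair + 1));
            else
                    return (fair - u->recent_cpu) * 4 / (fair + 1);
    }

    /* called periodically so that old consumption is gradually forgotten */
    void
    share_decay(struct user_share *u)
    {
            u->recent_cpu = u->recent_cpu * 3 / 4;
    }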
chris@umcp-cs.UUCP (Chris Torek) (03/03/86)
Well, I guess it is time for me to step into the fray at last.

In article <4516@topaz.RUTGERS.EDU> root@topaz.UUCP (probably Chuck Hedrick) writes:
>... The default DEC-20 scheduler sounds a lot like what [Jeff et
>al.] have done. ... we found was that under heavy load Emacs ran
>just great, but the first time you tried to compile, you might as
>well go out for dinner....

Funny thing, just a few hours ago I was talking to one of the local grad student hackers here, and mentioned that I was afraid that we might start to see the same thing.

>We finally ended up using the "class scheduler".  [It] keeps track
>of what average share of the CPU is going to each user.  It gives
>boosts or penalties depending upon whether you are getting more or
>less than your fair share.
>
>I concluded that what most people really expect to get out of a
>timesharing system is 1/N of the machine, where N is the load
>average.

That was what I said *I* wanted, at any rate. If the load is 10, I expect to be able to get ~1/10th of the machine myself. If I choose to spend it on compilations, I *still* expect to get `my' share of the CPU power of the machine. Of course, if I start editing at the same time, I would rather my editor get most of my 1/10th, not my compile, which indicates that event-based scheduling is still a good idea.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 1415)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@mimsy.umd.edu
wagner@utcs.uucp (Michael Wagner) (03/06/86)
I've just been catching up on unix-wizards, which I haven't looked at in a loooong time, and I found this discussion of 4.2 scheduling. Interestingly, the Maryland stuff sounds *very* close to the (rather heuristic) scheduler in CP (the 'monitor', if you will) of VM. It (basically) has:

1) A starting priority for every user.

2) No explicit upper/lower limits per user, but certain other constructs generate something like a probability cloud around that starting priority.

3) A number of queues that users reside on. Aside from the ones that designate resource unavailability (notably storage overcommitment), there are basically 3 queues for 'runable' users, called (with typical lack of imagination) Q1, Q2 and Q3. They correspond to very interactive, somewhat interactive, and batch (by batch I mean it runs to completion once started, not that you send it off elsewhere). In the HPO (High Performance Option) variant of the product, the issue is clouded (but vastly improved, I'm told...I'll shortly be able to report on this) by the splitting of each of these into waiting-for-CPU-dispatch and waiting-for-other-quick-services (page fault, etc) queues. They are informally called the real-run-list and the run-list, respectively. I can't recall the formal names right now. Interrupts move users between the run-list and the real-run-list, and only the real-run-list need be scanned on most redispatches.

4) A transaction can be loosely defined as the time between pressing the enter key and getting the last of the results. (For those who don't know, S/370 moves terminal interaction (irrevocably) into front-end hardware, so there is no raw-mode equivalent. The entire input is presented to the mainframe in one interrupt when it is completed. Completion is signalled by a special key (enter). Some (limited) local editing is possible before hitting enter.) When a transaction is started, it is given a 'deadline' based on the basic priority of the user, some acknowledgement of the load average (called the expansion factor in CP), and an initial (recent-history-based, I think) characterization of the transaction. After that, the run lists are sorted and serviced in 'deadline' order. This effectively prevents indefinite postponement (but, as pointed out in earlier postings, indefinite postponement strategies are rendered ineffective when faced with inadequate hardware).

5) While a transaction is 'runable', it is thrown into the pool and gets time-slices in rough proportion to its deadline priority. The short-term scheduler strategy is a modified round-robin (I think someone once called it a round-turkey :-) ). If a transaction sucks up enough resource, the scheduler decides it was wrong to characterize this transaction as interactive, and it moves it to Q2 (which also has the effect of recalculating its deadline, because it has now shown itself to be less interactive). A similar shift can occur Q2->Q3. Living in 'lower' queues has the effect of getting you larger slices of resource, but less frequently.

There's more, but you get the point (and this is getting long). One of the things I found amazing (coming from a background with more sophisticated schedulers) was how well this all works, in the face of tremendous loads. We used to run 3 large UTS systems (a 370 UNIX-like system) and a smattering of little CMS users (mostly development and maintenance) on our system. Even when the machine ran flat out, editing response for CMS users was unaffected by the load.
(There were other problems, but they basically weren't CP's fault, and are really another story.) As another example, we recently disconnected one channel path to DASD (in order to provide a test path for a new system being built concurrently with production). That cuts our access to disk in half. No one seems to notice in their editing response. I can see it in the performance reports, but it's minimal. I can also now flog the system with artificially constructed workloads and actually affect other users (something I basically couldn't do when both channels were connected). I think we're seeing the effect of inadequate hardware there, though, and not bad scheduling decisions.

One place where this all falls down is that the scheduler basically makes only CPU dispatching decisions. It has no influence on I/O queue decisions, nor paging queue decisions. This all works if you have overdesigned your I/O configuration relative to the rest of the machine, but would fail badly otherwise. This is relatively easy to do with 370 arch. I/O gear, but (seemingly) much harder on VAXEN. I'm curious how this is compensated for in scheduling for 4.2.

Well, enough for now. I don't know how interested people on this list are in hearing about work being done on other systems (especially blue ones :-)).

Michael Wagner (wagner@utcs)
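A rough sketch of the deadline mechanism described in points 4) and 5) above. The constants, the initial-class choice, and the demotion rule are invented for illustration (CP characterizes new transactions from recent history, which is not shown); this is not CP source.

    /*
     * Invented sketch of deadline scheduling with class demotion.  Run
     * lists would be kept sorted by t->deadline, smallest (soonest) first.
     */
    enum qclass { Q1 = 0, Q2 = 1, Q3 = 2 }; /* very interactive .. batch */

    struct trans {
            enum qclass     cls;            /* current characterization */
            double          base_pri;       /* user's starting priority, 1.0 = normal */
            double          deadline;       /* absolute time of promised service */
            double          used;           /* cpu-seconds consumed so far */
    };

    /* per-class constants, both invented: how far out the deadline sits,
       and how much cpu a transaction may use before re-characterization */
    static const double horizon[] = { 0.5, 4.0, 30.0 };    /* seconds */
    static const double budget[]  = { 0.2, 2.0, 1.0e9 };   /* cpu-seconds */

    static void
    set_deadline(struct trans *t, double now, double expansion)
    {
            /* heavier load (larger expansion factor) pushes everyone's
               deadline out; a better base priority pulls it in */
            t->deadline = now + horizon[t->cls] * expansion * t->base_pri;
    }

    void
    start_transaction(struct trans *t, double now, double expansion)
    {
            t->cls = Q1;            /* CP actually picks this from recent history */
            t->used = 0.0;
            set_deadline(t, now, expansion);
    }

    /* charge cpu to the transaction; once it overruns its class budget,
       demote it to the next class and recompute its deadline */
    void
    charge(struct trans *t, double cpu, double now, double expansion)
    {
            t->used += cpu;
            if (t->cls != Q3 && t->used > budget[t->cls]) {
                    t->cls = (t->cls == Q1) ? Q2 : Q3;
                    set_deadline(t, now, expansion);
            }
    }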
jbuck@epimass.UUCP (Joe Buck) (03/08/86)
In article <3311@sun.uucp> guy@sun.uucp (Guy Harris) writes:
>> Some fairly viable work has been done on SHARE scheduling on UNIX
>> in Australia.  You should check it out.  It actually uses some
>> algorithms, and was even designed.
>what do you mean by "algorithms" here?  Over here, we tend to think
>"algorithm" as meaning any procedure of the sort executed by a computer,
>whether it's well thought-out or specified or not.  You may think of
>auto-nicing as a hack (I certainly do), but by my definition the procedure
>that implements it certainly qualifies as an algorithm....

I don't believe it! I, a mere mortal, get to correct Guy Harris :-). Though you frequently see the word "algorithm" associated with schedulers, they are not algorithms, because they don't terminate (at least they aren't supposed to). Knuth (who is "over here") lists the following properties of algorithms:

	Finiteness:	always terminates in a finite number of steps
	Definiteness:	each step is precisely defined
	Input:		zero or more inputs
	Output:		one or more outputs
	Effectiveness:	"... all of the operations to be performed ... can in
			principle be done exactly and in a finite length of
			time ... using pencil and paper"

Knuth uses the term "computational method" to describe things that lack finiteness. Other writers besides Knuth also give finiteness as a criterion. No one seems to require that the algorithm be particularly efficient or well-designed.

But I'd go along with the original writer (I don't know who he was) up to a point: OSes that have been hacked until they work, sort of, may lack definiteness. Sure, there is a definite sequence of instructions being executed, but who knows what it is. Of course, the function that decides the next process to be executed at a given time is (I hope!) an algorithm.
-- 
- Joe Buck <ihnp4!pesnta!epimass!jbuck>
sbs@valid.UUCP (Steven Brian McKechnie Sargent) (03/09/86)
> > Some fairly viable work has been done on SHARE scheduling on UNIX
> > in Australia.  You should check it out.  It actually uses some
> > algorithms, and was even designed.
> 
> Given that I don't speak English or Australian, just whatever we speak here
> in the USofA (no, it's not American, considering Canadians don't speak
> exactly the same language either, and they're (North) Americans as well),
> what do you mean by "algorithms" here?  Over here, we tend to think
> "algorithm" as meaning any procedure of the sort executed by a computer,
> whether it's well thought-out or specified or not.  You may think of
> auto-nicing as a hack (I certainly do), but by my definition the procedure
> that implements it certainly qualifies as an algorithm....
> -- 
> 	Guy Harris
> 	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
> 	guy@sun.arpa	(yes, really)

*** REPLACE THIS LINE WITH YOUR MESSAGE ***

You twit.
jack@boring.uucp (Jack Jansen) (03/09/86)
This discussion brings a question to mind: has anyone ever tried to do I/O scheduling? It seems to me that it would be nice if I/O for interactive processes (reading editor buffers, paging) were done at a higher priority.

Did anyone do any experiments in this direction? If so, what were the results?
-- 
	Jack Jansen, jack@mcvax.UUCP
	The shell is my oyster.
boyd@inset.UUCP (Boyd Roberts) (03/10/86)
>> Some fairly viable work has been done on SHARE scheduling on UNIX
>> in Australia.  You should check it out.  It actually uses some
>> algorithms, and was even designed.
>
>Given that I don't speak English or Australian, just whatever we speak here
>in the USofA (no, it's not American, considering Canadians don't speak
>exactly the same language either, and they're (North) Americans as well),
>what do you mean by "algorithms" here?

By "algorithm" I mean something that has been designed. It was not used in the definitive sense. Or,

	Hack != Algorithm

Yes, there is a paper. And, there should be a new one. Ask "piers@basser.oz".

A "hack" is never a "solution", it's a "rabid mess".


Boyd Roberts

+++	+ ..!mcvax!ukc!inset!boyd  + boyd@inset.co.uk  + boyd@basser.oz +	+++

"Isn't that kind of severe?"
wagner@utcs.uucp (Michael Wagner) (03/10/86)
In article <172@epimass.UUCP> jbuck@epimass.UUCP (Joe Buck) writes:
>I don't believe it!  I, a mere mortal, get to correct Guy Harris :-).

I don't think so. At least not this time.

>Though you frequently see the word "algorithm" associated with schedulers,
>they are not algorithms, because they don't terminate (at least they aren't
>supposed to).

I certainly think schedulers terminate. There might be some confusion about either what a scheduler is or what time scale to examine their behaviour on. Schedulers that I am familiar with are entered to decide who to run next, and return (i.e. terminate) with a recommendation. If it were otherwise, you'd never get any useful work done. The scheduler does, usually, keep some sort of historical data around to assist in its decisions, but that doesn't change the fact that it doesn't run all the time.

>Of course, the function that decides the next process to be executed at
>a given time is (I hope!) an algorithm.

The function that decides the next process to be executed at a given time is (I hope) a scheduler!

Michael Wagner (wagner@utcs)