lgy@blake.acs.washington.edu (Laurence Yaffe) (08/22/89)
After all the talk in comp.arch about cache and memory system design at the
hardware level, how about some discussion on higher level issues relating to
memory utilization, process scheduling, or resource contention?

A lot of current machines, such as my MIPS box, have pretty standard virtual
memory systems: demand paging is used to allocate real memory to individual
processes, with some sort of least-recently-used algorithm employed to select
pages to write out (or forget) when virtual memory use exceeds real memory.
This seems to work reasonably well if your typical job mix consists of
numerous small processes.  But severe, and unnecessary, performance
degradation can occur if you have several processes which have very poor
locality of reference when accessing their data, are sufficiently small that
any single process can fit in real memory, and yet sufficiently large that no
two processes will fit in real memory.

As a concrete example (typical of what I've been trying to run lately),
suppose you have a 32 Mbyte system and that there are only two processes
running, each of which uses 25 Mbytes of virtual memory (almost all data
space) with rapid, random access patterns.  (I.e., normal behavior for a
program like Mathematica.)  If only one job were present, it would fit in
real memory and the machine would run at virtually 100% cpu utilization.
With two such jobs in a demand-paged environment what appears to happen is
that each process typically gets enough real memory so that it denies the
other process sufficient memory to run effectively - i.e., both processes
continually page fault and cpu utilization plummets.  This sort of contention
could obviously be prevented by a smarter scheduling technique - something
which would entirely swap out one process for a period of time in order to
let the other process run.

Questions:

    Which (if any) high-performance Unix systems are capable of avoiding
    this type of inter-process contention?  What works best - actual process
    swapping layered on top of demand paging, better paging algorithms,
    clever scheduling based on adaptive process priorities, ... ?

    How common is this type of problem?

    Anyone else share my feeling that many Unix systems could do a lot
    better at handling large processes?

    Anyone else share my feeling that this is important (and sadly
    neglected)?

--
Laurence G. Yaffe          Internet: lgy@newton.phys.washington.edu
University of Washington   Bitnet:   yaffe@uwaphast.bitnet
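A back-of-the-envelope calculation shows why cpu utilization collapses rather
than merely halving in this situation.  The page size, reference rate, and
fault service time below are assumed round numbers, not measurements from the
system described:

    /* Illustration only: with uniformly random references over W pages of
     * which only R are resident, each reference misses with probability
     * 1 - R/W.  All of the constants below are assumptions chosen merely
     * to show the order of magnitude of the slowdown. */
    #include <stdio.h>

    int main(void)
    {
        double job_mb        = 25.0;    /* virtual size of each job           */
        double resident_mb   = 16.0;    /* roughly half of the 32 Mbyte box   */
        double page_kb       = 4.0;     /* assumed page size                  */
        double refs_per_sec  = 1.0e6;   /* assumed data-touch rate            */
        double fault_service = 0.030;   /* assumed 30 ms per page fault       */

        double w    = job_mb * 1024.0 / page_kb;       /* working-set pages  */
        double r    = resident_mb * 1024.0 / page_kb;  /* resident pages     */
        double miss = 1.0 - r / w;                     /* per-reference miss */

        /* Each reference costs 1/refs_per_sec of cpu plus, on a miss, one
         * fault service time, so the job runs this many times slower:       */
        double slowdown = 1.0 + miss * fault_service * refs_per_sec;

        printf("miss probability per reference: %.2f\n", miss);
        printf("slowdown relative to running alone: about %.0fx\n", slowdown);
        return 0;
    }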
henry@utzoo.uucp (Henry Spencer) (08/22/89)
In article <3332@blake.acs.washington.edu> lgy@newton.phys.washington.edu (Laurence Yaffe) writes:
>suppose you have a 32 Mbyte system and that there are only two processes
>running, each of which uses 25 Mbytes of virtual memory (almost all data space)
>with rapid, random access patterns...
> This sort of contention could obviously be prevented by a smarter
>scheduling technique - something which would entirely swap out one
>process for a period of time in order to let the other process run.

Have you ever figured out how long it takes to swap out a 25MB process?
Or to swap it back in again?  The fact is, this example system simply
does not have enough main memory for its workload.  The fix is, either
double the memory or run the jobs in sequence rather than simultaneously.

"Real memory for real performance."  -- Mike Tilson
--
V7 /bin/mail source: 554 lines.| Henry Spencer at U of Toronto Zoology
1989 X.400 specs: 2200+ pages. | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
lgy@blake.acs.washington.edu (Laurence Yaffe) (08/23/89)
In article <1989Aug22.163100.25540@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>In article <3332@blake.acs.washington.edu> lgy@newton.phys.washington.edu (Laurence Yaffe) writes:
>>suppose you have a 32 Mbyte system and that there are only two processes
>>running, each of which uses 25 Mbytes of virtual memory (almost all data space)
>>with rapid, random access patterns...
>> This sort of contention could obviously be prevented by a smarter
>>scheduling technique - something which would entirely swap out one
>>process for a period of time in order to let the other process run.
>
>Have you ever figured out how long it takes to swap out a 25MB process?
>Or to swap it back in again?  The fact is, this example system simply
>does not have enough main memory for its workload.  The fix is, either
>double the memory or run the jobs in sequence rather than simultaneously.
>
>V7 /bin/mail source: 554 lines.| Henry Spencer at U of Toronto Zoology
>1989 X.400 specs: 2200+ pages. | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

This response ("buy more memory, or sequence jobs by hand"), while common,
completely misses the point.  The real issue here is how to maximize overall
performance with a given physical configuration.  The problem I raised is
"scale invariant" - no matter how much memory one has, there will always be
important jobs which fill all of memory.  A bigger memory just means that
I'll try to run bigger jobs.  This is why I deliberately phrased the initial
description of this problem in terms of ratios of process size to physical
memory.  Part of the intended purpose of my posting was to emphasize that
system performance in the presence of large processes (large relative to
however much memory you can afford, but not so large that the system
shouldn't be able to run them efficiently) IS important!

As to your specific question - yes, I have considered how long it takes to
swap out a 25 Mb process.  On my system, perhaps 10-20 seconds.  This is
utterly negligible for jobs which require many cpu-hours or cpu-days.  The
real issue here is also "scale invariant": as process size grows, certainly
the time to swap the entire process in or out grows.  But so does the time
required to page the same amount of data in from disk and, I submit, so does
the typical cpu time.  I see no reason to frown on swapping strategies
provided that the frequency of process swaps is intelligently adjusted as a
function of process size.

Finally, while running multiple large jobs in sequence is a valid suggestion
and will certainly avoid inter-process contention, to me this is an admission
of failure.  (Why do we like Unix?  Multiprocessing?)  If a single person is
starting multiple large jobs, then scheduling by hand may be acceptable.  If
different users are trying to run big jobs then strict sequential scheduling
is a real pain.  One should be able to do better.  While I may have high
expectations, I do think that minimizing the type of contention I discussed
is a reasonable user desire and should be a relevant operating system design
goal.

--
Laurence G. Yaffe          Internet: lgy@newton.phys.washington.edu
University of Washington   Bitnet:   yaffe@uwaphast.bitnet
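For what it's worth, the 10-20 second figure is consistent with simple
arithmetic; the sustained transfer rate and job length below are assumptions,
not measured numbers for any particular disk or job:

    /* Rough swap-time arithmetic; all constants are assumed figures. */
    #include <stdio.h>

    int main(void)
    {
        double process_mb = 25.0;
        double mb_per_sec = 1.5;      /* assumed sustained rate, well below
                                         the drive's rated maximum          */
        double cpu_hours  = 10.0;     /* assumed length of the job          */

        double one_way  = process_mb / mb_per_sec;                /* seconds */
        double overhead = 2.0 * one_way / (cpu_hours * 3600.0);   /* out+in  */

        printf("one-way swap time: about %.0f seconds\n", one_way);
        printf("swap overhead on a %.0f cpu-hour job: %.3f%%\n",
               cpu_hours, overhead * 100.0);
        return 0;
    }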
frazier@oahu.cs.ucla.edu (Greg Frazier) (08/23/89)
In article <1989Aug22.163100.25540@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
=>In article <3332@blake.acs.washington.edu> lgy@newton.phys.washington.edu (Laurence Yaffe) writes:
=>>suppose you have a 32 Mbyte system and that there are only two processes
=>>running, each of which uses 25 Mbytes of virtual memory (almost all data space)
=>
=>Have you ever figured out how long it takes to swap out a 25MB process?
=>Or to swap it back in again?  The fact is, this example system simply
=>does not have enough main memory for its workload.  The fix is, either
=>double the memory or run the jobs in sequence rather than simultaneously.

The original posting was asking if there was an OS smart enough
to run the jobs in sequence for him.  That's the point - most
machines experience a wide range of workloads, and cannot be
customized for each one of them.  Any heavily-used scientific
machine is going to experience the described situation at some
point in its career.  Now, obviously, if you have somebody
monitoring the machine's performance, he can "manually" prevent
the two jobs from running simultaneously.  What would be more
desirable, however, is for the OS to realize what is going on,
and for _it_ to cause the jobs to run sequentially.  I suspect
that this would require too much intelligence on the part of the
OS, but then, what do I know - I'm just an architect!  :-)

Greg Frazier
&&&&&&&&&&&&&&&&________________________@@@@@@@@@@@@@@@@@@@@
Greg Frazier    o               Internet: frazier@CS.UCLA.EDU
CS dept., UCLA /\               UUCP: ...!{ucbvax,rutgers}!ucla-cs!frazier
           ----^/----
               /
lgy@blake.acs.washington.edu (Laurence Yaffe) (08/23/89)
In article <26642@shemp.CS.UCLA.EDU> frazier@cs.ucla.edu (Greg Frazier) writes:
>In article <1989Aug22.163100.25540@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>=>In article <3332@blake.acs.washington.edu> lgy@newton.phys.washington.edu (Laurence Yaffe) writes:
>=>> [ comments about excessive inter-process memory contention ]
>=>
>= [ comments about avoiding the whole situation ]
>
>The original posting was asking if there was an OS smart enough
>to run the jobs in sequence for him.  That's the point - most
>machines experience a wide range of workloads, and cannot be
>customized for each one of them.  Any heavily-used scientific
>machine is going to experience the described situation at some
>point in its career.  Now, obviously, if you have somebody

Absolutely correct.

>monitoring the machine's performance, he can "manually" prevent
>the two jobs from running simultaneously.  What would be more
>desirable, however, is for the OS to realize what is going on,
>and for _it_ to cause the jobs to run sequentially.  I suspect
>that this would require too much intelligence on the part of the
>OS, but then, what do I know - I'm just an architect!  :-)
>
>Greg Frazier

I'd like to emphasize that it should be rather easy for an OS to recognize
when excessive page faulting is causing the system to thrash.  And once that
happens, there's plenty of unused cpu cycles which could be used by the OS
in order to decide what to do about it!  So adding some intelligence to the
OS to better cope with this type of problem need not have significant
negative impact on ordinary interactive behavior.

--
Laurence G. Yaffe          Internet: lgy@newton.phys.washington.edu
University of Washington   Bitnet:   yaffe@uwaphast.bitnet
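A user-level daemon could do the recognition step with nothing more than the
paging and cpu counters the system already keeps.  The sketch below is
illustrative only: read_vm_stats() is a stub standing in for whatever
statistics interface a particular system provides (the numbers behind vmstat,
say), and the thresholds are arbitrary.

    /* Minimal sketch of the detection step only. */
    #include <stdio.h>
    #include <unistd.h>

    struct vmsnap {
        long pageins;     /* cumulative pages faulted in from disk */
        long cpu_busy;    /* cumulative ticks of user+system time  */
        long cpu_total;   /* cumulative ticks of all time          */
    };

    static struct vmsnap read_vm_stats(void)
    {
        /* Stub for illustration; replace with the real statistics source. */
        struct vmsnap s = { 0, 0, 0 };
        return s;
    }

    static int looks_like_thrashing(struct vmsnap prev, struct vmsnap cur)
    {
        long faults = cur.pageins   - prev.pageins;
        long busy   = cur.cpu_busy  - prev.cpu_busy;
        long total  = cur.cpu_total - prev.cpu_total;
        double util = total > 0 ? (double)busy / (double)total : 1.0;

        /* Heavy fault traffic while the cpu sits mostly idle is the
         * signature described above; both thresholds would need tuning. */
        return faults > 200 && util < 0.5;
    }

    int main(void)
    {
        struct vmsnap prev = read_vm_stats();

        for (;;) {
            sleep(10);
            struct vmsnap cur = read_vm_stats();
            if (looks_like_thrashing(prev, cur))
                printf("thrashing suspected - time to decide what to do\n");
            prev = cur;
        }
    }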
khb@road.Sun.COM (Keith Bierman - Advanced Languages - Floating Point Group ) (08/23/89)
In article <26642@shemp.CS.UCLA.EDU> frazier@cs.ucla.edu (Greg Frazier) writes:

 ...deleted stuff

>machines experience a wide range of workloads, and cannot be
>customized for each one of them.  Any heavily-used scientific
>machine is going to experience the described situation at some
>point in its career.  Now, obviously, if you have somebody
>monitoring the machine's performance, he can "manually" prevent
>the two jobs from running simultaneously.  What would be more
>desirable, however, is for the OS to realize what is going on,
>and for _it_ to cause the jobs to run sequentially.  I suspect
>that this would require too much intelligence on the part of the
>OS, but then, what do I know - I'm just an architect!  :-)

These are reasons why some folks actually like batch OS ... in the old days
we had ways of informing the system of our expected needs (cpu, io, "cards",
etc.) and this was used by some fairly clever scheduling algorithms.

As Unix machines grow up, I personally expect something fancier than nice
level to allow the savvy programmer to inform the OS of expected needs (for
these scientific codes, we should be able to assert: long running, not very
interactive .. this would enable long timeslices but at low priority ... and
perhaps only one mongo job at a time).

There are some "batchqueue" type products which third parties and national
labs have cobbled up.

Keith H. Bierman      |*My thoughts are my own. !! kbierman@sun.com
It's Not My Fault     | MTS --Only my work belongs to Sun*
I Voted for Bill &    | Advanced Languages/Floating Point Group
Opus                  | "When the going gets Weird .. the Weird turn PRO"
davecb@yunexus.UUCP (David Collier-Brown) (08/23/89)
[In a discussion of excessive inter-process memory contention ]
lgy@blake.acs.washington.edu (Laurence Yaffe) writes:
| I'd like to emphasize that it should be rather easy for an OS to recognize
| when excessive page faulting is causing the system to thrash.
| And once that happens, there's plenty of unused cpu cycles which could
| be used by the OS in order to decide what to do about it!  So adding
| some intelligence to the OS to better cope with this type of problem
| need not have significant negative impact on ordinary interactive behavior.

The execution of this policy should be fairly cheap: deciding on when to
carry it out might be harder.  I'll argue that a daemon which detects
processes thrashing the system can find all the processes which should be
scheduled sequentially, but will have to do a large amount of work ensuring
that they're really non-interactive, able to be arbitrarily delayed, etc.
Once you do the decision making, the kernel can smash their priorities down
"below the flashing lights", or make them the idle process itself.  I really
wouldn't care to have the policy-maker in a kernel.  I think the risc/cisc
complexity/cost arguments apply here (and that's the only thing about
architecture in this whole posting).

For practical purposes, the SysV printer spooling system is really quite a
general-purpose creature, and can be turned into a mechanism for simulating
a batch queue (source: Drew Sullivan, personal communication).  As we have a
machine (Mips M2000) with such a spooler sitting there unused, I'll have to
see about using it as a batch queue for our scientists' number-crunching
jobs.

--dave
--
David Collier-Brown,  | davecb@yunexus, ...!yunexus!davecb or
72 Abitibi Ave.,      | {toronto area...}lethe!dave
Willowdale, Ontario,  | Joyce C-B:
CANADA. 223-8968      |    He's so smart he's dumb.
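The cheap "execution" half of that split might be no more than the sketch
below: given a victim pid already chosen by the policy daemon, stop it for a
while and then let it continue.  SIGSTOP/SIGCONT are ordinary signals; the
victim-selection logic (the hard part above) is assumed to live elsewhere,
and the penalty interval is arbitrary.

    /* Sketch of the mechanism only; policy is assumed to happen elsewhere. */
    #include <signal.h>
    #include <stdio.h>
    #include <sys/types.h>
    #include <unistd.h>

    int park_process(pid_t victim, unsigned int seconds)
    {
        if (kill(victim, SIGSTOP) == -1) {     /* take it off the cpu ...  */
            perror("SIGSTOP");
            return -1;
        }
        sleep(seconds);                        /* ... let the others run   */
        if (kill(victim, SIGCONT) == -1) {     /* ... then give it back    */
            perror("SIGCONT");
            return -1;
        }
        return 0;
    }

Stopping the victim does not free its pages by itself, but once it stops
touching them the normal replacement algorithm will steal them for the job
that is still running, which is most of what an explicit swap-out would
accomplish.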
adam@castle.ed.ac.uk (Adam Hamilton) (08/23/89)
In article <3342@blake.acs.washington.edu> lgy@blake.acs.washington.edu (Laurence Yaffe) writes:
We have an ongoing discussion about scheduling of large jobs.
I suggest that a good batch system would be better at this than trying to
do everything in the kernel. It can monitor the various requirements of
the separate jobs (store utilisation, i/o and system call patterns etc)
and tune accordingly. What it needs from the operating system is this
information plus some more (like interactive usage).
Summary - if it doesn't need to be in the kernel, don't put it there.
henry@utzoo.uucp (Henry Spencer) (08/24/89)
In article <3342@blake.acs.washington.edu> lgy@blake.acs.washington.edu (Laurence Yaffe) writes:
>...This response ("buy more memory, or sequence jobs by hand"), while common,
>completely misses the point.  The real issue here is how to maximize overall
>performance with a given physical configuration...

Well, but I don't think it *entirely* misses the point.  One should be aware
that there are circumstances in which *no* amount of fiddling will get
reasonable performance out of a given configuration and a given workload, as
the two are simply incompatible.

> As to your specific question - yes, I have considered how long it takes
>to swap out a 25 Mb process.  On my system, perhaps 10-20 seconds.  This is
>utterly negligible for jobs which require many cpu-hours or cpu-days.

It would have been nice if you'd specified that these are background compute
jobs, not interactive applications; your example (Mathematica) strongly
suggested otherwise.  Timesharing machines and batch computing servers are
very different applications of the hardware.  For interactive processes, I
stand by my original comment: the system is simply too small.  For batch
work, the idea is reasonably practical but requires very different system
tuning, which Unix (traditionally an interactive system) hasn't given a lot
of attention to.

Also, have you *timed* that swapping, or is that just maximum disk transfer
rate?  Remember that your system probably can't drive your disks at full
speed; disk transfer rates are "guaranteed not to exceed" rates, not often
achievable in practice.  Your point is still valid, in that the time is
negligible for long-running batch jobs, but a dedicated batch machine is
clearly needed, as that kind of disk activity will ruin interactive use.

--
V7 /bin/mail source: 554 lines.| Henry Spencer at U of Toronto Zoology
1989 X.400 specs: 2200+ pages. | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
cander@unisoft.UUCP (Charles Anderson) (08/24/89)
From article <3332@blake.acs.washington.edu>, by lgy@blake.acs.washington.edu (Laurence Yaffe):
> Questions:
> Which (if any) high-performance Unix systems are capable of avoiding
> this type of inter-process contention?  What works best - actual process
> swapping layered on top of demand paging, better paging algorithms,
> clever scheduling based on adaptive process priorities, ... ?

Straight out of school I worked on a proprietary, non-Unix system that
handled this problem fairly well (even though the company went out of
business).  In addition to the usual queues for runnable procs, I/O waiting
procs, etc, the operating system had a "trouble maker" queue.  This was for
processes that were making too many demands on the system, i.e. encouraging
thrashing.  Processes on the TM queue were stripped of their resources
(essentially swapped out, I think).  After a while they were allowed to
return and were given some extra grace because the first thing they're going
to do is page-fault in their working set which would otherwise land them
right back in the pokey.

In the example that Mr. Yaffe gives, there would be two processes, A & B,
competing with one another and thus driving the system towards thrashing.
The system sees this, selects A as the trouble maker and puts him in the
penalty box.  B runs fine without A getting in his way.  After a while A is
allowed to return and he starts slamming against B.  A is graced, having
just returned from the TM queue, so B gets selected for the TM queue and A
runs well until B returns when the sequence repeats again.  I don't know
how well this would work on a real Unix system, but it seemed to work well
for this particular system.

--
Charles.
{sun, amdahl, ucbvax, pyramid, uunet}!unisoft!cander
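The selection step might look something like the sketch below; the
process-table fields and the per-interval fault counts are invented for
illustration, but the grace handling follows the description above.

    /* Sketch of "trouble maker" selection; nothing here corresponds to the
     * actual system described, which is long gone. */
    #include <stddef.h>

    struct job {
        int   pid;
        long  recent_faults;   /* faults charged during the last interval   */
        int   in_penalty_box;  /* currently stripped of its resources       */
        int   grace_ticks;     /* intervals of immunity after being let out */
    };

    /* Pick the runnable job with the highest recent fault count that is not
     * currently graced or already in the penalty box; returns NULL if no
     * suitable trouble maker exists. */
    struct job *pick_trouble_maker(struct job *jobs, size_t njobs)
    {
        struct job *worst = NULL;

        for (size_t i = 0; i < njobs; i++) {
            struct job *j = &jobs[i];
            if (j->in_penalty_box || j->grace_ticks > 0)
                continue;
            if (worst == NULL || j->recent_faults > worst->recent_faults)
                worst = j;
        }
        return worst;
    }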
ingoldsb@ctycal.COM (Terry Ingoldsby) (08/24/89)
In article <3342@blake.acs.washington.edu>, lgy@blake.acs.washington.edu (Laurence Yaffe) writes:
> In article <1989Aug22.163100.25540@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
> >In article <3332@blake.acs.washington.edu> lgy@newton.phys.washington.edu (Laurence Yaffe) writes:
> >>suppose you have a 32 Mbyte system and that there are only two processes
> >>running, each of which uses 25 Mbytes of virtual memory (almost all data space)
> >>with rapid, random access patterns...
> >> This sort of contention could obviously be prevented by a smarter
> >>scheduling technique - something which would entirely swap out one
> >>process for a period of time in order to let the other process run.
> >
> >Have you ever figured out how long it takes to swap out a 25MB process?
> >Or to swap it back in again?  The fact is, this example system simply
> >does not have enough main memory for its workload.  The fix is, either
> >double the memory or run the jobs in sequence rather than simultaneously.
> >
> >V7 /bin/mail source: 554 lines.| Henry Spencer at U of Toronto Zoology
> >1989 X.400 specs: 2200+ pages. | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
>
> This response ("buy more memory, or sequence jobs by hand"), while common,
> completely misses the point.  The real issue here is how to maximize overall

But surely this is a discussion of the classic tradeoff of response time vs
throughput.  It is quite common that tuning a system for maximum throughput
degrades overall response time to the point that users threaten to lynch the
system manager.  Note that most supercomputers, whose raison d'etre is to go
real fast, don't timeshare at all.

Finally, I don't understand why your performance is so poor, but maybe there
is something I don't understand about UNIX scheduling, or else your task is
*so* degenerate that no memory management/scheduling scheme will work.  In
normal circumstances if a process tries to access a non-resident page, the
OS blocks that process until the I/O completes and lets the other job(s)
that do have the necessary info in memory continue to run.  If you mean that
your jobs are so degenerate that each process does only a handful of memory
accesses before page faulting, then obviously you will always be waiting for
the page to be swapped in.  If your accesses are truly random, then *no*
memory scheme will ever be able to give you a good hit rate.  If they are
not truly random, then organize your data so that chronologically accessed
elements reside on the same page.

In summary, either reorganize your data, or do what Henry says.  Although
interesting, this discussion is probably moving away from the charter of
comp.arch.

--
Terry Ingoldsby                ctycal!ingoldsb@calgary.UUCP
Land Information Systems              or
The City of Calgary          ...{alberta,ubc-cs,utai}!calgary!ctycal!ingoldsb
lgy@blake.acs.washington.edu (Laurence Yaffe) (08/24/89)
In article <2394@unisoft.UUCP> cander@unisoft.UUCP (Charles Anderson) writes:
-From article <3332@blake.acs.washington.edu>, by lgy@newton.phys.washington.edu (Laurence Yaffe):
-> Questions:
-> Which (if any) high-performance Unix systems are capable of avoiding
-> this type of inter-process contention? What works best - actual process
-> swapping layered on top of demand paging, better paging algorithms,
-> clever scheduling based on adaptive process priorities, ... ?
-
-Straight out of school I worked on a proprietary, non-Unix system that
-handled this problem fairly well (even though the company went out of
-business). In addition to the usual queues for runnable procs, I/O waiting
-procs, etc, the operating system had a "trouble maker" queue. This was
-for processes that were making too many demands on the system, i.e.
-encouraging thrashing. Processes on the TM queue were stripped of
-their resources (essentially swapped out, I think). After a while they
-were allowed to return and were given some extra grace because the
-first thing they're going to do is page-fault in their working set
-which would otherwise land them right back in the pokey.
-
-In the example that Mr. Yaffe gives, there would be two processes, A &
-B, competing with one another and thus driving the system towards
-thrashing. The system sees this, selects A as the trouble maker and
-puts him in the penalty box. B runs fine without A getting in his
-way. After a while A is allowed to return and he starts slamming
-against B. A is graced, having just returned from the TM queue, so B
-gets selected for the TM queue and A runs well until B returns when the
-sequence repeats again. I don't know how well this would work on a real
-Unix system, but it seemed to work well for this particular system.
-
---
-Charles.
Of the various responses to my initial query, this note by Charles
Anderson comes closest to the sort of approach I've been wishing for.
It has the virtues of:
a) being dynamic - responding to what a process actually does, not what
some user said it might do. This is clearly desirable. I take it
for granted that static, rigidly enforced job queues ala IBM are a
prime case of the "cure" being as bad as the "disease".
("I'm sorry, your job ran for 2 weeks before exceeding the page fault
limit you initially guessed. You lose. Try running it again with a
larger limit...")
b) approximating what a human system manager typically does:
notice that the system is starting to thrash,
look at the active processes and see which ones
are generating high page fault rates,
find out the effective "working set" size of these processes
and figure out which processes can effectively run concurrently,
try putting one or more of the processes "on hold"
in order to allow the remaining processes to run effectively,
periodically reevaluate the situation and readjust which (if any)
processes should be put "on hold".
Whether "on hold" means explicitly swapped out, or suspended, or
simply restricted to a small, but non-zero number of physical memory
pages is an implementation issue (about which I was hoping to hear
some interesting discussion). I do know that simply manipulating
"nice" values is not adequate (at least on the systems I use) for
preventing memory contention. I don't know the "best" way to
implement this sort of approach - I'm a full time user and part
time system manager, not an OS guru.
The important point is the need for some sort of intelligently selected,
dynamic response which prevents system resources from being allocated
excessively "democratically" at times when "equal shares for all"
turns into "everyone starves".
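The "figure out which processes can effectively run concurrently" step in (b)
could be as mechanical as the sketch below.  The structures are invented for
illustration: the working-set estimates are assumed to come from fault-rate
and residency measurements made elsewhere, and the array order is assumed to
encode whatever fairness rule the policy layer prefers (rotating it between
evaluation passes keeps any single job from being starved).

    /* Sketch of the admission decision only; nothing here corresponds to a
     * real kernel's data structures.  Jobs whose estimated working sets no
     * longer fit, given the jobs already admitted, get put "on hold". */
    #include <stddef.h>

    struct bigjob {
        int  pid;
        long ws_pages;   /* estimated working-set size, in pages   */
        int  on_hold;    /* output: 1 if the job should be parked  */
    };

    /* free_pages is the memory available for these large jobs after the
     * system's other needs are met. */
    void admit_jobs(struct bigjob *jobs, size_t njobs, long free_pages)
    {
        long remaining = free_pages;

        for (size_t i = 0; i < njobs; i++) {
            if (jobs[i].ws_pages <= remaining) {
                jobs[i].on_hold = 0;
                remaining -= jobs[i].ws_pages;
            } else {
                jobs[i].on_hold = 1;   /* revisit on the next pass */
            }
        }
    }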
--
Laurence G. Yaffe Internet: lgy@newton.phys.washington.edu
University of Washington Bitnet: yaffe@uwaphast.bitnet
bin@primate.wisc.edu (Brain in Neutral) (08/24/89)
This thread now really has little to do with MIPS systems in particular. Please, edit the newsgroups lines to say comp.arch, not comp.arch,comp.sys.mips. Thanks. Paul DuBois dubois@primate.wisc.edu
rwc%ilmarin.c3@lanl.gov (Robert W. Cox) (08/24/89)
From article <440@ctycal.UUCP>, by ingoldsb@ctycal.COM (Terry Ingoldsby):
> Note that most supercomputers, whose raison d'etre is to go
> real fast, don't timeshare at all.

The majority of supercomputers are Crays.  The most common OS's on Crays are
CTSS and UNICOS, both of which are timesharing.

Bob Cox
ram@wb1.cs.cmu.edu (Rob MacLachlan) (08/24/89)
Being a Lisp programmer, I have also observed that Unix (and many other
O.S.'s also) tends to lose when running large programs that don't fit a
"typical" VM behavior pattern.  One problem is that many O.S.'s use a purely
global page replacement algorithm: no consideration is given to what process
that page belongs to, and to what the history of that process is.

Many classic VM algorithms (such as Working Set) do consider processes and
degree of thrashing, etc., but these algorithms seem to be little used in
recently designed o.s.'es.  [Since this is comp.arch, I will note in passing
that this may be partly due to brain-dead VM hardware, such as the VAX, which
doesn't even have a reference bit.]  This is probably because people
discovered that with lower degrees of time-sharing and larger memories,
global strategies worked just as well for the "typical" workload.

As to how an O.S. could address these problems, I strongly agree that a
dynamic strategy is preferable to a batch-oriented one.  A Lisp process
generally persists for an entire interactive session, and may exhibit a wide
range of different behaviors during that period.  The O.S. must not only
recognize when the process is antisocial, it must also recognize when it
becomes social again.

As to exactly what to do when you discover that memory is over-committed:
> Whether "on hold" means explicitly swapped out, or suspended, or
> simply restricted to a small, but non-zero number of physical memory
> pages is an implementation issue (about which I was hoping to hear
> some interesting discussion).

It's hard to know exactly what you mean by "explicitly swapped out" in a
paged VM system.  Unlike in a true swapping system, you are never going to
write out clean pages.  The question is only what you are going to do to the
process's resident pages at the time you decide to "swap" it.  You can:
 -- clean all the dirty pages (perhaps freeing them once cleaned), and
 -- free all of the clean pages.
Scheduling writes of all the dirty pages is probably worthwhile; it probably
doesn't make any difference whether you free the clean pages or just allow
them to be replaced when you page the other process in.

"Suspending" the process presumably means stopping the process, but doing
neither of the above actions on its pages.  This would work pretty well,
especially if the VM system cleans dirty pages efficiently (many at a time).

Restricting a thrashing process to even less memory is a bad idea.  Although
it will allow other processes to get the memory they need, the restricted
process will thrash even more, soaking up disk bandwidth and O.S. runtime
that could be used by other processes.  There's no point in trying to let
the thrashing process run, since it's not going to be able to get anything
done without the memory it needs.

  Rob
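In code, the difference between those choices comes down to what is done with
each resident page of the victim.  The sketch below uses invented structures
and stand-in routines (no real kernel looks like this) purely to make the two
actions concrete; a real pager would not, of course, reuse a frame until its
cleaning write had completed.

    /* Illustration only: walk an invented per-process page list, scheduling
     * writes for dirty pages and freeing frames.  schedule_pageout() and
     * free_frame() are stand-ins for a pager's real primitives. */
    #include <stddef.h>

    struct page {
        int dirty;      /* modified since it was brought in      */
        int resident;   /* currently occupying a physical frame  */
    };

    struct proc_pages {
        struct page *pages;
        size_t       npages;
    };

    static void schedule_pageout(struct page *p) { p->dirty = 0; }
    static void free_frame(struct page *p)       { p->resident = 0; }

    /* "Swap out" a process in a paged VM system: clean its dirty pages and
     * give up its frames; clean pages could equally be left for normal
     * replacement to reclaim. */
    void swap_out(struct proc_pages *pp)
    {
        for (size_t i = 0; i < pp->npages; i++) {
            struct page *p = &pp->pages[i];

            if (!p->resident)
                continue;
            if (p->dirty)
                schedule_pageout(p);   /* clean it first                  */
            free_frame(p);             /* frame goes to the other process */
        }
    }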
aglew@urbana.mcd.mot.com (Andy-Krazy-Glew) (08/25/89)
..> L. Yaffe talks about thrashing (2 25M processes on a 32M system), asks:
> Which (if any) high-performance Unix systems are capable of avoiding
>this type of inter-process contention?  What works best - actual process
>swapping layered on top of demand paging, better paging algorithms,
>clever scheduling based on adaptive process priorities, ... ?

Gould NPL had a "working set" scheduler for memory; a working set was
determined by looking at the number of pages touched by a process in a given
interval of process virtual time (not system time); a process would not run
if the working set could not be loaded.  Pervase Akhtar wrote a paper on it
for a past UNIX conference.

Working set schedulers are quite common in large (non-UNIX) systems.  See
any recent issue of ACM SIGMETRICS' Conference Proceedings.

--
Andy "Krazy" Glew,  Motorola MCD,    aglew@urbana.mcd.mot.com
1101 E. University, Urbana, IL 61801, USA.  {uunet!,}uiucuxc!udc!aglew
My opinions are my own; I indicate my company only so that the reader
may account for any possible bias I may have towards our products.
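For concreteness, one common way such a scheduler gets its working-set
numbers is by sampling reference information at intervals of process virtual
time, roughly as sketched below.  The page-table layout and the reference bit
are invented for illustration (and, as noted elsewhere in this thread, some
hardware has no reference bit and must fake one in software).

    /* Sketch of a working-set estimator: count the distinct pages touched
     * within the last tau units of *process* virtual time. */
    #include <stddef.h>

    struct pte {
        int  valid;
        int  referenced;   /* set by hardware (or faked by software)      */
        long last_use;     /* process virtual time of last observed use   */
    };

    /* Called each time the process accumulates another sampling interval
     * of cpu time; returns the working-set size in pages. */
    long sample_working_set(struct pte *pt, size_t npages, long now, long tau)
    {
        long ws = 0;

        for (size_t i = 0; i < npages; i++) {
            if (!pt[i].valid)
                continue;
            if (pt[i].referenced) {
                pt[i].last_use   = now;   /* remember when we saw it used */
                pt[i].referenced = 0;     /* and rearm the bit            */
            }
            if (now - pt[i].last_use <= tau)
                ws++;
        }
        return ws;
    }

The scheduler then simply refuses to run any process whose estimate exceeds
the memory it can currently be given.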
ttwang@polyslo.CalPoly.EDU (Thomas Wang) (08/25/89)
lgy@newton.phys.washington.edu (Laurence Yaffe) writes:
> A lot of current machines, such as my MIPS box, have pretty standard
>virtual memory systems: demand paging is used to allocate real memory to
>individual processes, with some sort of least-recently-used algorithm
>employed to select pages to write out.
>Severe, and unnecessary, performance degradation can occur if you have several
>processes which have very poor locality of reference when accessing their
>data, are sufficiently small that any single process can fit in real memory,
>and yet sufficiently large that no two processes will fit in real memory.

To avoid excessive page fault rates, a good algorithm is to load in multiple
pages depending on system load.  Exactly how many pages should be loaded in
is still a research question.  In HP's terms, loading in extra pages is
called 'pre-fetching'.

HP's MPE/XL has pre-fetching.  I don't know about Unix, but my guess is
that Unix probably does not.

 -Thomas Wang ("I am, therefore I am." - Akira )
              ttwang@polyslo.calpoly.edu
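A minimal illustration of tying the prefetch amount to memory pressure; the
thresholds and cluster sizes below are made-up numbers, since, as noted
above, the right amount is still an open question.

    /* Illustration only: pick how many pages to bring in on a fault as a
     * function of how much free memory remains.  All numbers are invented. */
    int prefetch_cluster_size(double free_mem_fraction)
    {
        if (free_mem_fraction > 0.50)
            return 8;   /* plenty of free memory: read ahead aggressively */
        if (free_mem_fraction > 0.20)
            return 4;
        if (free_mem_fraction > 0.05)
            return 2;
        return 1;       /* memory is tight: plain demand paging           */
    }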
aglew@urbana.mcd.mot.com (Andy-Krazy-Glew) (08/25/89)
>Have you ever figured out how long it takes to swap out a 25MB process?
>Or to swap it back in again?  The fact is, this example system simply
>does not have enough main memory for its workload.  The fix is, either
>double the memory or run the jobs in sequence rather than simultaneously.

Isn't that what the scheduler is supposed to figure out by itself?

--
Andy "Krazy" Glew,  Motorola MCD,    aglew@urbana.mcd.mot.com
1101 E. University, Urbana, IL 61801, USA.  {uunet!,}uiucuxc!udc!aglew
My opinions are my own; I indicate my company only so that the reader
may account for any possible bias I may have towards our products.
hammondr@sunroof.crd.ge.com (richard a hammond) (08/25/89)
lgy@newton.phys.washington.edu (Laurence Yaffe) writes:
>
>>...
>>Severe, and unnecessary, performance degradation can occur if you have several
>>processes which have very poor locality of reference when accessing their
>>data, are sufficiently small that any single process can fit in real memory,
>>and yet sufficiently large that no two processes will fit in real memory.

In article <13773@polyslo.CalPoly.EDU> ttwang@polyslo.CalPoly.EDU (Thomas Wang) comments:
>To avoid excessive page fault rates, a good algorithm is to load in multiple
>pages depending on system load.  Exactly how many pages should be loaded in
>is still a research question.  In HP's terms, loading in extra pages is
>called 'pre-fetching'.
>
>HP's MPE/XL has pre-fetching.  I don't know about Unix, but my guess is
>that Unix probably does not.

Note that in the case postulated by L. Yaffe, pre-fetching makes the
thrashing more severe, because the memory accesses have "poor locality of
reference", i.e. the working set is very close to the total memory required
by the process.  Fetching pages n through n+m because of a page fault on n
would increase the I/O overhead without increasing the probability that you
would avoid the next page fault.

Historical commentary:  The early multi-user UNIX machines, DEC PDP 11/45
and 11/70, could support more physical memory than the largest process could
use, so the scheduler didn't worry about this sort of thrashing.  VM works
fine if you have good locality of reference, while it fails miserably if you
don't.  With poor locality of reference, you need to have enough physical
memory to keep the job in memory.

Practical commentary:  Yes, the problem is interesting and can be solved,
but I never observed it except on our large number crunching machines, i.e.
the Convex and Alliant.  On those machines I scheduled the jobs by hand;
after all, if the job is going to take 10 days of CPU time, you can just
adjust the jobs so one runs overnight and the other one runs during the day.
So, I conclude that the problem isn't that interesting for the vast majority
of UNIX systems, i.e. all our Suns and Vaxen didn't have the problem.

Rich Hammond
slackey@bbn.com (Stan Lackey) (08/25/89)
In article <5967@pt.cs.cmu.edu> ram@wb1.cs.cmu.edu (Rob MacLachlan) writes:
>Many classic VM algorithms (such as Working Set) do consider processes and
>degree of thrashing, etc., but these algorithms seem to be little used in
>recently designed o.s.'es.  [Since this is comp.arch, I will note in passing
>that this may be partly due to brain-dead VM hardware, such as the VAX, which
>doesn't even have a reference bit.]

Instead of examining a reference bit, an OS could check to see if the page
table entry is in the translation cache; this can be better than a simple
reference bit, which only tells you if a page has been read.  If a
translation is in the cache, it indicates that the page is likely being read
often.

-Stan
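A sketch of how such a check might be used follows; tlb_entry_present() is
entirely hypothetical - a stand-in for a privileged probe of the translation
cache - and the scan would have to run often enough that entries have not
simply been displaced in the meantime.

    /* Illustration only: count how many of a process's pages currently have
     * cached translations; those are the pages most likely in active use. */
    #include <stddef.h>

    static int tlb_entry_present(unsigned long vpage)
    {
        (void)vpage;
        return 0;      /* stub: a real version would probe the hardware */
    }

    size_t count_recently_used(const unsigned long *vpages, size_t npages)
    {
        size_t used = 0;

        for (size_t i = 0; i < npages; i++)
            if (tlb_entry_present(vpages[i]))
                used++;
        return used;
    }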
leichter@CS.YALE.EDU (Jerry Leichter) (08/26/89)
In article <5967@pt.cs.cmu.edu>, ram@wb1.cs.cmu.edu (Rob MacLachlan) writes...
>...Many classic VM algorithms (such as Working Set) do consider processes and
>degree of thrashing, etc., but these algorithms seem to be little used in
>recently designed o.s.'es.  [Since this is comp.arch, I will note in passing
>that this may be partly due to brain-dead VM hardware, such as the VAX, which
>doesn't even have a reference bit.]...

VMS and the VAX architecture were designed at the same time by a group of
people who worked together.  VMS has implemented a Working Set algorithm
since it first shipped.  I've heard (from reasonably reliable sources) that
the hardware people were ready to include a reference bit, but the software
people's work showed they didn't need it, given appropriate algorithms.  The
fact that early Unix implementations for the VAX didn't handle it correctly
is a problem with the developers of those early versions, not with the VAX
architecture.

I have never understood this need people have to use the VAX as the standard
bad example of EVERYTHING.  In fact, the designers of the VAX worked in ways
very similar to the way good RISC designers work today: Hardware and software
people cooperated, and there was an attempt to balance the complexity each
had to deal with.  It may seem hard to believe, almost 15 years later, that
the compiler writers heavily influenced the design of the ISA, asking for
things like 3-address arithmetic instructions in addition to 2-address and
indexed addressing modes, but given the compiler technology of the day, this
is what they thought would be useful to them.  Compiler technology has
changed greatly since then, as has hardware technology.

The main difference between the VAX design team and current design teams is
probably in the inclusion of the OS designers.  The process structure of the
VAX, the support in hardware for AST's, the security structure - all of these
are direct reflections of the OS being developed.  These days, OS design is
rapidly becoming a lost art - Unix is the answer, what was the question?  So
there is little room for interplay any more between OS designers and hardware
designers - the hardware designers provide what the well-understood Unix
kernel needs and are finished.

(To a certain extent, language design is approaching the same point: Support
C and figure you are done.  The VAX was viewed from the start as a
multi-lingual machine, and its ISA contains elements that the COBOL
developers found important, as well as elements the FORTRAN developers
wanted.  Most of the early RISC papers used C examples, exclusively.  More
recently, we've seen "Smalltalk RISC's" and "LISP RISC's".  I'm certainly not
up on all the literature, but I know of no RISC design, paper or otherwise,
specifically intended to simultaneously support two quite different languages
efficiently.  As a language designer, I find it very distressing to think
that the machines of the future may be "refined" to the point where doing
anything but Unix and C will require extreme effort.  Take a look at the
history of LISP on CDC machines to see the possible results.)

							-- Jerry
henry@utzoo.uucp (Henry Spencer) (08/27/89)
In article <70663@yale-celray.yale.UUCP> leichter@CS.YALE.EDU (Jerry Leichter) writes:
>I've heard (from reasonably reliable sources) that the hardware people were
>ready to include a reference bit, but the software people's work showed they
>didn't need it, given appropriate algorithms...

That *they* didn't need it, note, not that other software developers might
not need it.  (Of course, as we all know, there was only going to be *one*
operating system for the VAX...)

>The main difference between the VAX design team and current design teams is
>probably in the inclusion of the OS designers.  The process structure of the
>VAX, the support in hardware for AST's, the security structure - all of these
>are direct reflections of the OS being developed...

The same is true of the architectural tweaks on many modern machines.  For
example, according to John Mashey, the MIPS machines make a considerable
effort to do precise exceptions because the OS people got quite upset about
the alternative.  The difference is that modern OS people don't see any need
to reinvent the square wheel, so the issue is how to *support* the system,
not what it should look like.

>...To a certain extent, language design is
>approaching the same point: Support C and figure you are done.  The VAX was
>viewed from the start as a multi-lingual machine, and its ISA contains
>elements that the COBOL developers found important, as well as elements the
>FORTRAN developers wanted.  Most of the early RISC papers used C examples,
>exclusively.  More recently, we've seen "Smalltalk RISC's" and "LISP RISC's".
>I'm certainly not up on all the literature, but I know of no RISC design,
>paper or otherwise, specifically intended to simultaneously support two quite
>different languages efficiently...

Might one ask how many RISC designs you are acquainted with?  The SPARC, the
MIPS architecture, and the HP Precision Architecture, to name three, were
all designed with considerable attention to multi-language support.  The
fact is, supporting C well is very nearly enough to support the other
"conventional" languages well.  (Note, "very nearly" -- some attention to
the fine points is still desirable.)  Compiler designers today understand
that it may be better to convert COBOL numbers to binary for arithmetic than
to mess up the hardware with decimal instructions, for example.  COBOL
programs are seldom arithmetic-bound anyway.

>distressing to think that the machines of the future may be "refined" to the
>point where doing anything but Unix and C will require extreme effort.  Take a
>look at the history of LISP on CDC machines to see the possible results.)

Take a look at the history of Lisp on Lisp machines, whose time has come
*and gone* -- those awful "C-only" RISC machines run Lisp faster than the
custom-cooked Lisp machines do.

--
V7 /bin/mail source: 554 lines.| Henry Spencer at U of Toronto Zoology
1989 X.400 specs: 2200+ pages. | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
plogan@mentor.com (Patrick Logan) (08/30/89)
In article <1989Aug26.232710.27174@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>In article <70663@yale-celray.yale.UUCP> leichter@CS.YALE.EDU (Jerry Leichter) writes:
>>distressing to think that the machines of the future may be "refined" to the
>>point where doing anything but Unix and C will require extreme effort.  Take a
>>look at the history of LISP on CDC machines to see the possible results.)
>
>Take a look at the history of Lisp on Lisp machines, whose time has come
>*and gone* -- those awful "C-only" RISC machines run Lisp faster than the
>custom-cooked Lisp machines do.
>--
>V7 /bin/mail source: 554 lines.| Henry Spencer at U of Toronto Zoology
>1989 X.400 specs: 2200+ pages. | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

Saying that C machines run Lisp faster than Lisp machines is simplistic.  To
run Lisp faster on a C machine requires more declarations in the code (or,
as with T, more use of "type-specific" procedures such as fx+ instead of
generic +).  This goes against the grain of Lisp programming.

Lisp machines provide a better environment (e.g. protection, ability to
write fast, generic code) per performance unit.  I'd guess the only reasons
their time is gone are market based.  They're sure to return in one form or
another as more and more C machine programs are written in Lisp-like
languages.  Many C machine architectures (e.g. SPARC) already incorporate
some support for Lisp.

--
Patrick Logan                | ...!{decwrl,sequent,tessi}!mntgfx!plogan
Mentor Graphics Corporation  | plogan@pdx.MENTOR.COM
Beaverton, Oregon            | "My other computer is a Lisp machine."
mash@mips.COM (John Mashey) (08/31/89)
In article <1989Aug30.152155.9613@mentor.com> plogan@mentor.com (Patrick Logan) writes:
>In article <1989Aug26.232710.27174@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>>Take a look at the history of Lisp on Lisp machines, whose time has come
>>*and gone* -- those awful "C-only" RISC machines run Lisp faster than the
>>custom-cooked Lisp machines do.
>Saying that C machines run Lisp faster than Lisp machines is
>simplistic.  To run Lisp faster on a C machine requires more
>declarations in the code (or, as with T, more use of "type-specific"
>procedures such as fx+ instead of generic +).  This goes against the
>grain of Lisp programming.
>
>Lisp machines provide a better environment (e.g. protection, ability
>to write fast, generic code) per performance unit.  I'd guess the only
>reasons their time is gone are market based.  They're sure to return in
>one form or another as more and more C machine programs are written in
>Lisp-like languages.  Many C machine architectures (e.g. SPARC) already
>incorporate some support for Lisp.

There are many good ideas in LISP, and more of it will filter over.  The
issue (and it has been discussed before in comp.arch) is that the economics
of the computer business end up with the following:

a) If something is a low-volume, special-purpose architecture, it must be
   MUCH better at what it does than the high-volume, cheap competition.
   If it isn't, it will end up going away, because the competition gets to
   keep tracking the technology curve at speed, rather than getting behind
   it.  Also, if something has high volume in VLSI, it gets to be cheap.

b) The current RISC chips are getting fast and cheap enough that this is
   happening.

c) There is still some fine-tuning to be done, but most of it amounts to
   tweaks added to some of the existing architectures (tagged adds, for
   example, by themselves, do not necessarily solve the problem: consider
   the exception-handling sequences also needed).

Useful datapoints might be, now and in the future:

a) Symbolics, Inc. history, where it seems like they're getting out of the
   hardware-design business in favor of (fine) software.

b) TI: Ivory chip versus SPARCs

CONJECTURE/PREDICTION: fast VLSI RISCs will end up capturing the high end of
the LISP market over the next few years.

--
-john mashey	DISCLAIMER: <generic disclaimer, I speak for me only, etc>
UUCP: 	{ames,decwrl,prls,pyramid}!mips!mash  OR  mash@mips.com
DDD:  	408-991-0253 or 408-720-1700, x253
USPS: 	MIPS Computer Systems, 930 E. Arques, Sunnyvale, CA 94086
baum@Apple.COM (Allen J. Baum) (09/12/89)
[]
>In article <70663@yale-celray.yale.UUCP> leichter@CS.YALE.EDU (Jerry Leichter) writes:
>I have never understood this need people have to use the VAX as the standard
>bad example of EVERYTHING.  In fact, the designers of the VAX worked in ways
>very similar to the way good RISC designers work today: Hardware and software
>people cooperated, and there was an attempt to balance the complexity each
>had to deal with.
 .....
>the compiler writers heavily influenced the design of the ISA, asking for
>things like 3-address arithmetic instructions in addition to 2-address and
>indexed addressing modes, but given the compiler technology of the day, this
>is what they thought would be useful to them.  Compiler technology has changed
>greatly since then, as has hardware technology.
>
>The main difference between the VAX design team and current design teams is
>probably in the inclusion of the OS designers.  The process structure of the
>VAX, the support in hardware for AST's, the security structure - all of these
>are direct reflections of the OS being developed.

Hear, hear!  I'm not particularly a fan of the VAX, but it's getting a lot
of bad press from the RISC people because of its performance vs. the new
chips.  This is NOT because it is a bad design, it's because it's an old
design, and the nature of the tradeoffs has changed.  This includes not just
compiler and OS technology, but lots of the hardware technology: memories,
I/O, logic, and packaging.  These make a big difference in how a system is
designed.

The major fault of VAX proponents is that they haven't recognized that these
tradeoffs have made a difference, and they still think they can compete (on
the basis of performance, as opposed to just running old code).

--
baum@apple.com		(408)974-3385
{decwrl,hplabs}!amdahl!apple!baum