tbray@watsol.waterloo.edu (Tim Bray) (01/16/90)
gm@keysec.kse.com (Greg McGary) writes:
>Our MIPS M/120 has been spending too much time page-thrashing lately.
>I would like to have a utility that tells me about the paging behavior...

Good luck, you'll need it.  Here at the New OED project we have gotten
seriously cheesed off about VM implementations on many Unix systems.  No
matter how much memory you have, it remains a critical resource.  But even
well-regarded Unixes don't give you the tools to manage it.  In many
applications (e.g. database indexing) the performance of an algorithm can
be greatly improved by knowing how much physical memory you can use, and
tuning to use it efficiently.  But not on Unix.  Some case studies:

1. On a 32-Mb (4.3bsd) machine with *nothing* else happening, the OS stupidly
   pages away at you if you try to use more than about 20 Mb, in the inane
   belief that the memory will be needed any moment by one of those gettys or
   nfsds or something that aren't doing anything.

2. A process using only a moderate amount of memory (you think) runs like a
   dog, and you note that the system is spending much of its time in system
   state or idle.  Why, you wonder.  It quickly becomes apparent that the
   information produced by items such as ps, vmstat, vsar, top, and so on is
   comparable in relevance and accuracy to Robert Ludlum novels or peyote
   visions.  (SunOS is the villain here.)

3. On a 64-Mb (MIPS) machine, your paging rate, system time, and idle time
   all go through the roof if your process insolently tries to random-access
   more than 32 Mb of memory at once.

Look, we all appreciate the tender loving care that VM architects have put
into strategies that are friendly to 100+ moderate-size processes context
switching rapidly in time-sharing mode.  But there are other ways to use
computers, and they are currently very poorly supported.  We paid for that
memory, we have a good use for it, and the OS is getting in our way - and
it's also REFUSING TO TELL US ACCURATELY WHAT'S GOING ON, an unforgivable
sin by my Unix dogma.

Harrumph,
Tim Bray, New OED Project, U of Waterloo
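[A minimal sketch, not from the original thread: one blunt way to locate the
ceiling Tim complains about is to grow a randomly-accessed working set and
watch the major-fault count reported by getrusage().  Assumes a 4.3BSD-style
getrusage() with the ru_majflt field; the 64 Mb limit, 4 Mb step, and touch
count are arbitrary choices for illustration.]

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/time.h>
#include <sys/resource.h>

int main(void)
{
    long step = 4L * 1024 * 1024;            /* grow 4 Mb at a time */
    long max  = 64L * 1024 * 1024;
    long size;

    for (size = step; size <= max; size += step) {
        char *buf = malloc(size);
        struct rusage before, after;
        long i, touches = size / 4096 * 4;   /* ~4 random touches per page */

        if (buf == NULL) {
            fprintf(stderr, "malloc(%ld) failed\n", size);
            break;
        }
        memset(buf, 0, size);                /* fault everything in once */

        getrusage(RUSAGE_SELF, &before);
        for (i = 0; i < touches; i++)
            buf[random() % size]++;          /* random access pattern */
        getrusage(RUSAGE_SELF, &after);

        printf("%3ld Mb: %ld major faults during random access\n",
               size >> 20, after.ru_majflt - before.ru_majflt);
        free(buf);
    }
    return 0;
}

[On a healthy machine the fault count should stay near zero until the
working set hits whatever the pager actually lets you keep resident; the
size at which it takes off is the effective ceiling.]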
Ed@alderaan.scrc.symbolics.com (Ed Schwalenberg) (01/17/90)
    From: Tim Bray <tbray@watsol.waterloo.edu>
    Date: 16 Jan 90 03:28:19 GMT

    Good luck, you'll need it.  Here at the New OED project we have gotten
    seriously cheesed off about VM implementations on many Unix systems.
    No matter how much memory you have, it remains a critical resource.

And if you don't have enough, you lose just as badly.  Under System V Unix
for the 386, when your large process exceeds the amount of non-wired
physical memory, the paging algorithm pages out the ENTIRE process (which
takes a LONG time), then lets your poor process fault itself in again, oh
so painfully, until you exceed physmem again and start the cycle over.
lm@snafu.Sun.COM (Larry McVoy) (01/17/90)
In article <19821@watdragon.waterloo.edu> tbray@watsol.waterloo.edu (Tim Bray) writes:
>1. On a 32-Mb (4.3bsd) machine with *nothing* else happening, the OS stupidly
>   pages away at you if you try to use more than about 20 Mb, in the inane
>   belief that the memory will be needed any moment by one of those gettys or
>   nfsds or something that aren't doing anything.

Not a great alg, but not terrible if you are running a time-sharing system.
Take your 32 meg, chop off the ~2 meg for the kernel, chop off the ~4 meg
for the buffer cache, and you have about 26 meg left.  Now there is still
something fishy here - if I've got the numbers right, it does seem odd that
the pager is beating you up with 6 megs free.  I don't believe this for a
second.  Try this:

$ adb /vmunix /dev/kmem
lotsfree/D
freemem/D
^D

The pager does not turn on until freemem < lotsfree (and lotsfree on Suns
is typically small, like 256K or so).  So something is wacko.

>2. A process using only a moderate amount of memory (you think) runs like
>   a dog, and you note that the system is spending much of its time in
>   system state or idle.  Why, you wonder.  It quickly becomes apparent
>   that the information produced by items such as ps, vmstat, vsar, top,
>   and so on is comparable in relevance and accuracy to Robert Ludlum
>   novels or peyote visions.  (SunOS is the villain here.)

Yeah, well, um, yeah.  Right.  Well, it's like this, see...  Actually, the
real problem is sharing.  Who do you charge shared libraries to?  The
numbers displayed by all those programs don't take that into account, but
they should give you a general idea.  Oh, yeah, I assume 4.0 or greater;
things were easy before then.

>3. On a 64-Mb (MIPS) machine, your paging rate, system time, and idle time
>   all go through the roof if your process insolently tries to random-access
>   more than 32 Mb of memory at once.

Waddya expect?  :-) :-)

>Look, we all appreciate the tender loving care that VM architects have put
>into strategies that are friendly to 100+ moderate-size processes context
>switching rapidly in time-sharing mode.  But there are other ways to use
>computers, and they are currently very poorly supported.  We paid for that
>memory, we have a good use for it, and the OS is getting in our way, and
>it's also REFUSING TO TELL US ACCURATELY WHAT'S GOING ON, an unforgivable
>sin by my Unix dogma.

Hmm.  The SunOS VM model was designed with exactly this in mind.  You can
use damn near 100% of physical mem on a 4.0 or greater rev of the OS (the
OS uses some, but on a 32 meg machine you should be looking at close to 30
megs of user-usable ram).

At any rate, qwitchyerbitchin and tell me what you want to have happen.
Don't forget that your solution has to work well when I'm time-sharing,
when one process wants the whole machine, and when two processes want the
whole machine.  And if you get it right, I'll get it into SunOS or die
trying.

Looking forward to your reply,
---
What I say is my opinion.  I am not paid to speak for Sun, I'm paid to hack.
Besides, I frequently read news when I'm drjhgunghc, err, um, drunk.
Larry McVoy, Sun Microsystems   (415) 336-7627   ...!sun!lm or lm@sun.com
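[For reference, a sketch of doing Larry's adb session from a program: look
up lotsfree and freemem in the kernel symbol table with nlist() and read
their values out of /dev/kmem.  Assumes a 4.3BSD/SunOS-style kernel image
at /vmunix, a-out-style underscore-prefixed symbols, and that both
variables are plain ints; needs read permission on /dev/kmem.]

#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <nlist.h>

static struct nlist nl[] = {
    { "_lotsfree" },      /* the pager turns on when freemem < lotsfree */
    { "_freemem" },
    { 0 }
};

int main(void)
{
    int kmem, lotsfree, freemem;

    if (nlist("/vmunix", nl) < 0 || nl[0].n_value == 0) {
        fprintf(stderr, "can't find symbols in /vmunix\n");
        return 1;
    }
    if ((kmem = open("/dev/kmem", O_RDONLY)) < 0) {
        perror("/dev/kmem");
        return 1;
    }
    /* seek to each variable's kernel address and read its current value */
    lseek(kmem, (off_t)nl[0].n_value, SEEK_SET);
    read(kmem, (char *)&lotsfree, sizeof lotsfree);
    lseek(kmem, (off_t)nl[1].n_value, SEEK_SET);
    read(kmem, (char *)&freemem, sizeof freemem);

    printf("lotsfree = %d, freemem = %d (units as the kernel keeps them)\n",
           lotsfree, freemem);
    return 0;
}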
dwc@cbnewsh.ATT.COM (Malaclypse the Elder) (01/18/90)
In article <130347@sun.Eng.Sun.COM>, lm@snafu.Sun.COM (Larry McVoy) writes:
> >Look, we all appreciate the tender loving care that VM architects have put
> >into strategies that are friendly to 100+ moderate-size processes context
> >switching rapidly in time-sharing mode.  But there are other ways to use
> >computers, and they are currently very poorly supported.  We paid for that
> >memory, we have a good use for it, and the OS is getting in our way, and
> >it's also REFUSING TO TELL US ACCURATELY WHAT'S GOING ON, an unforgivable
> >sin by my Unix dogma.
>
> Hmm.  The SunOS VM model was designed with exactly this in mind.  You can
> use damn near 100% of physical mem on a 4.0 or greater rev of the OS (the
> os uses some, but on a 32 meg machine you should be looking at close to
> 30 megs of user usable ram).

actually, the vm model addresses using all of physical memory because it
has integrated the paging pool with the buffer pool.  but it really hasn't
done much for such things as page stealing.  in fact, in the version that
was ported into system v release 4, i believe it still uses the two-hand
clock algorithm, which goes through physical memory regardless of what
each page is being used for.

my studies have shown that you really want to classify pages according to
"type", even when you have reference information.  i worked with some
developers on prototyping some improvements in the old regions
architecture (system v release 3) and maybe will get around to integrating
them into the vm model.

danny chen
att!hocus!dwc
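[A sketch of the two-hand clock algorithm danny mentions, just to show why
it is type-blind: both hands sweep every page frame the same way no matter
what the page holds.  The page struct, reclaim(), and all the constants
are hypothetical; in a real kernel the hardware re-sets the reference bit
on frames that are touched between the two hands.]

#include <stdio.h>

#define NPAGES     8192
#define HANDSPREAD 2048   /* distance between the two hands, in frames */

struct page {
    int referenced;       /* reference bit, normally set by hardware */
    int in_use;
} pages[NPAGES];

void reclaim(int frame)   /* stand-in for actually freeing the frame */
{
    pages[frame].in_use = 0;
}

void clock_scan(int nscan)
{
    static int front = 0;                 /* leading hand */
    int i;

    for (i = 0; i < nscan; i++) {
        int back = (front + NPAGES - HANDSPREAD) % NPAGES;

        /* front hand: clear the reference bit of every frame it passes */
        pages[front].referenced = 0;

        /* back hand: reclaim any frame not re-referenced since the front
         * hand cleared it - whether it held file cache, text, or heap */
        if (pages[back].in_use && !pages[back].referenced)
            reclaim(back);

        front = (front + 1) % NPAGES;
    }
}

int main(void)
{
    int i, freed = 0;

    for (i = 0; i < NPAGES; i++) {
        pages[i].in_use = 1;
        pages[i].referenced = (i % 3 == 0);   /* pretend some pages are hot */
    }
    clock_scan(2 * NPAGES);                   /* two full sweeps */

    for (i = 0; i < NPAGES; i++)
        if (!pages[i].in_use)
            freed++;
    printf("reclaimed %d of %d frames\n", freed, NPAGES);
    return 0;
}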
dwc@cbnewsh.ATT.COM (Malaclypse the Elder) (01/30/90)
In article <1424@eutrc3.urc.tue.nl>, wsinpdb@eutws1.win.tue.nl (Paul de Bra) writes:
> In article <22105@adm.BRL.MIL> Ed@alderaan.scrc.symbolics.com (Ed Schwalenberg) writes:
> >...
> >And if you don't have enough, you lose just as badly.  Under System V
> >Unix for the 386, when your large process exceeds the amount of
> >non-wired physical memory, the paging algorithm pages out the ENTIRE
> >process (which takes a LONG time), then lets your poor process fault
> >itself in again, oh so painfully, until you exceed physmem again and
> >start the cycle over.
>
> This most certainly is not true.
> I have experimented with growing processes, and what really happens is
> that when the total size of all processes approaches physical memory
> size, the pager starts to page out some old pages.  I can have a process
> grow slowly and never really be paged (or swapped) out completely.
> (I have tried this with a 20Mbyte process on a machine with 8Mbyte of
> memory.)
>
> However, if a process is using the standard malloc() routine to allocate
> memory, then in order to allocate more memory malloc will search through
> a linked list of pointers, which are scattered throughout your process'
> memory.  This usually involves a lot of paging, and it is indeed possible
> that all pages of a process are paged out while other pages (of the
> same process) are being paged in.  I have observed this behaviour with
> a process that exceeded physical memory by only a small margin.
> The solution is to use the routines in libmalloc, which do not use the
> scattered linked list of pointers.  Switching to libmalloc completely
> stopped the thrashing.
>
> The malloc() routine in BSD does not use the linked list approach either,
> so a growing process does not cause this kind of thrashing in BSD.

i'm not sure about the user side of things (e.g. malloc), but i think what
the original poster was referring to was the fact that in system v release
3, if a page fault could not find any free physical memory, the faulting
process would roadblock and be put in SXBRK state.  the memory scheduler,
sched, would then be awakened to swap out one or more processes.  note
that when this happens, it is VERY LIKELY for other processes to also
roadblock on memory in the same state.  i use the "keystone cops" as the
visualization aid for this effect.  this is the reason why SVR3 could go
idle on a busy system (look at sar output for a memory-overloaded system).
but i digress.

i don't remember what the final method was for handling the case of a
single process on the run queue faulting and using more than physical
memory, but one iteration of it had the memory scheduler do nothing in
that situation.  it would then be up to the paging daemon to steal pages
from that process (page aging was done according to wall-clock time).
note that the swapping was not done in a single i/o operation but was
ultimately broken up, by the device driver at least, into track-size
pieces.  but that doesn't address the problem of latency: the process was
subject to latency on swapping and would have to painfully page its
'working set' back in.  it would certainly make more sense to swap out in
convenient-size pieces, leaving the process ineligible to run (you don't
want a process that is being swapped out continuing to contend for pages)
until either the memory shortage cleared up or the entire process was
swapped out.
this idea was incorporated into a regions-based prototype designed to
handle memory contention, load control, and page replacement in a more
sane manner.  of course, with SVR4, the regions architecture went out the
window and we would have to redesign a prototype based on VM (not an easy
task).  we may eventually do it though.

danny chen
att!hocus!dwc
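[A sketch of the malloc effect Paul de Bra describes above - hypothetical
structures, not the actual System V or libmalloc source.  The classic
malloc threads its free list through headers scattered across the arena,
so one allocation can fault in a page per list node; keeping the
bookkeeping in a compact side table touches only the pages holding the
table itself.]

#include <stddef.h>

/* scattered scheme: the header lives immediately before each block */
struct header {
    struct header *next;   /* next free block, anywhere in the arena */
    size_t size;
};

void *scattered_alloc(struct header *freelist, size_t want)
{
    struct header *h;
    /* every h we inspect faults in the page holding that header */
    for (h = freelist; h != NULL; h = h->next)
        if (h->size >= want)
            return (void *)(h + 1);
    return NULL;
}

/* compact scheme: descriptors in one dense array, blocks elsewhere */
struct desc { void *block; size_t size; int free; };

void *compact_alloc(struct desc *table, int n, size_t want)
{
    int i;
    /* the whole scan touches only the pages holding the table */
    for (i = 0; i < n; i++)
        if (table[i].free && table[i].size >= want) {
            table[i].free = 0;
            return table[i].block;
        }
    return NULL;
}

int main(void)
{
    static struct header blocks[3] = {
        { &blocks[1], 32 }, { &blocks[2], 64 }, { NULL, 128 }
    };
    static struct desc table[3] = {
        { 0, 32, 1 }, { 0, 64, 1 }, { 0, 128, 1 }
    };
    /* both find a 64-byte block; the difference is which pages got touched */
    scattered_alloc(blocks, 64);
    compact_alloc(table, 3, 64);
    return 0;
}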