xanthian@zorch.SF-Bay.ORG (Kent Paul Dolan) (01/16/91)
daveh@cbmvax.commodore.com (Dave Haynie) writes:

> Actually, the same kind of thing seems to be true of Sun SPARC
> machines. Though the SPARCs seems to have kind of a plateau effect --
> they drop off linearly for CPU hog tasks 1..N, then all of a sudden
> take a nose dive. I don't know if this is a Sun 4 implementation
> detail, or an expected effect of the SPARC architecture, though.

Nope, that is a well known and easily explained characteristic of _any_
virtual memory system, usually called the "working set" phenomenon.

It is a characteristic of any well written, modular software that it
can execute along quite happily with only a small subset of its total
executable's virtual pages in real memory, because execution is focused
for a "substantial" period of time on a limited number of virtual
memory pages called the "working set". It is also characteristic that
the task executes some instructions out of _each_ page of its working
set in every normal preemptive time slice.

So, as long as

  (number of CPU intensive tasks) * (average working set of pages)
      < (total available real memory for virtual pages)

page faults will be relatively rare, and CPU bound jobs will mull
happily away, their execution traces limited to pages already in
memory, for "substantial" (many time slices) periods of time, each
getting a substantial portion of 1/Nth of its stand-alone performance
out of the CPU. When a process does page fault, there is a page not
currently in demand that can be swapped out, and the other jobs use the
extra time to their benefit.

But let the needed virtual page working sets exceed the total real
memory available to hold them, and merry hell breaks loose. Since there
aren't enough real memory pages to hold all the working sets, some job
will page fault, and the page that gets swapped out to make room is one
that another job is going to need _immediately_ when its time slice
comes around in turn, so it will page fault, causing a page to be
swapped out that yet another job is going to need in its next time
slice, which causes a page fault, ... ad infinitum.

The process is called thrashing. Physically, it means the read heads
are moving so fast across the swap area that your computer is trying to
walk off your desk, and _no_ job gets any work done, since every job
that finally comes up to execute in its newly swapped-in page promptly
hits another page of its working set that has been swapped out to
satisfy some other job, so it page faults and goes back to sleep. The
net result is that all N+1 jobs are sleeping waiting for page fault I/O
almost all the time, the swap area I/O is maxed out, and performance
drops into a black hole.

It's really instructive to work through the numbers on this one, but I
don't have the input data available, so you'll have to live with the
qualitative description above, plus the made-up-numbers sketch tacked
on after the signature.

And, this having become a tutorial, a copy goes to .introduction.
Snaffle it, Ferry, for the FAQs, please. Followups back to .advocacy.

Kent, the man from xanth.
<xanthian@Zorch.SF-Bay.ORG> <xanthian@well.sf.ca.us>
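
The promised sketch: a tiny C simulation of the effect, with every
number in it made up purely for illustration (8 pages per working set,
128 real page frames, 100 time slices, a global LRU pager under strict
round-robin scheduling -- none of which is Sun 4 data or how a real
pager actually behaves). It just shows the shape of the curve: the
fault count sits at the cold-start minimum while ntasks * WSET fits in
FRAMES, then blows up by a couple of orders of magnitude the instant it
doesn't. That cliff is the nose dive Dave is seeing.

/*
 * Back-of-the-envelope thrashing sketch.  ALL NUMBERS ARE MADE UP for
 * illustration -- this is not Sun 4 data, and a real pager is smarter
 * than the global-LRU, strict-round-robin toy modeled here.
 *
 * Each of ntasks CPU-bound tasks touches every page of its own
 * WSET-page working set once per time slice.  While ntasks * WSET fits
 * in FRAMES, only the initial cold faults occur; once it doesn't, every
 * page a task needs has just been stolen to satisfy some other task,
 * and the fault count explodes.
 */
#include <stdio.h>

#define MAX_TASKS  24          /* how many CPU hogs to try             */
#define WSET        8          /* assumed pages per working set        */
#define FRAMES    128          /* assumed real page frames available   */
#define SLICES    100          /* time slices simulated per task count */

static int  owner[FRAMES];     /* which task owns the resident page    */
static int  pageno[FRAMES];    /* which of its pages is resident       */
static long stamp[FRAMES];     /* last-touched time, for LRU choice    */

int main(void)
{
    int ntasks, f, t, p, s;

    for (ntasks = 1; ntasks <= MAX_TASKS; ntasks++) {
        long faults = 0, now = 0;

        for (f = 0; f < FRAMES; f++) {          /* start with empty RAM */
            owner[f] = -1;
            stamp[f] = 0;
        }

        for (s = 0; s < SLICES; s++)            /* each time slice...   */
            for (t = 0; t < ntasks; t++)        /* ...each task in turn */
                for (p = 0; p < WSET; p++) {    /* touches its wkg. set */
                    int hit = -1, victim = 0;

                    for (f = 0; f < FRAMES; f++) {
                        if (owner[f] == t && pageno[f] == p)
                            hit = f;            /* page is resident     */
                        if (stamp[f] < stamp[victim])
                            victim = f;         /* least recently used  */
                    }
                    if (hit < 0) {              /* page fault: evict    */
                        faults++;
                        hit = victim;
                        owner[hit]  = t;
                        pageno[hit] = p;
                    }
                    stamp[hit] = ++now;         /* mark as just touched */
                }

        printf("%2d tasks want %3d pages in %3d frames: %6ld faults\n",
               ntasks, ntasks * WSET, FRAMES, faults);
    }
    return 0;
}

Global LRU is used only because it is about the simplest replacement
policy that reproduces the effect; a fancier pager moves the cliff
around a bit, but cannot make it go away once the working sets
genuinely no longer fit in real memory.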