[comp.sys.sgi] Swap questions

shoshana@koko.UUCP (Shoshana Abrass) (11/15/90)

  I've been playing around with osview, trying to determine the optimal
  swap space for a PI with 48 to 64 Mbytes of memory. I noticed a few
  oddities, and I'm hoping someone at SGI can explain them.

  First, the reason we have so much memory is that we manipulate large
  image files (1-16 Mb), several at a time. But even on a machine that's
  manipulating 16 Mb files, swap activity seems very low. "swap -l"
  claims that only 5 Mb of swap are being used. "osview -i1" shows lots
  of page faults, but hardly any swapping activity. So my questions are:

  1. Are the image files ever paged out to the swap area? It seems not.
	 It seems like the executable code is the only thing getting paged
	 to virtual memory.

  2. Are the image files 'swapin'ed (according to osview) when they're 
	 first read into memory? What's the difference between the "Virtual
	 Memory" and "Swap" in osview?

  In a kernel class I took, the instructor claimed that whenever a process
  was read into memory, swap space was reserved/allocated for it, before 
  it ever got paged out. Is this true under IRIX? (Is it ever true, for 
  that matter?) If this is true, it implies that one must have at least
  as much swap as real memory... even though "swap -l" claims that no
  swap is being used. Also, if it is true, is it only true for the 
  executable code, and not the data?

  What's the real scoop here? Any help much appreciated.

  -shoshana
  pdi!shoshana@sgi.com

================== Disclaimer necessitated by mailpath: ==================
              I don't work for sgi, I just work downstream.
==========================================================================

jmb@patton.wpd.sgi.com (Doctor Software) (11/16/90)

In article <9011150232.AA00704@koko.pdi.com>, shoshana@koko.UUCP
(Shoshana Abrass) writes:
> 
>   1. Are the image files ever paged out to the swap area? It seems not.
> 	 It seems like the executable code is the only thing getting paged
> 	 to virtual memory.

This is mostly correct. Only pieces of executable programs get swapped
out, whether they are text or data. It would make little sense to swap a
data file - it's already on the disk anyway!

Files are managed through the system "buffer cache", which is an area of
memory used to cache file pages. In IRIX 3.3 and later, the system
dynamically sizes the buffer cache, and will attempt to keep a balance
between executable code and file data in memory to minimize disk traffic
and maximize performance.
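To make the trade-off concrete, here is a toy sketch (plain Python with made-up names - not IRIX code) of why dropping a clean buffer-cache page is free, while evicting a program page is not:

```python
# Toy model of page-eviction cost. A clean file page can simply be
# dropped and re-read from the file later; an anonymous (program) page
# has no file behind it, so evicting it costs a write to swap.

def reclaim_cost(page):
    """Return the number of disk writes needed to evict a page."""
    if page["kind"] == "file" and not page["dirty"]:
        return 0   # drop it; the copy in the file on disk is still current
    return 1       # must first write the page out to swap (or to the file)

clean_file = {"kind": "file", "dirty": False}
dirty_anon = {"kind": "anon", "dirty": True}

assert reclaim_cost(clean_file) == 0
assert reclaim_cost(dirty_anon) == 1
```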

> 
>   2. Are the image files 'swapin'ed (according to osview) when they're 
> 	 first read into memory? What's the difference between the "Virtual
> 	 Memory" and "Swap" in osview?

Image file traffic can be examined in the buffer traffic section of
osview, or the "bdev" bar of gr_osview. Also, the real memory section of
osview or the "rmem" and "rmemc" bars of gr_osview show memory usage,
and what is assigned to the buffer cache versus executable code, etc.
The man page for gr_osview describes what all this stuff means in quite
a bit of detail.

In essence, your file is only read into memory as you access it, not all
at once, so to see the traffic effect look at the block device (buffer
cache) information. Virtual memory is paging activity on program text
and data. Paging activity does not necessarily imply disk activity! Page
faults occur for copy-on-write, modifications, zero-fill, double TLB
faults, etc. Swapping only occurs when, as a "last resort", the system
must push some pages out to disk to make room for other pages. Likewise,
a swapin only occurs if a user accesses a page whose only copy resides
on the disk. Going to the disk is expensive - the system avoids it
whenever possible. Again, the man page for gr_osview has a lot more
information on all this stuff.
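A toy model (Python, purely illustrative - not how the kernel counts anything) of why osview can show lots of page faults but hardly any swap traffic:

```python
# Toy fault accounting: most fault types are satisfied from memory
# without any disk I/O. Only a fault on a page whose sole copy lives
# in the swap area forces a "swapin" from disk.

def handle_fault(kind, stats):
    stats["faults"] += 1
    if kind in ("copy-on-write", "zero-fill", "tlb"):
        return                    # resolved in memory; no disk access
    if kind == "on-swap":
        stats["swapins"] += 1     # last resort: read the page back in

stats = {"faults": 0, "swapins": 0}
for kind in ["zero-fill", "copy-on-write", "tlb", "zero-fill", "on-swap"]:
    handle_fault(kind, stats)

assert stats["faults"] == 5       # lots of page faults...
assert stats["swapins"] == 1      # ...but hardly any swap activity
```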

>   In a kernel class I took, the instructor claimed that whenever a process
>   was read into memory, swap space was reserved/allocated for it, before 
>   it ever got paged out. Is this true under IRIX? (Is it ever true, for 
>   that matter?) If this is true, it implies that one must have at least
>   as much swap as real memory... even though "swap -l" claims that no
>   swap is being used. Also, if it is true, is it only true for the 
>   executable code, and not the data?

Swap space is allocated (it's cheap - just a counter) whenever a new
page of program memory is created. This means that when a program is
started, the text, initial stack and data segments have backing store
allocated in swap. This avoids deadlocking the system. sar, osview and
gr_osview only report the number of swap pages that actually hold an
active copy of a memory page - this is different from just making sure
that pages will be available in case swapping is necessary.

To imagine the deadlock, consider what would happen if I started a
program without making sure that swap space was available. I could
already have filled up my swap space with other program data. Then, the
new program allocates a new page of data, and I run out of memory. I
need to swap something to get memory for the new page, but swap space is
full so I can't swap, so ... deadlock.
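The reservation counter can be sketched like this (hypothetical Python, not the kernel's actual data structures): a program that can't get backing store fails up front, instead of deadlocking later when the pager needs a free swap page.

```python
# Toy "reserve swap at allocation time" policy. Reserving is just a
# counter decrement; no disk blocks are touched, and swap -l would
# still report the space as unused.

class Swap:
    def __init__(self, pages):
        self.free = pages          # just a counter of uncommitted pages

    def reserve(self, n):
        if n > self.free:
            return False           # refuse the allocation now...
        self.free -= n             # ...rather than deadlock at pageout time
        return True

swap = Swap(pages=100)
assert swap.reserve(60)            # first program starts fine
assert not swap.reserve(60)        # second one is refused up front
assert swap.free == 40
```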

> 
>   What's the real scoop here? Any help much appreciated.

I think all your questions have been answered!

>   -shoshana
>   pdi!shoshana@sgi.com

-- Jim Barton
   Silicon Graphics Computer Systems
   jmb@sgi.com

slevy@poincare.geom.umn.edu (Stuart Levy) (11/18/90)

In article <1990Nov16.155730.2670@odin.corp.sgi.com> jmb@patton.wpd.sgi.com (Doctor Software) writes:
>In article <9011150232.AA00704@koko.pdi.com>, shoshana@koko.UUCP
>(Shoshana Abrass) writes:
>> ...In a kernel class I took, the instructor claimed that whenever a process
>>   was read into memory, swap space was reserved/allocated for it, before 
>>   it ever got paged out. Is this true under Irix? (is it ever true, for 
>>   that matter). If this is true, it implies that one must have at least
>>   as much swap as real memory...
>
>Swap space is allocated (it's cheap - just a counter) whenever a new
>page of program memory is created. This means that when a program is
>started, the text, initial stack and data segments have backing store
>allocated in swap. ...

I'm still puzzled.  Doesn't that mean that the answer to Shoshana's question
is "yes" -- if you have more main memory than disk-based swap space,
you could never use all the main memory for user programs, because a disk-based
swap page would need to be allocated for each user page?

I know other UNIX implementations behave this way, but thought SGI
was among those who had fixed this.  The answer actually affects us -- our
main Iris has 64 MB of RAM but only about 50 MB of swap area, yet we
certainly seem to be able to get > 50 MB, and in fact > 64 MB, of virtual
space allocated to running processes (even accounting for sharable stuff).
I haven't checked that we can reach 50 + 64 - (size of kernel data) before
the kernel starts killing processes, but it seems plausible.
So like Shoshana I ask, what *is* the real scoop here?

    Stuart Levy, Geometry Group, University of Minnesota
    slevy@geom.umn.edu

jmb@patton.wpd.sgi.com (Doctor Software) (11/29/90)

Apparently, I was wrong. I've been perusing the code for 3.3, and if I can
still read C right, your total virtual memory is (real memory) + (swap
memory). If swap is too small, and a process has pages which need to be
swapped, the OS kills the process in order to avoid deadlock. This only
happens in the case that memory is filled, the OS needs to page
something out to keep going, and there are no pages of swap left free.

The lesson seems to be that you can run quite fine without swap until
you actually need to use it - but then watch out. There's no guarantee
of which process will actually be killed in this case; it's just
whoever has the page that can't go out.
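A toy model of that policy (illustrative Python only; the page counts just echo the 64 MB / 50 MB figures from earlier in the thread): nothing is reserved up front, and a process dies only when a pageout finds no free swap.

```python
# Toy "late" swap policy: total virtual memory is real memory plus swap,
# and the owner of a page that can't be pushed out is killed.

REAL_PAGES, SWAP_PAGES = 64, 50    # e.g. 64 MB RAM, 50 MB swap, in "pages"

def pageout(swap_used):
    """Try to push one resident page to swap; return the page's fate."""
    if swap_used < SWAP_PAGES:
        return "swapped"
    return "killed"   # no swap left: kill whichever process owns the page

# Up to REAL_PAGES + SWAP_PAGES of virtual memory can exist...
assert REAL_PAGES + SWAP_PAGES == 114
# ...but once swap fills, the next pageout is fatal for some process.
assert pageout(swap_used=49) == "swapped"
assert pageout(swap_used=50) == "killed"
```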

-- Jim Barton
   Silicon Graphics Computer Systems
   jmb@sgi.com

james@contex.UUCP (James McQueston) (12/01/90)

In article <1990Nov28.163415.14317@odin.corp.sgi.com>, jmb@patton.wpd.sgi.com (Doctor Software) writes:
> ................... your total virtual memory is (real memory) + (swap
> memory). If swap is too small, and a process has pages which need to be
> swapped, the OS kills the process in order to avoid deadlock. This only
> happens in the case that memory is filled, the OS needs to page
> something out to keep going, and there are no pages of swap left free.
> 
> The lesson seems to be that you can run quite fine without swap until
> you actually need to use it - but then watch out. There's no guarantee
> of which process will actually be killed in this case; it's just
> whoever has the page that can't go out.
> ...
> -- Jim Barton
>    Silicon Graphics Computer Systems
>    jmb@sgi.com

This is what we thought, as it is mentioned in the "bug fixes" section of
the release notes for 3.3.  We tested it, and were upset that we could not
determine which process got killed when swap ran out.  We were so concerned
that we placed a call (CALL ID F1017), and I talked to Tom Mitchell about
this for quite some time.  It is very important that users be able to
determine which process(es) get killed in order to resolve the deadlock.

Example: your server is used to run a simulation that takes hours or days to
compute, and you have tuned the size of your finite-element mesh to just
barely fit within the capabilities of that machine.  N hours later, someone
else innocently runs some unimportant program on the server and causes page
deadlock.  The O.S. blindly decides which process to kill and ... pow!  Chance
determines that the simulation gets killed and you lose N hours of work.
Too bad that the other user was just checking his mail.

Suggestion: there should be some prioritization of the importance of processes
when determining which one(s) should be killed to avoid deadlock.  Perhaps
the process's priority or "nice" value could be used.  Anything is better
than nothing.  Users MUST have some way of guiding the OS in this decision.
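The suggested policy could be as simple as this sketch (hypothetical Python - not anything IRIX implements): when a victim must be chosen, prefer the process with the highest "nice" value rather than picking blindly.

```python
# Toy victim selection guided by nice value: a higher nice value means
# lower priority, so that process is considered more expendable.

def pick_victim(procs):
    """procs: list of (name, nice) pairs. Return the name to kill."""
    return max(procs, key=lambda p: p[1])[0]

procs = [("simulation", -10),   # long-running, high-priority job
         ("mailreader",  10)]   # someone just checking mail

assert pick_victim(procs) == "mailreader"   # the simulation survives
```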

--jhm (reply broken, use contex!james@uunet.uu.net or ...!uunet!contex!james)

dhinds@elaine23.stanford.edu (David Hinds) (12/01/90)

In article <1539@contex.UUCP> james@contex.UUCP (James McQueston) writes:
>
>Example: your server is used to run a simulation that takes hours or days to
>compute, and you have tuned the size of your finite-element mesh to just
>barely fit within the capabilities of that machine.  N hours later, someone
>else innocently runs some unimportant program on the server and causes page
>deadlock.  The O.S. blindly decides which process to kill and ... pow!  Chance
>determines that the simulation gets killed and you lose N hours of work.
>Too bad that the other user was just checking his mail.

    We had a bad thing happen yesterday that I think was a result of this
problem.  My advisor has written a graphics program for manipulating the
results of protein molecular dynamics calculations, that reads entire
dynamics trajectories into memory.  It is written in Fortran, and has huge
static zero-initialized data areas - it takes about 48MB of virtual memory.
We have 32MB of main memory and 48MB of swap space presently.  Yesterday,
someone started up this program and started reading in an MD dataset, and
walked away.  When she came back, the machine was apparently deceased.  The
mouse cursor could still move around the screen, but the buttons and console
keyboard were useless.  We couldn't get any response from the system over
the network.  We had to power down to reset things, and I lost a simulation
that had logged about 120 hours of CPU time.  I can only guess that when
the virtual memory limit was reached, something important was killed that
crippled the system.  This was under 3.3.1, by the way.

 -David Hinds
  dhinds@cb-iris.stanford.edu