[comp.sys.sgi] SWAPPER/System performance pt. 2

doelz@urz.unibas.ch (Reinhard Doelz) (10/06/88)

Sorry to send a message again, but I got a real hard flame on my first 
posting telling me that I didn't specify the problem exactly. 
OK, here are the details: 

*step 1: create a batch job*
I created a new queue on the IRIS 4D/80 in /usr/lib/cron/queuedefs :
f.1j9n
(which, if I read queuedefs(4) right, means at most one job at a time in
queue f, run at nice value 9) and submitted a DISCOVER job into this queue by
at -qf 20:00
com0.csh
^D
(DISCOVER is a huge molecular dynamics program purchased from BIOSYM in
San Diego, and I created a shell script called com0.csh to define the
environment and aliases and to run the job). If I look at the running
processes after a while, I get lines like

    F S  UID   PID  PPID  C PRI NI P   SZ:RSS    WCHAN TTY      TIME COMD
   30 R  110  3394  3376 80 119 39 * 5613:794          ?       785:10 discover
   30 S  110  3375   699  0  30 29 *   28:21  800cc5f8 ?        0:00 sh
   30 S  110  3376  3375  0  30 29 *   51:36  800cca24 ?        0:00 com0.csh

which tell me that DISCOVER is running at low priority and that its
memory requirement is rather high.

*step 2: create other jobs which are eating up memory*
INSIGHT, another program from BIOSYM, uses lots of memory as well, and I
run it interactively at the console. From there I wanted to start
another DISCOVER job, which means I tried to occupy another 5600 pages
in addition to the already running INSIGHT (3748 pages) and the
(batch) DISCOVER job (5613 pages).
BUT: I get a failure:
growreg --- not enough memory to allocate 5086 pages.

*step 3: Looking at the swapspace*
I started the batch DISCOVER job last night and tried to start the other one
today at 8:30. This is the system accounting as reported by sar:

MODL MODL 4D1-3.0 07221426 IP4    10/06/88

00:00:01 freemem freeswp
01:00:00    1434  101256
02:00:01    1427  101256
03:00:00    1416  101256
04:00:01    1411  101256
05:00:01    1414  101264
06:00:01    1414  101264
07:00:00    1404  101264
08:00:00    1405  101264
08:20:00    1414  101264
08:40:01    1415  101264
09:00:01    1325  101200
09:20:01    1432  101200


09:20:01 freemem freeswp
Average     1413  101251

As you can see, there is *no* difference between the night and today, which
is a real pain, because the batch job should be swapped out to disk in order
to let the online jobs run.

*question:* 
The only hint in the so-called 'manual' is to extend the swap space in
order to improve the system's performance. But apparently the system doesn't
swap at all, or at least does so insufficiently, so this doesn't yield a
better result.
 
Did any of you have a similar problem ???

Any comments/suggestions/flames welcome.

Reinhard 

  
  ************************************************************************
  *   Dr. Reinhard Doelz           *           SWITZERLAND               *
  *     Biocomputing               *                                     *
  *      Biozentrum                * doelz%urz.unibas.ch@relay.cs.net    *
  * Klingelbergstrasse 70          *                                     *
  *     CH-4056 Basel              *                                     *
  ************************************************************************

rpaul@dasys1.UUCP (Rod Paul) (10/11/88)

(My apologies if you receive two copies of this; I had a power hit at home
 and was kicked off the system).

What I suggest is writing a couple of lines of code that malloc() a meg at
a time; when malloc() returns 0, you know the maximum amount of memory
allowed PER process.

This problem sounds like a similar one I encountered on a 4D/70 running sys 2.0.
One of my machines had 150 meg of swap space but large processes kept crashing.
It turned out that the kernel was configured to only allow 33 meg per process.
I believe the variable I changed was 'UMEM'; I'll check my notes on the
procedure to fix things tomorrow and let you know.

In the meantime you may want to check the kernel configuration just to see
if this is in fact related to your problem.