doelz@urz.unibas.ch (Reinhard Doelz) (10/06/88)
Sorry to send a message again, but I got a real hard flame on my first posting telling me that I didn't specify the problem exactly. OK, here are the details:

*step 1: create a batch job*

I created a new queue on the IRIS 4D/80 in /usr/lib/cron/queuedefs:

    f.1j9n

and submitted a DISCOVER job into this queue by

    at -qf 20:00
    com0.csh
    ^D

(DISCOVER is a huge molecular dynamics program purchased from BIOSYM at San Diego, and I created a shell script called com0.csh to define the environment and aliases and run the job.) If I look at the running processes after a while, I get lines like

     F S UID  PID PPID  C PRI NI P   SZ:RSS  WCHAN    TTY   TIME COMD
    30 R 110 3394 3376 80 119 39 * 5613:794           ?  785:10 discover
    30 S 110 3375  699  0  30 29 *    28:21 800cc5f8  ?    0:00 sh
    30 S 110 3376 3375  0  30 29 *    51:36 800cca24  ?    0:00 com0.csh

which tell me that DISCOVER is running with a low priority and that the amount of memory it needs is rather high.

*step 2: create other jobs which eat up memory*

INSIGHT, another program from BIOSYM, uses lots of memory as well, and I run this interactively at the console. From there, I wanted to start another DISCOVER job, which means that I tried to occupy another 5600 pages in addition to the already existing INSIGHT (3748 pages) and the (batch) DISCOVER job (5613 pages). BUT: I get a failure:

    growreg --- not enough memory to allocate 5086 pages

*step 3: look at the swap space*

I started the DISCOVER job yesterday night and tried to start the other one today at 8:30. This is the system accounting as reported by sar:

    MODL MODL 4D1-3.0 07221426 IP4    10/06/88

    00:00:01  freemem  freeswp
    01:00:00     1434   101256
    02:00:01     1427   101256
    03:00:00     1416   101256
    04:00:01     1411   101256
    05:00:01     1414   101264
    06:00:01     1414   101264
    07:00:00     1404   101264
    08:00:00     1405   101264
    08:20:00     1414   101264
    08:40:01     1415   101264
    09:00:01     1325   101200
    09:20:01     1432   101200

    Average      1413   101251

As you can see, there is *no* difference between the night and today, which is a real pain, because the batch job should be swapped out to disk in order to let the online jobs run.

*question:*

The only hint in the so-called 'manual' is to extend the swap space in order to enlarge the system's performance. But apparently the system doesn't swap at all, or at least does so insufficiently, so this doesn't yield a better result. Did any of you have a similar problem? Any comments/suggestions/flames welcome.

Reinhard

************************************************************************
*  Dr. Reinhard Doelz          *  SWITZERLAND                          *
*  Biocomputing                *                                       *
*  Biozentrum                  *  doelz%urz.unibas.ch@relay.cs.net     *
*  Klingelbergstrasse 70       *                                       *
*  CH-4056 Basel               *                                       *
************************************************************************
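[One way to check whether the kernel swaps at all is to create memory pressure deliberately while the batch DISCOVER job is running and watch freemem/freeswp in sar. Below is a minimal sketch, assuming an ANSI C compiler; the 1 MB step size is an illustrative choice, and the memset() is there because malloc() alone may only reserve address space without making pages resident.]

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    #define STEP (1024L * 1024L)    /* grow by one megabyte at a time */

    int main(void)
    {
        char *p;
        long mb = 0;

        /* Allocate and touch memory one megabyte at a time; writing
         * into every chunk forces its pages to become resident, so
         * other processes should get paged/swapped out if the kernel
         * is willing to do so.  Run sar alongside and watch whether
         * freemem drops and freeswp moves. */
        while ((p = malloc(STEP)) != NULL) {
            memset(p, 1, STEP);
            mb++;
            printf("touched %ld MB\n", mb);
        }
        printf("allocation failed after %ld MB\n", mb);
        return 0;
    }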
rpaul@dasys1.UUCP (Rod Paul) (10/11/88)
(My apologies if you receive two copies of this; I had a power hit at home and was kicked off the system.)

What I suggest is writing a couple of lines of code that malloc() a meg at a time; when malloc() returns 0, you know the maximum amount of memory allowed PER process (see the sketch below). This sounds similar to a problem I encountered on a 4D/70 running sys 2.0. One of my machines had 150 meg of swap space, but large processes kept crashing. It turned out that the kernel was configured to allow only 33 meg per process. I believe the variable I changed was 'UMEM'; I'll check my notes on the procedure to fix things tomorrow and let you know. In the meantime you may want to check the kernel configuration, just to see whether this is in fact related to your problem.
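[A minimal sketch of the probe suggested above, assuming an ANSI C compiler; the 1 MB step size is an illustrative choice, not part of the original suggestion.]

    #include <stdio.h>
    #include <stdlib.h>

    #define STEP (1024L * 1024L)    /* ask for one megabyte at a time */

    int main(void)
    {
        long mb = 0;

        /* Keep requesting another megabyte until malloc() returns 0;
         * the total reached is roughly the per-process memory limit.
         * The memory is never freed on purpose: the point is to grow
         * one process as large as the kernel allows. */
        while (malloc(STEP) != NULL)
            mb++;

        printf("malloc() failed after %ld MB\n", mb);
        return 0;
    }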