[unix-pc.general] Where or where has my memory gone?

lenny@icus.islp.ny.us (Lenny Tropiano) (11/09/89)

Well my machine was up for 30 days, 20 hours, 14 minutes and it just 
flipped out! Things were running slowly.  Does memory get fragmented 
like disks do? Should you reboot frequently, and what is the frequency?   
I was running a news unbatch (compress running...). 
I was in elm, and I have a few drivers loaded (just a few) :-)

 DEVNAME  ID  BLK CHAR  LINE   SIZE    ADDR     FLAGS
    wind   0   -1    7   -1  0x9000   0x54000 ALLOC BOUND 
    lipc   1   -1   -1   -1  0x7000  0x360000 ALLOC BOUND 
     cmb   2   -1   -1   -1  0x3000   0x5d000 ALLOC BOUND 
   voice   3   -1    9   -1  0xa000  0x367000 ALLOC BOUND 
      tp   4   -1   10   -1  0x3000  0x371000 ALLOC BOUND 
 starlan   5   -1   12   -1 0x14000  0x3de000 ALLOC BOUND 
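
As a rough check on how much RAM the loadable drivers themselves hold, the SIZE column above can be summed. This is only a sketch of the arithmetic: the values are copied from the listing, and the helper name total_driver_bytes is made up for illustration.

```python
# Sizes copied from the SIZE column of the driver listing above.
DRIVER_SIZES = {
    "wind": 0x9000,
    "lipc": 0x7000,
    "cmb": 0x3000,
    "voice": 0xA000,
    "tp": 0x3000,
    "starlan": 0x14000,
}


def total_driver_bytes(sizes):
    """Return the total number of bytes held by the listed drivers."""
    return sum(sizes.values())


total = total_driver_bytes(DRIVER_SIZES)
print(f"{total} bytes = {total // 1024} KB")  # 212992 bytes = 208 KB
```

That comes to 208 KB, or roughly 6% of a 3.5MB machine, so the drivers alone do not account for the memory.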

Then of course I have the typical StarLAN daemons running, plus my own
daemons and whatever else (probably a uucico & vi).  Am I just asking too
much of the machine with 3.5MB?  I know this has been discussed before, but
shouldn't the processes just swap out to disk if there isn't enough
memory?  I do have the standard 5MB swap partition.  The last dying word
of my machine was:

sysinfo: cannot read /dev/rfp002 

(NOTE: there are no HDERRs in unix.log)  The machine was very quiet,
sounded like the news unbatching halted.  The LEDs were normal.  The
mouse responded and tried to select windows.  Then I got a prompt back:

[654 Filecabinet] ps -ef
Killed
[655 Filecabinet] ps -ef
Killed

Would an extra .5MB help?  I'm going to do the 4MB upgrade (1.5MB on
the combo card, a hardware patched 512K card, and 2MB on the 
motherboard).  

Should I decrease the memory available to compress (USERMEM) and 
recompile?  What major effects will this have on compression (time-wise)?
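
For context on what USERMEM controls: in the compress 4.0 source, USERMEM is a compile-time setting that selects the maximum code width (BITS), and with it the size of the tables compress allocates. The sketch below mirrors that selection. The threshold values are as recalled from compress.c with SACREDMEM taken as 0, so treat them as an assumption and check them against your own copy of the source. The function name bits_for_usermem is invented for illustration.

```python
# Thresholds as recalled from compress 4.0 (compress.c); an assumption to be
# verified against the real source. SACREDMEM is taken as 0 here.
THRESHOLDS = [
    (433484, 16),
    (229600, 15),
    (127536, 14),
    (73464, 13),
]


def bits_for_usermem(usermem):
    """Return the maximum code width selected for a given USERMEM value."""
    for minimum, bits in THRESHOLDS:
        if usermem >= minimum:
            return bits
    return 12  # smallest table size when very little memory is declared


print(bits_for_usermem(450000))  # 16
print(bits_for_usermem(300000))  # 15
```

A lower BITS means smaller tables and usually a poorer compression ratio. This sketch says nothing about the effect on run time.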

-Lenny
-- 
| Lenny Tropiano            ICUS Software Systems      [w] +1 (516) 589-7930 |
| lenny@icus.islp.ny.us     Telex; 154232428 ICUS      [h] +1 (516) 968-8576 |
| {ames,pacbell,decuac,hombre,sbcs,attctc}!icus!lenny     attmail!icus!lenny |
+------- ICUS Software Systems -- PO Box 1;  Islip Terrace, NY  11752 -------+

gil@limbic.UUCP (Gil Kloepfer Jr.) (11/09/89)

In article <1020@icus.islp.ny.us> lenny@icus.islp.ny.us (Lenny Tropiano) writes:
>Well my machine was up for 30 days, 20 hours, 14 minutes and it just 
>flipped out! Things were running slowly.  Does memory get fragmented 
>like disks do?

Okay, I'll bite.  Note that the following is definitely based on theory,
and not on documented experience, but it sounds reasonable.  The way
the UNIX-pc MM works, memory will probably be "fragmented" 99% of the
time, but the hardware page tables should map the memory into what
appears to be a "contiguous" section of memory.  What it can't get in
"real" memory, it will page off to disk as "virtual" memory by virtue
of the memory management system.
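
The idea can be shown with a toy model. This is a minimal sketch of page-table translation in general, not the UNIX-pc kernel's actual data structures. The 4 KB page size, the frame numbers, and the names translate and table are all assumptions made for illustration.

```python
PAGE_SIZE = 4096  # assumed page size for this toy model


def translate(page_table, virtual_addr):
    """Map a virtual address to a physical one through a page table.

    page_table[i] holds the physical frame number backing virtual page i,
    or None if that page is currently out on disk.
    """
    page, offset = divmod(virtual_addr, PAGE_SIZE)
    frame = page_table[page]
    if frame is None:
        raise LookupError(f"page fault: virtual page {page} is paged out")
    return frame * PAGE_SIZE + offset


# Hypothetical process: four contiguous virtual pages. The first three are
# backed by scattered physical frames and the fourth is paged out.
table = [37, 5, 212, None]

print(translate(table, 0))          # lands in frame 37
print(translate(table, 4096 + 16))  # lands in frame 5, offset 16
```

The process sees addresses 0 through 16383 as one contiguous block even though the frames behind them are scattered, which is why fragmentation of physical memory should not matter to user processes.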

Now someone with a good working knowledge of the internals of the UNIX-pc
kernel (you know who you are ;-) could probably check the way that pages
are allocated and freed and whether the page table entries are being
maintained properly.  Someone mentioned in an earlier article that
he never saw his machine (via sysinfo) go below .5 meg free.  This
might (??) be a related problem.

> Should you reboot frequently, and what is the frequency?   

I think most of us would say that you should NEVER have
to reboot your machine.  The AT&T hotline would most certainly say
you should, every day if possible ;-)  [if that doesn't work, you
could always reformat the hard disk and reload the OS ;-) ;-) ]
Considering the number of daemons running on your machine, and
the nature of the devices you have, it might be a good idea to do
a "ps -lef" and check the SZ and RSZ fields (I think those are memory,
right folks?!) and see if any of them continuously increase from day to
day.  One of these daemons might be eating your memory to oblivion!
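
A rough sketch of that day-to-day comparison follows. It assumes the size column has already been pulled out of each "ps -lef" run into a dict of process name to size. The function name steadily_growing and all the sample numbers are invented for illustration and are not readings from any real machine.

```python
def steadily_growing(snapshots):
    """Return names of processes whose size rose in every successive snapshot.

    snapshots is a list of dicts mapping process name -> size,
    one dict per sampling time, oldest first.
    """
    if len(snapshots) < 2:
        return []
    growing = []
    for name in snapshots[0]:
        sizes = [snap.get(name) for snap in snapshots]
        if None in sizes:
            continue  # process was not present in every sample
        if all(later > earlier for earlier, later in zip(sizes, sizes[1:])):
            growing.append(name)
    return sorted(growing)


# Hypothetical readings, invented for illustration only.
samples = [
    {"daemon_a": 40, "daemon_b": 22, "uucico": 60},
    {"daemon_a": 40, "daemon_b": 31, "uucico": 58},
    {"daemon_a": 40, "daemon_b": 45, "uucico": 61},
]
print(steadily_growing(samples))  # ['daemon_b']
```

A process that shows up here is only a suspect. A size that climbs in every sample suggests a leak but does not prove one.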

>The last dying word of my machine was:
>sysinfo: cannot read /dev/rfp002 

Hmmm....  System buffers maybe?

For those who find it necessary to flame for incorrect information, my
disclaimer here is that I don't claim to know all about this, but I'm
hoping that these comments will encourage some thought about what might
be happening.

Gil.
-----
| Gil Kloepfer, Jr.
| ICUS Software Systems/Bowne Management Systems (depending on where I am)
| ...ames!limbic!gil

spear@druco.ATT.COM (Steve Spearman) (11/10/89)

> 
> In article <1020@icus.islp.ny.us> lenny@icus.islp.ny.us (Lenny Tropiano) writes:
>>Well my machine was up for 30 days, 20 hours, 14 minutes and it just 
>>flipped out! Things were running slowly.  Does memory get fragmented 
>>like disks do?

As far as paging goes, the UnixPC definitely can page correctly.
I was running a 512K system for a while (yes, it was very painful)
and EVERYTHING would page out.

I really think the problem you are experiencing is related to
a system fault, not insufficient memory size.  I run for months
with 2.5 meg with no problems, but have nowhere near the amount
of drivers and load it sounds like you have.  Sounds like either
a driver bug or a real kernel bug.

At this point, a planned restart might be a reasonable option
rather than pursuing something that takes so long to reproduce.
Weekly or monthly would probably be fine.  I've seen others that
believe this is a good idea.

Steve Spearman  spear@booboo.att.com

jbm@uncle.UUCP (John B. Milton) (11/11/89)

In article <577@limbic.UUCP> gil@limbic.UUCP (Gil Kloepfer Jr.) writes:
>In article <1020@icus.islp.ny.us> lenny@icus.islp.ny.us (Lenny Tropiano) writes:
>>Well my machine was up for 30 days, 20 hours, 14 minutes and it just 
>>flipped out! Things were running slowly.  Does memory get fragmented 
>>like disks do?
The slowdown sounds like the clist problem. Was your disk doing lots of recals?
(unusual buzzing or humming on some drives)

>the nature of the devices you have, it might be a good idea to do
>a "ps -lef" and check the SZ and RSZ fields (I think those are memory,
>right folks?!) and see if any of them continuously increase from day to
>day.  One of these daemons might be eating your memory to oblivion!
But that is still the size(s) of the process, not the working set (the part
that wants to be in RAM all the time). Various kinds of RAM-based fragmentation
in the kernel can slow the system down a little, but not tremendously.

>>The last dying word of my machine was:
>>sysinfo: cannot read /dev/rfp002 
Disk error, bud; there is no other reason for a read failure. Check
/usr/adm/unix.log for the bad news.

John
-- 
John Bly Milton IV, jbm@uncle.UUCP, n8emr!uncle!jbm@osu-cis.cis.ohio-state.edu
(614) h:252-8544, w:785-1110; N8KSN, AMPR: 44.70.0.52; Don't FLAME, inform!

wtm@neoucom.UUCP (Bill Mayhew) (11/12/89)

I have 2 meg on the motherboard and a 67 meg drive on my 3b1.
According to sysinfo, the lowest I've seen the free store drop is
497K, even when I've tried to load the machine excessively.  I'm
not sure if the machine is sensing a low water mark at 512K and
starting to page to disk at that point or what.  Heaven only knows
what algorithm the kernel is using.

At work, we use a 7300 with a 40 meg disk with 1 meg on the
motherboard and 1.5 on cards.  We have three serial ports going.
We use tty000 to drive a trailblazer and tty000 connected to a dz
port on a vax 750.  The 7300 is our news server since we've had a
lot of problems with SILO overruns on the vax ports talking to the
trailblazer.  I've noticed that about every two weeks the 7300 at
work goes brain-dead with a normal console display and working
mouse, but ignores any input via the keyboard or tty.  It would
seem that all the jobs in the ready queue, except for smgr, are hung
because anything such as sysinfo does not update the display, but
the time and date in W5 at the top of the screen keep updating.
We're running HDB uucp.  Unfortunately, since the machine is locked
up it is rather difficult to tell what is going on.

At home, I had a LOT of problems running version 3.51 and the stock
uucp.  My machine crashed at least once a day.  After considerable
consultation with ye olde hotline, AT&T replaced the motherboard
after trying a recompiled uucico that they downloaded to my
machine.  Even with the new motherboard, I still had crashing.  The
HDB uucp solved the problems I was having.  The crashes that I got
at home were the same in symptom that we get on occasion at work.
The difference is that at work we transfer about 4 megs in and 4
megs of data out of the 7300 every day.  At home, my input is about
400 to 750K per day.

We can live with the occasional crashes at work, since once every
two weeks or so isn't that awful :-).  I built a little box with a
6502 CPU chip that monitors the serial line going to the vax.  If
the monitor doesn't see a uucico from the 7300 at least once an
hour, it picks a relay to reboot the 7300.
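
The logic of such a monitor is a plain watchdog timer. Here is a minimal sketch of that logic only. It is not the 6502 firmware, and the class name Watchdog, its methods, and the timestamps are all assumptions for illustration. Times are passed in explicitly, in seconds, so the behavior is easy to follow.

```python
class Watchdog:
    """Report expiry when no activity has been seen within `timeout` seconds."""

    def __init__(self, timeout, now):
        self.timeout = timeout
        self.last_seen = now

    def activity(self, now):
        """Record that the monitored line showed traffic at time `now`."""
        self.last_seen = now

    def expired(self, now):
        """Return True if the quiet period has exceeded the timeout."""
        return now - self.last_seen > self.timeout


ONE_HOUR = 3600
dog = Watchdog(ONE_HOUR, now=0)
dog.activity(now=1800)    # a uucico conversation seen half an hour in
print(dog.expired(5000))  # False: less than an hour of silence
print(dog.expired(5401))  # True: time to trip the reset relay
```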

What I have noticed on my machine is that big chunks of memory
erode over the course of two or three days and then reappear.  The
free store is about 900K right after booting the machine, and this
usually drops to about 625K over a couple of days, after which the
memory seems to return from limbo.

As someone mentioned a while ago, the clist buffers seem to slowly
go away too.  My machine starts out with about 130, and that drops
to about 120 after a while, but seems to stabilize at 120.

One last item.  I've talked to people who, like me, had tons of
problems with the 3.51 uucp, while other people have had no
problems at all.  One person speculated that use of /dev/ph* for
uucp was a factor in the crashing.

Bill
wtm@neoucom.edu

karl@zip.UUCP (Karl F. Fox) (11/13/89)

In article <577@limbic.UUCP> gil@limbic.UUCP (Gil Kloepfer Jr.) writes:
>In article <1020@icus.islp.ny.us> lenny@icus.islp.ny.us (Lenny Tropiano) writes:
>>The last dying word of my machine was:
>>sysinfo: cannot read /dev/rfp002 
>
>Hmmm....  System buffers maybe?

Nope, /dev/rfp002 is a character device, not a block device, so it doesn't
use kernel block buffers.
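
On a system with Python available, the device type can be checked from the file mode. This is a small sketch using the standard stat module. The helper name device_kind is invented, and the modes in the example are built by hand rather than read from a real /dev/rfp002.

```python
import stat


def device_kind(mode):
    """Classify a file mode (as found in st_mode) by device type."""
    if stat.S_ISCHR(mode):
        return "character"
    if stat.S_ISBLK(mode):
        return "block"
    return "not a device"


# Hand-built modes for illustration; a real check would pass
# os.stat(path).st_mode for the device node in question.
print(device_kind(stat.S_IFCHR | 0o600))  # character
print(device_kind(stat.S_IFBLK | 0o600))  # block
```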
-- 
Karl F. Fox, Morning Star Technologies, Inc.               karl@MorningStar.COM