[comp.bugs.sys5] Corrupted Root File System on 3B2s

pha@cs.rit.edu (Paul H Allen) (07/24/90)

I maintain three 3B2/522s and one 3B2/600, each running SysV Rel 3.2.1 V3.
The 3B2/522s advertise their user partitions, which are mounted by the 3B2/600
where the faculty and students log on to do their work.  RFS runs over
Wollongong's TCP/IP WIN/3B Release 3.0.1.

A while back the root partition on the 3B2/600 ran out of inodes.  I checked
the obvious places, i.e., the tmp directory and all the mount points, and found
nothing.  We have a nice message system for relaying info to the students, and
it resided on the root partition.  It has a small database that consumed a lot
of inodes, so I moved it to /usr/spool and gained back over 400 inodes.  That
solved the immediate problem, so I didn't bother looking any further.
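
For anyone chasing the same kind of inode shortage, here is a rough sketch of
how one might tally inodes per top-level directory without wandering onto
mounted file systems.  It is written in Python on the assumption that an
interpreter is available on the machine in question; the function name
inodes_by_top_dir is made up for illustration and is not part of any system
utility.  Files hidden underneath a mount point will not show up, so those
still need to be checked with the file systems unmounted.

```python
import os
from collections import Counter


def inodes_by_top_dir(root):
    """Count inodes under root, grouped by top-level directory.

    Entries on other file systems (mounted partitions) are skipped,
    and hard links to an already counted inode are counted once.
    Entries that sit directly in root and are not directories are
    grouped under ".".
    """
    root = os.path.abspath(root)
    root_dev = os.lstat(root).st_dev
    seen = set()          # inode numbers already counted
    counts = Counter()

    for dirpath, dirnames, filenames in os.walk(root):
        rel = os.path.relpath(dirpath, root)
        top = rel.split(os.sep)[0]
        dirset = set(dirnames)
        local_dirs = []

        for name in dirnames + filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))
            except OSError:
                continue              # vanished or unreadable entry
            if st.st_dev != root_dev:
                continue              # lives on another file system
            if name in dirset:
                local_dirs.append(name)
            if st.st_ino in seen:
                continue              # extra hard link, same inode
            seen.add(st.st_ino)
            if dirpath == root:
                key = name if name in dirset else "."
            else:
                key = top
            counts[key] += 1

        # Only descend into directories on the same file system.
        dirnames[:] = local_dirs

    return counts
```

Something like inodes_by_top_dir("/").most_common(10) would then list the
ten directories holding the most inodes on the root partition.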

So last week, the root partition ran out of inodes again.  All I could think of
was the infamous loss of inodes that commonly plagues news partitions, at
least it did mine.  So I put the 3B2/600 in single-user mode and ran fsck on
root.  I couldn't believe my eyes: all I saw was `LINK COUNT TABLE OVERFLOW
(CONTINUE?)' scroll by during Phase 1 and many requests to CLEAR files during
Phase 4.  It took between 80 and 100 fscks before the file system became clean.
I booted off the Operating System Utilities tape to obtain a standalone
shell just so the machine wouldn't reboot after each fsck.  By the way, does
anybody know how to boot to single user from firmware?

I also checked the root file systems on the 3B2/522s and found them to be
corrupted to varying degrees.

Quite a while ago there was a small discussion regarding how it wasn't a
good idea to mount file systems without doing a fsck just because they
were unmounted cleanly.  Would it be OK if I inserted a mandatory fsck
entry in the /etc/init.d/MOUNTFSYS file just before the mountall line?

Also the 3B2/600 has been crashing about every 2 or 3 days with a
PANIC: Kernel MMU fault (F_ACCESS).  Could this be related to the root
file system problem?  Diagnostics reveal nothing wrong with the hardware.
My next step will be to reload the operating system (uugh!).

Have you done a fsck on your root file system lately?  After fsck finishes,
there is a summary of x files checked using y blocks and leaving z blocks free.
The first fsck listed about 3100 files and the last one listed fewer
than 1000 files.  Unfortunately, the number of files checked drops by
19 with each fsck run.
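
As a rough sanity check on those figures, the arithmetic can be sketched in
Python.  The function names passes_needed and files_after are made up for
illustration, and the inputs are only the approximate numbers quoted above.

```python
import math


def passes_needed(start_files, end_files, per_pass):
    """Number of fsck passes needed to go from start_files to end_files
    if every pass removes exactly per_pass files."""
    return math.ceil((start_files - end_files) / per_pass)


def files_after(start_files, passes, per_pass):
    """Files remaining after the given number of passes."""
    return start_files - passes * per_pass


# Dropping from about 3100 files to 1000 at 19 files per pass.
print(passes_needed(3100, 1000, 19))   # 111

# Files left after 80 and after 100 passes at 19 files per pass.
print(files_after(3100, 80, 19))       # 1580
print(files_after(3100, 100, 19))      # 1200
```

At a steady 19 files per pass, 80 to 100 passes would leave roughly 1200 to
1580 files, so getting below 1000 suggests either somewhat more than 100
passes or that some passes removed more than 19 files.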

Any guidance or recommendation would be appreciated.

Paul Allen  (716) 475-5254
pha@cs.rit.edu
...!rochester!rit!pha
pha1775@ritvax (BITNET)

jrallen@devildog.att.com (Jon Allen) (07/26/90)

In article <1777@cs.rit.edu> pha@cs.rit.edu (Paul H Allen) writes:
. . .
>Also the 3B2/600 has been crashing about every 2 or 3 days with a
>PANIC: Kernel MMU fault (F_ACCESS).  Could this be related to the root

This is actually a bug in TCP/IP 3.0.1.  You should be able to call your
AT&T support line and request a fix to the TCP/IP F_ACCESS MMU fault
problem.  Many of my machines crashed several times a week for 6 months
before I got a fix in April.  Since April, I have had no problems.

-Jon