del@fnx.UUCP (Dag Erik Lindberg) (07/03/90)
First, let me apologize for the multiple cross-posting. I would not do this were it not for the unusual situation I find myself in. If I hit a group that is inappropriate, please ignore this message.

In a nutshell: When news is unbatching, at some point the number of free inodes suddenly goes to zero, and the news starts trying to fill the bit bucket on the floor. Killing rnews, unmounting the file system, fsck'ing, and remounting brings back the free inodes. The problem occurs at a random time, from a hundred articles to a couple of thousand articles. The problem does not occur with any other software I have tried which uses up large quantities of inodes. (For example, "find comp -print | cpio -pdvm" into another directory on the news file system will happily use all the inodes without exhibiting the problem.) I am not aware of any way that a user program such as rnews can corrupt the free inode table.

The details: The system is a Mylex 20 Mhz w/64k cache, 8 MB memory, ATI VGA wonder. I have unbatched news with no serial ports installed in the system, same problem. I have tried a DTC RLL card, WD 7000 FASST SCSI card, Adaptec 1542 SCSI, and all have shown the same problem. I am currently running ISC Unix. I have stripped down the system to just the video card and the disk controller to try to find this problem.

Having heard some rumors of Mylex motherboards being unreliable running Unix, I have come close to deciding that the M.B. is the problem, but for the following reasons I am hesitant to shell out for a new M.B.:

- I have pounded the system hard for up to 2 weeks with *NO* other problems. "Hard" means 3 virtual screens active, 1 running VPIX (I can keep several logins pretty active), and typically 2 users (not myself) logged in on serial lines, 1 of them running VPIX on a Wyse60, and uucp traffic doing mail, news, etc. With this load, swapping starts occurring even with 8 MB of memory.

- Mylex claims they have fixed their early problems with running Unix, and my M.B. should work.

- If I 'fix' the file system by saving all data, running mkfs, then restoring, I can run rnews and it will chug along until it is done or *legitimately* runs out of inodes. The problem does not show up until I have run expire on the news file system!

- This one I have tried only twice, so I am not sure it is repeatable, but I forced a rebuild of the free list using fsck and shut down the system. After booting I again ran rnews with no problems, until I ran expire.

I am desperate at this point, having spent unbelievable numbers of hours on this problem. I just want to get it fixed, but have exhausted all my ideas. If anyone has ever seen anything like this before, *please* drop me a line via E-mail. Use E-mail, as my news system is totally unreliable right now. I'm only one hop away from uunet if you use the path below.

Thanks in advance!
--
del AKA Erik Lindberg                    uunet!pilchuck!fnx!del
Who is John Galt?
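The symptom described above (the free inode count dropping to zero while rnews is unbatching) can be logged as it happens. What follows is only a minimal sketch, assuming a df that accepts -i and prints a header with an "IFree" or "ifree" column, as GNU and BSD df do; the df shipped with a 1990 System V may well differ. The function names free_inodes and watch_inodes and the spool path in the usage note are hypothetical.

```shell
#!/bin/sh
# free_inodes: read `df -i`-style output on stdin and print the free-inode
# count from the first data line. The column is located by its header name
# ("IFree" on GNU df, "ifree" on BSD df); that layout is an assumption about
# the local df, not something taken from the original post.
free_inodes() {
    awk '
        NR == 1 {
            for (i = 1; i <= NF; i++)
                if (tolower($i) == "ifree") col = i
            next
        }
        NR == 2 && col { print $col; exit }
    '
}

# watch_inodes DIR SECONDS: log the free-inode count of the file system
# holding DIR every SECONDS seconds; return 1 as soon as it reads zero.
watch_inodes() {
    while :; do
        n=$(df -i "$1" | free_inodes)
        echo "$(date) free inodes on $1: $n"
        if [ "$n" = 0 ]; then
            echo "free inode count hit zero" >&2
            return 1
        fi
        sleep "$2"
    done
}
```

Something like "watch_inodes /usr/spool/news 30" left running in another login during unbatching would timestamp the moment the count collapses, which helps tell the sudden drop apart from legitimately running out of inodes.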
fkk@wynge.Central.Sun.COM (Frank Kaefer - Sun Germany CSD - Munich) (07/03/90)
del@fnx.UUCP (Dag Erik Lindberg) writes:

| In a nutshell: When news is unbatching, at some point the number of free
| inodes suddenly goes to zero, and the news starts trying to fill the bit
| bucket on the floor. Killing rnews, unmounting the file system, fsck'ing,
| and remounting brings back the free inodes. The problem occurs at a random
| time, from a hundred articles to a couple of thousand articles. The problem

I have EXACTLY the same problem! And I am also extremely desperate.
My machine is an AT386 running Interactive IX 2.0.2. If anyone has
any idea, I would be very glad to get some help.

Frank.
--
Frank Kaefer         | SUN Microsystems GmbH | Phone: (+49) 89 46008-321
German Answer Center | Am Hochacker 3        | FAX:   (+49) 89 46008-400
Datacomm             | D-8011 Grasbrunn      |
e-mail: fkk@Germany.Sun.COM (...!sun!sunuk!sungy!fkk)
        fkk@sunmuc.UUCP (...!unido!sunmuc!fkk)
        suninfo!fkk
        fkk@stasys.sta.sub.org
cpcahil@virtech.uucp (Conor P. Cahill) (07/03/90)
In article <586@fnx.UUCP> del@fnx.UUCP (Dag Erik Lindberg) writes:
>
> [story of running out of inodes deleted]
>
>The details: The system is a Mylex 20 Mhz w/64k cache, 8 MB memory, ATI
>VGA wonder. I have unbatched news with no serial ports installed in the
>system, same problem. I have tried a DTC RLL card, WD 7000 FASST SCSI card,
>Adaptec 1542 SCSI, and all have shown the same problem. I am currently
>running ISC Unix. I have stripped down the system to just the video card
>and the disk controller to try to find this problem.

The only pertinent portion of this is the Operating system and you don't
specify the version.

This inode thing is a known bug that as far as I know was fixed in version
2.0.2 of 386/ix (I have been running news on a partition without running
out of inodes for a year).

There is a binary patch for microport unix that is reputed to work correctly
for 386/ix. If you want to try it, send me email & I will send it to you.
--
Conor P. Cahill            (703)430-9247        Virtual Technologies, Inc.,
uunet!virtech!cpcahil                           46030 Manekin Plaza, Suite 160
                                                Sterling, VA 22170
rli@buster.irby.com (Buster Irby) (07/03/90)
cpcahil@virtech.uucp (Conor P. Cahill) writes:

>In article <586@fnx.UUCP> del@fnx.UUCP (Dag Erik Lindberg) writes:
>>
>> [story of running out of inodes deleted]
>>
>>The details: The system is a Mylex 20 Mhz w/64k cache, 8 MB memory, ATI
>>VGA wonder. I have unbatched news with no serial ports installed in the
>>system, same problem. I have tried a DTC RLL card, WD 7000 FASST SCSI card,
>>Adaptec 1542 SCSI, and all have shown the same problem. I am currently
>>running ISC Unix. I have stripped down the system to just the video card
>>and the disk controller to try to find this problem.

>The only pertinent portion of this is the Operating system and you don't
>specify the version.

>This inode thing is a known bug that as far as I know was fixed in version
>2.0.2 of 386/ix (I have been running news on a partition without running
>out of inodes for a year).

No, it was not. I ran 2.0.2 for only a few days before the bug reared
its ugly head. Included below is the patch I received from T. William
Wells. I applied it and it fixed the problem (Thanks again, Bill). The
problem does, however, appear to have been fixed in version 2.2.

------------------------inode.fix----------------------
From: uunet!twwells.com!bill (T. William Wells)
Date: 16 Oct 89 18:54:06 EDT (Mon)
Subject: Inode bug fix for ISC 2.0.2

Since I've got so many requests for the fix for the inode bug
for ISC 2.0.2, I decided to post it. Sorry I've been so slothful
on this; various things popped up and I wasn't able to get it
all together.

In case you want to know what this is: there is a kernel bug
that, on occasion, and particularly if you are running a
newsfeed, will cause the file system to think that there are no
more free inodes, thus preventing creation of new files. Running
fsck fixes the inode count but doesn't prevent the problem from
occurring again.
The problem is this: there is a "free inode cache" stored in the
superblock for each file system; this block is kept in memory
when the file system is mounted, thus the inode cache permits
rapid allocation of inodes. When the cache is emptied, the kernel
tries to read more inodes from the disk to fill the cache and
then retries the allocate. If the kernel is unable to read more
inodes from the disk, it assumes that there are no more free
inodes.

There is an optimization in the allocation code, which depends
on the condition that the lowest free inode is always in the
inode cache. What it does is to start the disk read from that
lowest inode, instead of the first inode. This means that the
inode table doesn't have to always be fully read, for what could
be a significant savings in allocating inodes. (Consider what
might happen when almost all inodes are in use.)

However, the kernel does not maintain that condition properly.
It is possible for the kernel to forget the lowest inode, with
the effect that the kernel tries to read from some place too far
in the inode table, and maybe discovers that there are no free
inodes. When that happens, the kernel clears the available inode
count, and the file system is essentially kaput.

The right fix for this would be to always maintain that
condition. However, a binary patch for that would be tricky at
best, and maybe impossible. A patch that is possible is to have
the allocation routine try from the beginning of the inode table
whenever it fails to read inodes from the disk, relying on the
free inode count to tell when the table is empty. This changes
the condition that must be maintained to: the free inode count
must always be accurate. (Having the free inode count never be
larger than the actual number of free inodes is sufficient for
the patch to not cause problems.)

I made a similar patch for Microport SysV/386 3.0e and have been
running it for most of this year without problems.
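The difference between the buggy refill and the patched one described above can be seen in a toy model. This is only a sketch under stated assumptions: the inode table is represented as a string of characters, and the names scan_from, refill_buggy and refill_patched are made up for illustration. It is not the kernel's ialloc() code, which the post does not show.

```shell
#!/bin/sh
# Toy model of the on-disk inode table: one character per inode,
# 'f' = free, 'u' = in use; inode numbers are 1-based positions.

# scan_from TABLE START: print the first free inode number >= START,
# or print nothing if the scan runs off the end of the table.
scan_from() {
    awk -v t="$1" -v s="$2" 'BEGIN {
        for (i = s; i <= length(t); i++)
            if (substr(t, i, 1) == "f") { print i; exit }
    }'
}

# refill_buggy TABLE HINT: the scan starts at the remembered "lowest free
# inode" HINT. If it finds nothing, the file system is declared out of
# inodes, even when the hint was stale and lower inodes are free.
refill_buggy() {
    found=$(scan_from "$1" "$2")
    if [ -n "$found" ]; then echo "$found"; else echo "none"; fi
}

# refill_patched TABLE HINT FREECOUNT: the workaround. A failed scan is
# retried from inode 1 as long as the free count says inodes remain.
refill_patched() {
    found=$(scan_from "$1" "$2")
    if [ -z "$found" ] && [ "$3" -gt 0 ]; then
        found=$(scan_from "$1" 1)
    fi
    if [ -n "$found" ]; then echo "$found"; else echo "none"; fi
}

# Inode 2 is free, but the hint has drifted up to 4.
refill_buggy   ufuuuu 4      # prints: none
refill_patched ufuuuu 4 1    # prints: 2
```

The patched version costs a second scan only on the failure path, which is why it depends on the free count never overstating the number of free inodes: with a count of zero it gives up exactly as before.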
I was asked to solve this for Interactive 2.0.2 and did so.
However, I did the work on my Microport system, and the enclosed
shell script works on that; it ought to work on an Interactive
system but I've not tried this.

With that caveat, here is what you do to patch your kernel.
First, run the shell script. Make sure that it behaved correctly.
Then save a copy of your good kernel and
/etc/conf/pack.d/s5/Driver.o. If you are really paranoid, back up
your whole system, though this shouldn't be necessary. Replace
the Driver.o file with the one on /tmp. Finally, rebuild your
kernel.

That particular bug should never bite you again. Further kernel
builds will have the bug fixed as well.

If you have any problems, send me e-mail. I'll try to get back to
you quicker than I did with this! If the patch causes some nasty
kind of crash, please post immediately in hopes that others will
read your message before having tried the patch.

in=/etc/conf/pack.d/s5/Driver.o
out=/tmp/Driver.o

# check that we have the right Driver.o file

if [ x"`sum $in`" != x"50880 81 $in" ]; then
	echo "sum failed"
	exit 1
fi
if [ x"`sum -r $in`" != x"33908 81 $in" ]; then
	echo "sum -r failed"
	exit 1
fi

# copy the file and make an appropriate fix

{
	dd ibs=1 obs=1k count=1977
	dd bs=19 count=1 of=/dev/null
	echo '\074\0144\017\0204\0327\0376\0377\0377\0146\c'
	echo '\0307\0207\0324\00\00\00\0144\00\0353\0151\c'
	dd bs=16k
} <$in >$out

# compare the list of differences against the expected differences

cat <<\+ >/tmp/fix$$
  1978  75  74
  1980   0  17
  1981   0 204
  1982   0 327
  1983 164 376
  1984  14 377
  1985 146 377
  1986 307 146
  1987 207 307
  1988 324 207
  1989   0 324
  1992 144   0
  1993   0 144
  1994 353   0
  1995 152 353
  1996 220 151
+

if cmp -l $in $out | cmp -s - /tmp/fix$$; then
	rm /tmp/fix$$
	exit 0
else
	rm /tmp/fix$$
	echo "patch failed"
	exit 1
fi

---
Bill                    { uunet | novavax | ankh | sunvice } !twwells!bill
bill@twwells.com
--
Buster Irby  buster!rli
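The core of the script above is a fixed-length byte splice: copy the first 1977 bytes, drop 19, emit 19 replacement bytes, copy the rest, then check the result with cmp -l. The same technique can be tried on a scratch file. The sketch below swaps in different tools, which is an assumption on my part and not what the posted script does: it uses printf with \ddd octal escapes instead of System V echo with \0nnn and \c, and separate dd invocations with if= and skip= instead of several dd processes sharing one stdin. The name splice_bytes and the scratch file names are hypothetical.

```shell
#!/bin/sh
# splice_bytes IN OUT OFFSET LEN REPL
# Copy IN to OUT, replacing the LEN bytes that follow the first OFFSET
# bytes with REPL. REPL is used as a printf format string, so it may hold
# octal escapes such as '\074' but must not contain '%'. For the file size
# to stay the same, REPL must expand to exactly LEN bytes.
splice_bytes() {
    {
        dd if="$1" bs=1 count="$3" 2>/dev/null            # bytes before the patch
        printf "$5"                                       # replacement bytes
        dd if="$1" bs=1 skip="$(($3 + $4))" 2>/dev/null   # bytes after the patch
    } >"$2"
}

# Demonstration on a scratch file, not on a real Driver.o.
tmp=${TMPDIR:-/tmp}/splice.$$
printf 'abcdefghij' >"$tmp.in"
splice_bytes "$tmp.in" "$tmp.out" 3 2 '\130\131'   # octal for "XY"
cmp -l "$tmp.in" "$tmp.out" || true                # lists offsets 4 and 5
rm -f "$tmp.in" "$tmp.out"
```

As in the posted script, cmp -l reports 1-based offsets with the old and new byte values in octal, so the expected difference list can be written down in advance and compared against what the splice actually produced.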
jose@hpsad.HP.COM (Jose Gomez-Rubio (SEED Student)) (07/04/90)
Many eons ago, when I had my 3B1 and was running news, I too ran out of i-nodes.
I was told it was a software bug in the UNIX System V code. Someone also gave
me a tech sheet on the problem. Last I heard, it's been corrected in UNIX
System V Release 4.

Another rap I heard was to mkfs a separate file system for news because of some
64K i-node limit on a single file system. Dunno about this.
--
jose@hpsad.hp.com
eric@egsner.cirr.com (Eric Schnoebelen) (07/04/90)
In article <fkk.646998181@wynge> fkk@wynge.Central.Sun.COM (Frank Kaefer - Sun Germany CSD - Munich) writes:
- del@fnx.UUCP (Dag Erik Lindberg) writes:
-
- | In a nutshell: When news is unbatching, at some point the number of free
- | inodes suddenly goes to zero, and the news starts trying to fill the bit
- | bucket on the floor.
-
- I have EXACTLY the same problem! And I am also extremely desperate.
- My machine is an AT386 running Interactive IX 2.0.2. If anyone has
- any idea, I would be very glad to get some help.

	Well, I have found that C news does not aggravate this bug in
the System V file system code the way that B news does. Actually, it
does not induce the bug at all. How do I know? Well, I picked up a
patch for the System V, release 2 ialloc() function that printed out a
message when the error occurred, and after switching to C news, I never
saw the error.

	C news runs beautifully on 386/ix, just as it did on my
Microport System V/AT system. In the case of the Microport system,
switching to C news made the machine usable once again.

	Just a satisfied C news user.. [Thanks Henry and Geoff!]
--
Eric Schnoebelen                eric@cirr.com           schnoebe@convex.com
	Friendships are fragile things, and require as much handling
	as any other fragile and precious thing.   -Randolph S. Bourne
cpcahil@virtech.uucp (Conor P. Cahill) (07/04/90)
In article <1990Jul3.224322.19200@egsner.cirr.com> eric@egsner.cirr.com (Eric Schnoebelen) writes:
>In article <fkk.646998181@wynge> fkk@wynge.Central.Sun.COM
> (Frank Kaefer - Sun Germany CSD - Munich) writes:

[description of infamous system V inode problem deleted]

>	Well, I have found that C news does not aggravate this bug in
>the System V file system code the way that B news does. Actually, it
>does not induce the bug at all. How do I know? Well, I picked up a
>patch for the System V, release 2 ialloc() function that printed out a
>message when the error occurred, and after switching to C news, I never
>saw the error.

I would have to second this. I have been running C news with a full newsfeed,
and our system has been up for 2 or 3 months at a time without ever seeing
the problem.
--
Conor P. Cahill            (703)430-9247        Virtual Technologies, Inc.,
uunet!virtech!cpcahil                           46030 Manekin Plaza, Suite 160
                                                Sterling, VA 22170
pat@rwing.UUCP (Pat Myrto) (07/05/90)
In article <fkk.646998181@wynge>, fkk@wynge.Central.Sun.COM (Frank Kaefer - Sun Germany CSD - Munich) writes:
< del@fnx.UUCP (Dag Erik Lindberg) writes:
<
< | In a nutshell: When news is unbatching, at some point the number of free
< | inodes suddenly goes to zero, and the news starts trying to fill the bit
< | bucket on the floor. Killing rnews, unmounting the file system, fsck'ing,
< | and remounting brings back the free inodes. The problem occurs at a random
< | time, from a hundred articles to a couple of thousand articles. The problem
<
< I have EXACTLY the same problem! And I am also extremely desperate.
< My machine is an AT386 running Interactive IX 2.0.2. If anyone has
< any idea, I would be very glad to get some help.

I had the same difficulty - I had been messing with the kernel tunable
parameters, but what I had was sane. When in desperation I used kconfig
and set the tuning back to one of the defaults for memory size, the
problem went away, and hasn't returned (yet). Some combination apparently
tickles the infamous (apparently not fixed) inode bug.
--
pat@rwing (Pat Myrto), Seattle, WA
...!uunet!pilchuck!rwing!pat
...!uw-beaver!uw-entropy!dataio!/
WISDOM: "Travelling unarmed is like boating without a life jacket"