demasi@paisano.UUCP (Michael C. De Masi) (11/04/87)
Hello people,

I'm running usenet on a 3b2/400 Sys V r2.0.1 and I'm having a strange problem with the news file system (news is on its own file system to prevent it from interfering with other data). Soon after I first installed news, I found that I had run out of inodes long before data blocks. So, I backed up the news file system, unmounted it, did a mkfs on the disk partition with the exact same size as the original, only with 1600 more inodes, then remounted the file system and restored the news data.

Everything went smoothly for a while, until the feed dried up temporarily and the news volume dropped, thus emptying out the file system somewhat. When the feed returned to full volume, I noticed that I was getting "out of inode" errors when I knew full well there weren't nearly that many files in the system. So I unmounted the news file system, fsck'd it, got a "free inode count in superblock (fix?)" message back from fsck, told it to fix it, remounted the file system, and again everything went smoothly until the news volume dropped again, and the same thing happened.

What I'm starting to wonder is whether or not I have accidentally created some sort of stick point in the free inode list that can only be gotten around with an fsck. Because of recent fluctuations with my news feed, it has become a real hassle to constantly have to fix the file system, so I was wondering if anyone out there had ever had a problem like this or a possible solution. Is it something I did or some strange interaction between news & Sys V? Any ideas?

Awaiting your replies,
--
Michael C. De Masi - AT&T Communications (For whom I work and not speak)
3702 Pender Drive, Fairfax, Virginia 22030
Phone: 703-246-9555
UUCP: seismo!decuac!grebyn!paisano!demasi
"There are monkey boys on the premises." Unknown red Lectroid.
slb@boole.acc.virginia.edu (sandy) (11/05/87)
We have had the same problem (frequently running out of inodes on a file system when you know darn well there should be inodes left; fsck always complains about a bad free inode count) with the file system that holds our news and mail and print queues. We have a few 3B15's, a few 3B5's and lots of 3B2's. We don't run news on the 3B2's, and I don't think I have ever observed the problem there, but all the other machines exhibit this behavior.

I don't see how remaking the partition with more inodes can cause this - rather I think that there is some bug in the file system code, and the high activity that you see in a spool type partition somehow causes a race condition to happen. And boy, is it ever a drag - especially when it happens to a machine that you count on to feed ten other machines.

You didn't say what release you run - we run 3.0 on the 2's, 2.0 on the 5's and 2.1 on the 15's.
--
sandy bryant
slb@virginia.edu
uunet!virginia!slb
brian@sdcsvax.UCSD.EDU (Brian Kantor) (11/05/87)
In article <283@paisano.UUCP> demasi@paisano.UUCP (Michael C. De Masi) writes:
>
>I'm running usenet on a 3b2/400 Sys V r2.0.1 and ...
>... I noticed that I was getting "out of inode" errors
>when I knew full well there weren't nearly that many files
>in the system. So I unmounted the news file system, fsck'd
>it, got a "free inode count in superblock (fix?)" message
>back from fsck, told it to fix it, remounted the file system
>and again everything went smoothly until the news volume
>dropped again, and the same thing happened.

I've noticed similar behaviour here on our 3B15 (SysV 2.1.2) with the news filesystem running out of inodes - a reboot always fixes it. It is almost as though the inodes weren't being properly freed after the news is expired. I don't think it's hung processes hanging on to the inode after the directory entry is removed, since I often don't find any news processes running.

I find it interesting that the reboot (without an fsck, since the filesystem was marked as clean when the system went down) fixes the problem. If it were truly a buggered superblock, one would not expect that. Since news is the one system we run on this machine which creates multiple links to files, I suspect it might be related to that.

My workaround is to simply schedule a reboot each Monday morning at 3 am. That way the system is fresh and clean when I get to work that week. Admittedly, that's fixing the symptom and not the problem.

Brian Kantor    UCSD Office of Academic Computing
                Academic Network Operations Group
                UCSD B-028, La Jolla, CA 92093 USA
sverre@fesk.UUCP (Sverre Froyen) (11/06/87)
In article <283@paisano.UUCP>, demasi@paisano.UUCP (Michael C. De Masi) says:
> I'm running usenet on a 3b2/400 Sys V r2.0.1 and I'm having
> a strange problem with the news file system ....
(text deleted)
> ... When the feed returned to full
> volume, I noticed that I was getting "out of inode" errors
> when I knew full well there weren't nearly that many files
> in the system. So I unmounted the news file system, fsck'd
> it, got a "free inode count in superblock (fix?)" message
> back from fsck, told it to fix it, remounted the file system
> and again everything went smoothly until the news volume
> dropped again, and the same thing happened.

I have seen the same thing on an ICM3216 running SysV.2.2. The inode count of the spool file system (where news resides) will drop from 12000 to 0 within minutes (perhaps seconds) while unpacking a compressed news batch (rnews -U). Recourse is to go to single user mode and do an fsck on the file system. This will restore all (12000) lost inodes. This scenario happens about once per month and I have not noticed a correlation with the news volume.
--
Sverre Froyen
UUCP: boulder!fesk!sverre, sunpeaks!seri!fesk!sverre
ARPA: froyen@nmfecc.arpa
BITNET: froyen@csugold.bitnet
news@jpusa1.UUCP (usenet) (11/06/87)
In article <283@paisano.UUCP> demasi@paisano.UUCP (Michael C. De Masi) writes:
-I'm running usenet on a 3b2/400 Sys V r2.0.1 and I'm having
-a strange problem with the news file system (news is on its
-own file system to prevent it from interfering with other
-data) Soon after I first installed news, I found that I
-had run out of inodes long before data blocks.
I've had similar behaviour when the disk partition fills up and you run out
of data blocks. For some reason, when the data blocks become available again,
the inodes don't get returned to the freelist. This is on a unisoft sys5 r0
box. The cure, when it happens, is to fsck the disk. Is this a generic bug in
sys5? Anyway, try to avoid filling the partition and the problem will most
likely disappear. I've hacked an rnews that checks for space on the disk
before spooling the incoming article. It knows of a list of alternate
directories on other partitions to use when it gets dangerously low.
--
Stu Heiss {gargoyle,ihnp4}!jpusa1!stu
heiby@mcdchg.UUCP (Ron Heiby) (11/07/87)
I have seen the same thing twice in about 10 months of use of my MC68020-based system running SVR3. An fsck fixes the problem. I think it's weird.
--
Ron Heiby, heiby@mcdchg.UUCP    Moderator: comp.newprod & comp.unix
"I know engineers. They love to change things."  McCoy
larry@kitty.UUCP (Larry Lippman) (11/07/87)
In article <283@paisano.UUCP>, demasi@paisano.UUCP (Michael C. De Masi) writes:
> What I'm starting to wonder is whether or not I have accidentally
> created some sort of stick point in the free inode list that
> can only be gotten around with an fsck? Because of recent
> fluctuations with my news feed, it has become a real hassle
> to constantly have to fix the file system, so I was wondering
> if anyone out there had ever had a problem like this or a
> possible solution? Is it something I did or some strange
> interaction between news & Sys V? Any ideas?

If it's any consolation, I have the same problem on `kitty', which is also a 3B2. The problem, identical to yours, occurs about once every 4 to 5 months. I just reboot and fsck. It is so infrequent that I just haven't felt like tracking it down.

<> Larry Lippman @ Recognition Research Corp., Clarence, New York
<> UUCP: {allegra|ames|boulder|decvax|rutgers|watmath}!sunybcs!kitty!larry
<> VOICE: 716/688-1231        {hplabs|ihnp4|mtune|seismo|utzoo}!/
<> FAX: 716/741-9635 {G1,G2,G3 modes}   "Have you hugged your cat today?"
djt@hotps.ATT.COM (Dave Trulli) (11/07/87)
I too have been seeing my /usr/spool file system running out of inodes when news is coming in. An fsck will also fix the free inode count. I am using a 3B15 2.1.1 and have heard of it happening on a 3B2 and a 3B20 too. The problem occurs about once a week here. I don't know a way news could do this, so is it a bug in news or a bug in the file system code???
--
UUCP: ihnp4!hotps!djt       Dave Trulli NN2Z
djt@hotps.ATT.COM           AT&T Network Systems
PACKET: nn2z@nn2z           Holmdel NJ. 201-949-4774
rbl@nitrex.UUCP ( Dr. Robin Lake ) (11/08/87)
In article <4259@sdcsvax.UCSD.EDU> brian@sdcsvax.UCSD.EDU (Brian Kantor) writes:
>In article <283@paisano.UUCP> demasi@paisano.UUCP (Michael C. De Masi) writes:
>>
>>I'm running usenet on a 3b2/400 Sys V r2.0.1 and ...
>>... I noticed that I was getting "out of inode" errors
>>when I knew full well there weren't nearly that many files
>>in the system. So I unmounted the news file system, fsck'd
>> ...
>
>I've noticed similar behaviour here on our 3B15 (SysV 2.1.2) with the
>news filesystem running out of inodes - a reboot always fixes it. It
> ...

A similar "thing" happens on our Motorola 6600 (aka Convergent MegaFrame) running SV. Now and then a news directory "locks up" and rnews can't put anything into it. We're running 2.10 news and were hoping to "cure" the problem with 2.11 soon.
--
Rob Lake
{decvax,ihnp4!cbosgd}!mandrill!nitrex!rbl
richard@islenet.UUCP (Richard Foulk) (11/08/87)
>
> I have seen the same thing on an ICM3216 running SysV.2.2.
> The inode count of the spool file system (where news resides)
> will drop from 12000 to 0 within minutes (perhaps seconds) while
> unpacking a compressed news batch (rnews -U). Recourse is to go
> to single user mode and do an fsck on the file system. This will
> restore all (12000) lost inodes. This scenario happens about
> once per month and I have not noticed a correlation with the
> news volume.

I've encountered this problem on a couple of Dual Systems orphaned machines. I always figured it was Unisoft's or Dual's fault. It seems to be dependent on the ratio of free blocks to free inodes or something like that. Whenever the problem comes back I often have to do the umount/fsck/mount/unbatch cycle several times before it will settle down and stop running out of inodes. Then the problem will often stay away for weeks or months.

I vaguely remember hearing something about some race condition in the kernel allowing this to happen, but I thought that had been fixed long ago.
--
Richard Foulk        ...{dual,vortex,ihnp4}!islenet!richard
Honolulu, Hawaii
jc@minya.UUCP (John Chambers) (11/08/87)
In article <4259@sdcsvax.UCSD.EDU>, brian@sdcsvax.UCSD.EDU (Brian Kantor) writes:
> In article <283@paisano.UUCP> demasi@paisano.UUCP (Michael C. De Masi) writes:
> >
> >I'm running usenet on a 3b2/400 Sys V r2.0.1 and ...
> >... I noticed that I was getting "out of inode" errors
> >when I knew full well there weren't nearly that many files
> >in the system. So I unmounted the news file system, fsck'd
> >it, got a "free inode count in superblock (fix?)" message
> >back from fsck, told it to fix it, remounted the file system
> >and again everything went smoothly until the news volume
> >dropped again, and the same thing happened.
>
> I've noticed similar behaviour here on our 3B15 (SysV 2.1.2) with the
> news filesystem running out of inodes - a reboot always fixes it.

You folks are just discovering a common (possibly universal) Sys/V bug. I've been able to produce this behavior on numerous machines that were clearly running different ports of Sys/V. It seems to have little to do with exactly what the software was doing. The kernel just loses track of inodes (and also blocks).

The situation with blocks is fairly easy to understand: if a block isn't in any file, and isn't on the free list, the kernel can't find it. All it takes is someone zeroing out a buf[] pointer without first freeing the block.

For inodes, you'd think that it couldn't happen, since the kernel can determine by examination which inodes are free, and they are a simple vector. But in Sys/V, free inodes are also in a linked list, so the kernel is dependent on inodes being freed properly. In this case, it should be simple to write an "inode scavenger" to correct the problem on the fly. But you'd have to put it in the kernel, because the kernel caches critical info (such as recently-allocated inodes, which would appear "unallocated" on the disk because the in-memory copies haven't been flushed).

Anyhow, it may or may not be a consolation to know that many Sys/V releases have the same problem. Whether AT&T knows about it, I don't know. Maybe we should tell them that we know....
--
John Chambers <{adelie,ima,maynard,mit-eddie}!minya!{jc,root}> (617/484-6393)
ray@dsiramd.nz (Ray Brownrigg) (11/08/87)
In article <283@paisano.UUCP> demasi@paisano.UUCP (Michael C. De Masi) writes:
>I'm running usenet on a 3b2/400 Sys V r2.0.1 ...
> [inodes disappear, need fsck to recover]

I have been having exactly the same problem on two different 3b2/400's running System V r3.0. On a third 3B2/400, on which I have restructured the /usr2 file system to contain more inodes, the problem does not appear to occur (or perhaps I have not noticed it because it does not run out of inodes any more). As I recall, the problem is not cured by a reboot, because an fsck is not performed unless the system crashed.
--
Ray Brownrigg               UUCP: {utai!calgary,uunet}!vuwcomp!dsiramd!ray
Applied Maths Div, DSIR     ACSnet: ray@dsiramd.nz[@munnari]
PO Box 1335                 System: OLIVETTI/AT&T 3B2/400B+, System V R3.0
Wellington, New Zealand     "UNX -rules -OK"
guy@gorodish.Sun.COM (Guy Harris) (11/09/87)
> But in Sys/V, free inodes are also in a linked list, so the
> kernel is dependent on inodes being freed properly.

In the V7 file system, which is used by S5, free inodes are not in any sort of linked list. There is a cache in the superblock that saves the i-numbers of a small number of free inodes. If this cache is emptied, the system has to make a linear search through the i-list looking for an inode with a mode word of zero. There is an optimization in later versions of this code (including the S5 version) that tries to remember the i-number of the first free inode, so that it doesn't have to search the *entire* i-list.

    Guy Harris
    {ihnp4, decvax, seismo, decwrl, ...}!sun!guy
    guy@sun.com
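The scheme Guy describes can be illustrated with a toy model in C. This is a minimal sketch under stated assumptions, not the V7 or S5 kernel source: the struct, the function names and the sizes (an i-list of NINODES = 64 entries and a cache of NICINOD = 8 entries) are made up for illustration, and the "remember the first free inode" optimization is left out. Allocation pops an i-number from the superblock cache; when the cache is empty, it is refilled by a linear scan of the i-list for mode words of zero.

```c
#include <assert.h>
#include <string.h>

#define NINODES 64   /* hypothetical size of the on-disk i-list */
#define NICINOD 8    /* hypothetical size of the superblock free-inode cache */

/* Toy V7-style file system: di_mode[i] == 0 means i-number i is free on
 * disk; s_inode[] is the superblock cache of free i-numbers. */
struct v7_fs {
    unsigned short di_mode[NINODES];
    int s_inode[NICINOD];
    int s_ninode;               /* number of valid entries in s_inode[] */
};

static void v7_init(struct v7_fs *fs)
{
    memset(fs, 0, sizeof *fs);
    fs->di_mode[0] = 1;         /* i-number 0 is never handed out */
}

/* Cache empty: linear scan of the i-list for inodes whose mode word is 0. */
static void v7_refill(struct v7_fs *fs)
{
    fs->s_ninode = 0;
    for (int ino = 1; ino < NINODES && fs->s_ninode < NICINOD; ino++)
        if (fs->di_mode[ino] == 0)
            fs->s_inode[fs->s_ninode++] = ino;
}

/* Returns the allocated i-number, or 0 when the scan finds nothing free. */
static int v7_ialloc(struct v7_fs *fs, unsigned short mode)
{
    assert(mode != 0);          /* a zero mode word would mean "free" */
    for (;;) {
        if (fs->s_ninode == 0) {
            v7_refill(fs);
            if (fs->s_ninode == 0)
                return 0;       /* genuinely out of inodes */
        }
        int ino = fs->s_inode[--fs->s_ninode];
        if (fs->di_mode[ino] == 0) {   /* re-check the mode word before use */
            fs->di_mode[ino] = mode;
            return ino;
        }
    }
}

/* Freeing clears the mode word; the i-number is cached only if there is room. */
static void v7_ifree(struct v7_fs *fs, int ino)
{
    fs->di_mode[ino] = 0;
    if (fs->s_ninode < NICINOD)
        fs->s_inode[fs->s_ninode++] = ino;
}
```

In this model, allocating until v7_ialloc() returns 0 hands out all 63 usable i-numbers, and an inode freed afterwards goes straight back into the cache, so the next allocation gets it. Note that nothing here is a linked list of free inodes: the only state is the mode words and a small array.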
jeffl@berick.UUCP (Jeff Lawhorn) (11/09/87)
I don't know what is happening with your 3B's losing inodes on the news file system. We have a 3B15 that has been running the 2.11 software since the day it came across the net, and we were running 2.10 for quite a while prior to that, and we've never seen the problem of losing inodes until an fsck is done. Maybe the problem is something particular to your sites (students :-).

At our site the 3B15 goes down once a month so that I can do a root file system backup. The only time we've had a problem with the machine in the last 15 months was when we lost a drive.
--
Everything should be made as simple     Jeff Lawhorn
as possible, but no simpler.            ...!sdcsvax!jack!berick!jeffl
df@nud.UUCP (Dale Farnsworth, NO7K) (11/09/87)
In article <283@paisano.UUCP> demasi@paisano.UUCP (Michael C. De Masi) writes:
->I'm running usenet on a 3b2/400 Sys V r2.0.1 and I'm having
->a strange problem with the news file system (news is on its
...
->Everything went smoothly for a while, until the feed dried up
->temporarily and the news volume dropped, thus emptying out
->the file system somewhat. When the feed returned to full
->volume, I noticed that I was getting "out of inode" errors
->when I knew full well there weren't nearly that many files
->in the system. So I unmounted the news file system, fsck'd
->it, got a "free inode count in superblock (fix?)" message
->back from fsck, told it to fix it, remounted the file system
->and again everything went smoothly until the news volume
->dropped again, and the same thing happened.
I have seen the same thing on my 68020 system running System V R3
code. Somehow, the free inode count goes to 0 though there are
thousands of free inodes. It happens every month or so.
I haven't noticed the correlation with news volume, but there may
be one. I would be very interested in a fix.
-Dale
jfh@killer.UUCP (11/09/87)
In article <33319@sun.uucp>, guy@gorodish.Sun.COM (Guy Harris) writes:
> > But in Sys/V, free inodes are also in a linked list, so the
> > kernel is dependent on inodes being freed properly.
>
> In the V7 file system, which is used by S5, free inodes are not in any sort of
> linked list. There is a cache in the superblock that saves the i-numbers of a
> small number of free inodes. If this cache is emptied, the system has to make
> a linear search through the i-list looking for an inode with a mode word of
> zero.
> Guy Harris

Guy - the difference between Version 7 and later versions (> System III) is that the free inode count is maintained in the superblock. In Version 7 the free inode count, which I seem to remember had an entry in the superblock, was not updated. So, when an I-node was allocated, the kernel had to search the entire I-list for a free inode (assuming the superblock cache was empty) without knowing if an I-node would be found. Now, the kernel `knows' how many free inodes are out there without even looking.

- John.
--
John F. Haugh II                HECI Exploration Co. Inc.
UUCP: ...!ihnp4!killer!jfh      11910 Greenville Ave, Suite 600
"Don't Have an Oil Well?"       Dallas, TX. 75243
" ... Then Buy One!"            (214) 231-0993
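A minimal sketch of what John describes, assuming (this is not taken from the S5 source) that the allocator consults the superblock's running free count before it searches. The field is named after s_tinode, the S5 superblock's total-free-inode count; everything else (the struct, the function names, the 32-entry i-list) is hypothetical. In this toy model a zero count means "out of inodes" without looking at the i-list, so a count that has drifted low produces a false shortage until something recomputes it, roughly what answering "yes" to fsck's "free inode count in superblock" question does.

```c
#include <assert.h>
#include <string.h>

#define NINODES 32   /* hypothetical size of the i-list */

/* Toy model: di_mode[i] == 0 means free; s_tinode is the superblock's
 * running count of free inodes. */
struct cnt_fs {
    unsigned short di_mode[NINODES];
    int s_tinode;
};

static void cnt_init(struct cnt_fs *fs)
{
    memset(fs, 0, sizeof *fs);
    fs->di_mode[0] = 1;              /* i-number 0 is never handed out */
    fs->s_tinode = NINODES - 1;
}

/* Trusts the count first: zero means give up without scanning the i-list. */
static int cnt_ialloc(struct cnt_fs *fs, unsigned short mode)
{
    assert(mode != 0);
    if (fs->s_tinode <= 0)
        return 0;                    /* "out of inodes", no search performed */
    for (int ino = 1; ino < NINODES; ino++)
        if (fs->di_mode[ino] == 0) {
            fs->di_mode[ino] = mode;
            fs->s_tinode--;
            return ino;
        }
    fs->s_tinode = 0;                /* count was too high; correct it */
    return 0;
}

/* Recompute the free count from the i-list itself. */
static void cnt_recount(struct cnt_fs *fs)
{
    int n = 0;
    for (int ino = 1; ino < NINODES; ino++)
        if (fs->di_mode[ino] == 0)
            n++;
    fs->s_tinode = n;
}
```

The design trade-off is the one the thread goes on to argue about: the count saves a fruitless scan only when the file system is genuinely full, but if it is ever wrong, a trusting allocator reports a shortage that does not exist.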
showard@uccba.UUCP (Steve Howard) (11/11/87)
I have also had this problem with a 3B2 Sys V Rel 2.1. I found that it would happen semi-regularly on Monday & Thursday mornings. The cron log (which was usually around 1 MEG (we've got a busy cron :-)) would get /dev/null copied to it every Mon. & Thurs. morning. It appears that if we were uncompressing news and the cronlog got /dev/null cp'd to it at the same time--wham!!! No more inodes!!! I took the section out of the root crontab that messed with the cronlog and everything has lived happily ever after.

Is this caused by the cron writing to the end of a large file as it is simultaneously deleting it? Does rnews cause a problem? Is it purely coincidence? Probably, but it hasn't happened on my system since I removed the section of the crontab that deletes the cronlog.
--
Steve Howard
UUCP: {pyramid,philabs!phri,decuac,mit-eddie}!uccba!showard
U.C. College of Business Administration
USPS: M.L. 130, Cincinnati, OH 45221
jc@minya.UUCP (John Chambers) (11/11/87)
In article <33319@sun.uucp>, guy@gorodish.Sun.COM (Guy Harris) writes:
> > But in Sys/V, free inodes are also in a linked list, so the
> > kernel is dependent on inodes being freed properly.
>
> In the V7 file system, which is used by S5, free inodes are not in any sort of
> linked list. There is a cache in the superblock that saves the i-numbers of a
> small number of free inodes.

Gee, when I look in /usr/include/sys/inode.h on this 5.2 system, I see:

    struct inode
    {
        struct inode  *i_forw;    /* hash chain forw */
        struct inode  *i_back;    /* hash chain back */
        char          i_flag;
        cnt_t         i_count;    /* reference count */
        dev_t         i_dev;      /* device where inode resides */
        ino_t         i_number;   /* i number, 1-to-1 with device address */
        ...

It sure looks like someone is doing linked lists from a hash table. This would explain how things could get lost, and why fsck would find them. It would also explain why fsck makes comments about inode lists. Or am I misinterpreting something?
--
John Chambers <{adelie,ima,maynard,mit-eddie}!minya!{jc,root}> (617/484-6393)
sewilco@datapg.DataPg.MN.ORG (Scot E. Wilcoxon) (11/12/87)
Some people have noticed a connection between the inode problem and
rnews running around the time when cronlog is cleared.
There is one thing which is unusual about rnews and cronlog: rnews
can generate a lot of error messages to stderr, which can end up in
cronlog. Most programs generate short cronlog messages, while rnews
is likely to run for several minutes and can easily generate several K
of unbuffered error messages.
If cronlog is unlinked while rnews is running, those error messages
will continue being placed in the phantom file. If there's a bug,
it may be with this combination of long [unbuffered] output to a
phantom file.
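The "phantom file" here is an inode whose last directory entry has been removed while a process still holds it open. The sketch below, which assumes a POSIX system with a writable /tmp (the scratch name and the message text are hypothetical), shows on a modern system that such a file keeps growing with a link count of zero until it is closed. It does not reproduce the SysV bug; it only demonstrates the condition suspected above.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Creates a scratch file, unlinks it while it is still open (as when cronlog
 * is removed under a running rnews), then keeps appending to it.
 * Returns 0 on success and reports the link count and size seen by fstat(). */
static int phantom_write(int nwrites, long *nlink_out, long *size_out)
{
    static const char msg[] = "rnews: error message\n";
    char path[] = "/tmp/phantomXXXXXX";     /* hypothetical scratch name */
    struct stat st;

    assert(nwrites >= 0 && nlink_out != NULL && size_out != NULL);
    int fd = mkstemp(path);
    if (fd < 0)
        return -1;
    unlink(path);                   /* name gone; inode stays allocated */
    for (int i = 0; i < nwrites; i++) {
        if (write(fd, msg, sizeof msg - 1) != (ssize_t)(sizeof msg - 1)) {
            close(fd);
            return -1;
        }
    }
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    *nlink_out = (long)st.st_nlink; /* 0: no directory entry refers to it */
    *size_out = (long)st.st_size;   /* yet its data blocks are still in use */
    close(fd);                      /* last reference gone: inode can be freed */
    return 0;
}
```

For example, 100 writes of the 21-byte message leave an open file of 2100 bytes with a link count of 0; the inode and its blocks are only released by the final close().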
Testing will have to be done by someone with a system on which they
don't mind blowing away the inodes.
The workaround is to throw away the error messages by putting in the
rnews crontab entry
>/dev/null 2>&1
The error messages will be placed in LIBDIR/errlog, if it exists.
--
Scot E. Wilcoxon sewilco@DataPg.MN.ORG {ems,meccts}!datapg!sewilco
Data Progress Minneapolis, MN, USA +1 612-825-2607
metro@asi.UUCP (Metro T. Sauper) (11/12/87)
In article <283@paisano.UUCP>, demasi@paisano.UUCP (Michael C. De Masi) writes:
> .....
> volume, I noticed that I was getting "out of inode" errors
> when I knew full well there weren't nearly that many files
> in the system.

This is happening on my system also. Please help!
--
Metro T. Sauper, Jr.                    Assessment Systems, Inc.
Director, Remote Systems Development    210 South Fourth Street
(215) 592-8900  ..!asi!metro            Philadelphia, PA 19106
pgf@mtung.UUCP (11/13/87)
Wow! I haven't seen the net in this much agreement on a single subject in years! It sounds like there might really be a bug in the SysV filesystem! Even rec.bicycles doesn't get along this well! :-)
--
Paul Fox, AT&T Information Systems, Middletown NJ.
[ihnp4|vax135]!mtung!pgf  (201)957-2698
hanko@edge.UUCP (Jim Hanko) (11/13/87)
In article <3626@islenet.UUCP> richard@islenet.UUCP (Richard Foulk) writes:
>>
>> I have seen the same thing on an ICM3216 running SysV.2.2.
>> The inode count of the spool file system (where news resides)
>> will drop from 12000 to 0 within minutes (perhaps seconds) while
>> unpacking a compressed news batch (rnews -U). Recourse is to go
>> to single user mode and do an fsck on the file system. This will
>> restore all (12000) lost inodes. This scenario happens about
>> once per month and I have not noticed a correlation with the
>> news volume.
>
>I've encountered this problem on a couple of Dual Systems orphaned
>machines. I always figured it was Unisoft's or Dual's fault.
>

I ran into the same problem on our news file system and tracked it down to a generic System V bug in the ialloc() module. The problem occurs because ialloc() scans from the last inode allocated to the end of the inode table looking for free inodes. If none are found (e.g. if the last allocated inode was near the end of the table and all subsequent ones are in use), then "out of inodes" is reported. It DOES NOT go back to search for free inodes from the beginning. Therefore, this error can occur even when many free inodes are available.

The fix involves checking whether the search began at inode 0 when no free inodes were found. If it didn't, then re-start the search at 0. If it did, THEN print "out of inodes" and exit.

This problem rarely shows up on "normal" file systems, but the high level of activity in net file systems seems to aggravate it.
---
Jim Hanko    ...{mot|ism780|oliveb}!edge!hanko
Edge Computer, Scottsdale AZ
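Jim's description and fix can be sketched with a toy allocator. Everything here is hypothetical (the struct, the name lasti for the remembered starting point, the sizes); it is neither his patch nor the System V source. The wrap flag switches between the behaviour he describes (scan from the remembered point to the end of the i-list and give up if nothing turns up) and his fix (if that scan found nothing and did not start at the beginning, start over from the beginning). Because i-number 0 is reserved in this model, "the beginning" is i-number 1.

```c
#include <assert.h>
#include <string.h>

#define NINODES 64   /* hypothetical i-list size */
#define NICINOD 8    /* hypothetical superblock cache size */

struct scan_fs {
    unsigned short di_mode[NINODES]; /* 0 means the inode is free */
    int s_inode[NICINOD];            /* cache of free i-numbers */
    int s_ninode;
    int lasti;                       /* hypothetical: where the next scan starts */
};

static void scan_init(struct scan_fs *fs)
{
    memset(fs, 0, sizeof *fs);
    fs->di_mode[0] = 1;              /* i-number 0 is never handed out */
    fs->lasti = 1;
}

/* Refill the cache by scanning from lasti toward the end of the i-list.
 * wrap == 0: the behaviour described above (no wraparound).
 * wrap != 0: the fix (retry from the beginning before giving up). */
static int scan_refill(struct scan_fs *fs, int wrap)
{
    for (;;) {
        int start = fs->lasti;
        fs->s_ninode = 0;
        for (int ino = start; ino < NINODES && fs->s_ninode < NICINOD; ino++)
            if (fs->di_mode[ino] == 0) {
                fs->s_inode[fs->s_ninode++] = ino;
                fs->lasti = ino;     /* next scan resumes here */
            }
        if (fs->s_ninode > 0)
            return fs->s_ninode;
        if (!wrap || start == 1)
            return 0;                /* "out of inodes" */
        fs->lasti = 1;               /* the fix: start over from the top */
    }
}

static int scan_ialloc(struct scan_fs *fs, unsigned short mode, int wrap)
{
    assert(mode != 0);
    if (fs->s_ninode == 0 && scan_refill(fs, wrap) == 0)
        return 0;
    int ino = fs->s_inode[--fs->s_ninode];
    fs->di_mode[ino] = mode;
    return ino;
}
```

For example, after every inode has been allocated and inode 5 is then freed without landing in the cache, scan_ialloc(&fs, mode, 0) in this model returns 0 even though inode 5 is free, while scan_ialloc(&fs, mode, 1) wraps around and returns 5.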
allbery@ncoast.UUCP (Brandon Allbery) (11/16/87)
As quoted from <359@minya.UUCP> by jc@minya.UUCP (John Chambers):
+---------------
| Gee, when I look in /usr/include/sys/inode.h on this 5.2 system, I see:
|
| struct inode
| {
|     struct inode *i_forw;   /* hash chain forw */
|     struct inode *i_back;   /* hash chain back */
|     char i_flag;
|
| It sure looks like someone is doing linked lists from a hash table. This
| would explain how things could get lost, and why fsck would find them. It
| would also explain why fsck makes comments about inode lists.
+---------------

Sorry. This is the in-memory copy of an inode; the linked list in question is a linked list of inodes currently in memory. Why would they be in memory? For speed. Why do they need speed? Because they represent:

  * root directories of filesystems
  * "chroot" root directories
  * current directories
  * mount points
  * open files
  * saved-text and/or demand-paged executables currently in use

all of which are referenced constantly.
--
Brandon S. Allbery    necntc!ncoast!allbery@harvard.harvard.edu
{hoptoad,harvard!necntc,{sun,cbosgd}!mandrill!hal,uunet!hnsurg3}!ncoast!allbery
Moderator of comp.sources.misc
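Brandon's point is easy to see in a toy version of the in-core inode table. The field names i_forw, i_back, i_count, i_dev and i_number come from the inode.h excerpt in this thread; the hash function, the table sizes and the function names are hypothetical, and slots are never recycled. The chains are only an index over inodes that are already in memory, so they say nothing about which inodes are free on disk.

```c
#include <assert.h>
#include <stddef.h>

#define NHINO   8    /* hypothetical number of hash buckets */
#define NINCORE 16   /* hypothetical size of the in-core inode table */

/* In-core inode: only the fields needed for the sketch. */
struct inode {
    struct inode *i_forw;   /* hash chain forw */
    struct inode *i_back;   /* hash chain back */
    int i_count;            /* reference count */
    int i_dev;              /* device where inode resides */
    int i_number;           /* i number on that device */
};

static struct inode hinode[NHINO];     /* dummy list heads, one per bucket */
static struct inode incore[NINCORE];   /* the in-core inode table */
static int nused;                      /* slots handed out (never reused here) */

static void ihash_init(void)
{
    for (int i = 0; i < NHINO; i++)
        hinode[i].i_forw = hinode[i].i_back = &hinode[i];
    nused = 0;
}

static struct inode *ihash_bucket(int dev, int ino)
{
    return &hinode[(unsigned)(dev + ino) % NHINO];
}

/* iget-style lookup: find (dev, ino) among inodes already in memory,
 * otherwise claim a fresh slot and link it onto its hash chain
 * (a real kernel would also read the disk inode at this point). */
static struct inode *iget_sketch(int dev, int ino)
{
    struct inode *h = ihash_bucket(dev, ino);
    for (struct inode *ip = h->i_forw; ip != h; ip = ip->i_forw)
        if (ip->i_dev == dev && ip->i_number == ino) {
            ip->i_count++;
            return ip;
        }
    if (nused == NINCORE)
        return NULL;                   /* in-core table full */
    struct inode *ip = &incore[nused++];
    ip->i_dev = dev;
    ip->i_number = ino;
    ip->i_count = 1;
    ip->i_forw = h->i_forw;            /* insert at the head of the chain */
    ip->i_back = h;
    h->i_forw->i_back = ip;
    h->i_forw = ip;
    return ip;
}
```

Looking up the same (dev, ino) twice returns the same in-core slot with its reference count bumped; a second inode that hashes to the same bucket is simply linked in front of the first.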
richard@islenet.UUCP (Richard Foulk) (11/18/87)
In article <986@edge.UUCP> hanko@edge.UUCP (Jim Hanko) writes:
> [...]
> The fix involves checking whether the search began at inode 0 when no free
> inodes were found. If it didn't, then re-start the search at 0. If it
> did, THEN print "out of inodes" and exit.
>
> This problem rarely shows up on "normal" file systems, but the high level
> of activity in net file systems seems to aggravate it.

Great! Looks like we may be closing in on a solution here. Maybe.

So the question is: do you have diffs for the fix?

Since quite a number of people with a few different versions of System V have reported encountering this problem, it seems that the ialloc routine you mentioned probably hasn't changed across versions of unix. Does that seem like a reasonable assumption?

Since only rnews seems to provoke this bug, is there some sort of way to avoid the bug that comes to mind? I've just finished running fsck on my news file system for about the 8th or 10th time today -- all for one day's batch of news.

Any insights into a solution or work-around are appreciated. Thanks.
--
Richard Foulk        ...{dual,vortex,ihnp4}!islenet!richard
Honolulu, Hawaii
henry@utzoo.UUCP (Henry Spencer) (11/18/87)
> ...the difference between Version 7 and later versions (> System III)
> is that the free inode count is maintained in the superblock...
> ... Now, the kernel `knows' how
> many free inodes are out there without even looking.

Given that it doesn't know *where* they are, how is this useful? The only situation in which knowing the count is significant is when a filesystem is out of inodes, or so close to it that the search for more can be cut short by the count. Now the $64 question: how frequent is this? Not very, in my experience. In fact, I'm not sure I've ever seen it. I question the value of an "optimization" for such a rare case that introduces bugs in more common situations, which is obviously the case here.

P.S. My kernel maintains the inode count, but only for human contemplation; the kernel itself pays no attention to it.

P.P.S. Whatever the bug is, it must be something that AT&T added since V7, since my system never loses inodes.
--
Those who do not understand Unix are  | Henry Spencer @ U of Toronto Zoology
condemned to reinvent it, poorly.     | {allegra,ihnp4,decvax,utai}!utzoo!henry
slb@boole.acc.virginia.edu (sandy) (11/19/87)
In article <986@edge.UUCP> hanko@edge.UUCP (Jim Hanko) writes:
>In article <3626@islenet.UUCP> richard@islenet.UUCP (Richard Foulk) writes:
( description of problem of appearing to run out of inodes when there
are actually many free inodes. )
>I ran into the same problem on our news file system and tracked it down
>to a generic System V bug in the ialloc() module. The problem occurs
>because ialloc() scans from the last inode allocated to the end of the
>inode table looking for free inodes. If none are found (e.g. if the
>last allocated inode was near the end of the table and all subsequent
>ones are in use), then "out of inodes" is reported. It DOES NOT go
>back to search for free inodes from the beginning. Therefore, this
>error can occur even when many free inodes are available.
>
>The fix involves checking whether the search began at inode 0 when no free
>inodes were found. If it didn't, then re-start the search at 0. If it
>did, THEN print "out of inodes" and exit.
>
>This problem rarely shows up on "normal" file systems, but the high level
>of activity in net file systems seems to aggravate it.

This sounds good - we have this problem frequently, so I checked out the source that we have (SVR2 and SVR3). It's true that whenever you have to replenish the free inode cache, you search through the inode list starting wherever you left off last time. And it's also true that you only search to the bottom of the list - you don't wrap around. But there is also code there to reset the starting point to zero whenever you managed to find a few free inodes, but not enough to totally fill the cache. So, it seems to me that the problem could only arise when there were exactly enough inodes left between the starting point and the end to fill the cache (any less and you'd reset the starting point, any more and there'd be one to find on the next search) the last time around. But it doesn't seem as though this could arise often enough to account for how often I see it.
What am I missing? (if it's obvious, please be kind ...)

And another thing - how come an fsck fixes it? I can see how it resets the free inode count so you no longer think you're out of inodes, but it doesn't seem to reset the starting point for the search (at least a brief search through fsck.c turns up no obvious references to that field). Wouldn't you just have the same problem the next time you alloc'ed an inode? Does mount reset this?
--
sandy bryant
slb@virginia.edu
uunet!virginia!slb
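The exact-fit condition sandy describes can be checked with a toy model of the refill rule. The three cases in the comment follow sandy's reading of the SVR2/SVR3 code as described in the post; the struct, the lasti name and the sizes are hypothetical, and this is not copied kernel source.

```c
#include <assert.h>
#include <string.h>

#define NINODES 64   /* hypothetical i-list size */
#define NICINOD 8    /* hypothetical superblock cache size */

struct reset_fs {
    unsigned short di_mode[NINODES]; /* 0 means the inode is free */
    int s_inode[NICINOD];            /* cache of free i-numbers */
    int s_ninode;
    int lasti;                       /* hypothetical name for the search start */
};

static void reset_init(struct reset_fs *fs)
{
    memset(fs, 0, sizeof *fs);
    fs->di_mode[0] = 1;              /* i-number 0 is never handed out */
    fs->lasti = 1;
}

/* Refill rule as described in the post:
 *   cache filled      -> next scan starts at the last i-number found
 *   some, not enough  -> next scan starts at the beginning
 *   none found        -> "out of inodes", starting point unchanged */
static int reset_refill(struct reset_fs *fs)
{
    int last = 0;
    fs->s_ninode = 0;
    for (int ino = fs->lasti; ino < NINODES && fs->s_ninode < NICINOD; ino++)
        if (fs->di_mode[ino] == 0) {
            fs->s_inode[fs->s_ninode++] = ino;
            last = ino;
        }
    if (fs->s_ninode == NICINOD)
        fs->lasti = last;
    else if (fs->s_ninode > 0)
        fs->lasti = 1;
    return fs->s_ninode;
}

static int reset_ialloc(struct reset_fs *fs, unsigned short mode)
{
    assert(mode != 0);
    if (fs->s_ninode == 0 && reset_refill(fs) == 0)
        return 0;                    /* "out of inodes" */
    int ino = fs->s_inode[--fs->s_ninode];
    fs->di_mode[ino] = mode;
    return ino;
}
```

In this model, with inodes 10, 20 and 30 free, the start point at 50, and exactly NICINOD free inodes at the very end of the i-list, the refill fills the cache with the last NICINOD i-numbers and leaves the start point on the final one; once those are used, the next refill finds nothing and reports "out of inodes" although three inodes are free. With one fewer free inode at the end, the scan comes up short, the start point is reset, and the next allocation finds inode 30.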
slb@boole.acc.virginia.edu (sandy) (11/19/87)
In article <325@boole.acc.virginia.edu> slb@boole.acc.virginia.edu writes:
>And another thing - how come an fsck fixes it? I can see how it
>resets the free inode count so you no longer think you're out of
>inodes, but it doesn't seem to reset the starting point for the
>search (at least a brief search through fsck.c turns up no obvious
>references to that field). Wouldn't you just have the same problem
>the next time you alloc'ed an inode? Does mount reset this?

I just looked, and mount does reset it. That means that if you can't fix the source, you might be able to ward off the evil spirits by just unmounting and remounting the file system (i.e. you don't have to reboot or even fsck).
--
sandy bryant
slb@virginia.edu
uunet!virginia!slb
mmengel@cuuxb.ATT.COM (Marc W. Mengel) (11/20/87)
In article <325@boole.acc.virginia.edu> slb@boole.acc.virginia.edu writes:
$In article <986@edge.UUCP> hanko@edge.UUCP (Jim Hanko) writes:
$>In article <3626@islenet.UUCP> richard@islenet.UUCP (Richard Foulk) writes:
...
$>I ran into the same problem on our news file system and tracked it down
$>to a generic System V bug in the ialloc() module. The problem occurs
$>because ialloc() scans from the last inode allocated to the end of the
$>inode table looking for free inodes. If none are found (e.g. if the
$>last allocated inode was near the end of the table and all subsequent
$>then "out of inodes" gets printed ...
$This sounds good - we have this problem frequently, so I checked out
$the source that we have (SVR2 and SVR3). It's true that whenever you
$have to replenish the free inode cache, you search through the inode
$list starting wherever you left off last time. And it's also true that
$you only search to the bottom of the list - you don't wrap around. But
$there is also code there to reset the starting point to zero whenever
$you managed to find a few free inodes, but not enough to totally fill
$the cache. So, it seems to me that the problem could only arise when
$there were exactly enough inodes left between the starting point and the
$end to fill the cache (any less and you'd reset the starting point, any
$more and there'd be one to find on the next search) the last time around.
$But it doesn't seem as though this could arise often enough to account
$for how often I see it. What am I missing? (if it's obvious, please
$be kind ...)
$
$And another thing - how come an fsck fixes it? ...

fsck does *NOT* fix it; it re-fills your free inode cache and does not affect that pointer. This means that your search pointer points to the end of the inode table, and every time your inode cache runs out, you get the "out of space" error. Fsck re-fills the inode cache, but the search pointer never moves from the end of the inode table...
So you see, once you hit the condition of cache-size inodes being after the search pointer, you get a condition where you falsely run out of inodes $-- $sandy bryant $slb@virginia.edu $uunet!virginia!slb -- Marc Mengel attmail!mmengel ...!{moss|lll-crg|mtune|ihnp4}!cuuxb!mmengel
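To make the failure sequence concrete, here is a toy C model of the refill behaviour described above: scan from the saved point to the end of the table, restart at 0 only after a partial fill, and leave the pointer alone when nothing is found. It is a sketch under stated assumptions, not the System V source. `struct model_fs`, `refill_buggy()`, `alloc_inode()`, `fsck_refill()` and the other helpers are hypothetical; NICINOD is taken as 100, as quoted later in the thread; the 1000-inode table size is made up; and `fsck_refill()` models only what Marc says fsck does (rebuild the cache, leave the search pointer alone).

```c
#include <assert.h>
#include <stdbool.h>

#define NICINOD 100  /* free-inode cache size quoted in the thread */
#define NINODES 1000 /* hypothetical inode-table size for this toy model */

/* Toy model of the superblock state; NOT the real struct filsys. */
struct model_fs {
    bool used[NINODES];  /* on-disk state: true if inode i is allocated */
    int cache[NICINOD];  /* in-core list of free inode numbers */
    int ncache;          /* number of valid entries in cache[] */
    int scan_start;      /* where the next refill scan begins */
};

/* Mark every inode allocated, empty the cache, set the scan point. */
static void model_init(struct model_fs *fs, int scan_start)
{
    assert(scan_start >= 0 && scan_start < NINODES);
    for (int i = 0; i < NINODES; i++)
        fs->used[i] = true;
    fs->ncache = 0;
    fs->scan_start = scan_start;
}

/* Free an inode on "disk" only (simplification: the real code may also
 * push it onto the cache). */
static void model_free(struct model_fs *fs, int ino)
{
    fs->used[ino] = false;
}

/* Count inodes that are really free in the table. */
static int count_free(const struct model_fs *fs)
{
    int n = 0;
    for (int i = 0; i < NINODES; i++)
        n += !fs->used[i];
    return n;
}

/* Refill as described in the thread: scan from scan_start to the END of
 * the table only, never wrapping around. Returns the number found. */
static int refill_buggy(struct model_fs *fs)
{
    int found = 0;
    assert(fs->scan_start >= 0 && fs->scan_start < NINODES);
    for (int i = fs->scan_start; i < NINODES && found < NICINOD; i++)
        if (!fs->used[i])
            fs->cache[found++] = i;
    if (found == NICINOD)
        fs->scan_start = fs->cache[NICINOD - 1]; /* resume here next time */
    else if (found > 0)
        fs->scan_start = 0;                       /* partial fill: restart at 0 */
    /* found == 0: scan_start is left where it was, which is the bug */
    fs->ncache = found;
    return found;
}

/* Allocate one inode; returns its number or -1 for "out of inodes". */
static int alloc_inode(struct model_fs *fs)
{
    if (fs->ncache == 0 && refill_buggy(fs) == 0)
        return -1;
    int ino = fs->cache[--fs->ncache];
    fs->used[ino] = true;
    return ino;
}

/* What the thread says fsck does: rebuild the cache from the whole
 * table while leaving scan_start untouched. */
static void fsck_refill(struct model_fs *fs)
{
    fs->ncache = 0;
    for (int i = 0; i < NINODES && fs->ncache < NICINOD; i++)
        if (!fs->used[i])
            fs->cache[fs->ncache++] = i;
}
```

With inode 10 free and exactly NICINOD free inodes at the top of the table, a scan starting at 850 hands out the top 100; the next allocation then returns -1 although inode 10 is still free. Refilling the cache the "fsck" way only postpones the next false failure, because the scan point stays at the end of the table.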
hanko@edge.UUCP (Jim Hanko) (11/24/87)
I must apologize for not responding sooner, but I was put out of action
for a week by an auto accident.

Anyway (some history):

In article <3626@islenet.UUCP> richard@islenet.UUCP (Richard Foulk) writes:
> ( description of problem of appearing to run out of inodes when there
> are actually many free inodes. )

In article <986@edge.UUCP> hanko@edge.UUCP (Jim Hanko) [that's me] writes:
> (description of fix)

In article <325@boole.acc.virginia.edu> slb@boole.acc.virginia.edu writes:
> .... So, it seems to me that the problem could only arise when
>there were exactly enough inodes left between the starting point and the
>end to fill the cache (any less and you'd reset the starting point, any
>more and there'd be one to find on the next search) the last time around.
>But it doesn't seem as though this could arise often enough to account
>for how often I see it. What am I missing? (if it's obvious, please
>be kind ...)

It has been almost a year since I fixed this bug, so I had forgotten some
of the details of the problem. It is true that this will only occur when
exactly NICINOD free inodes were left from the starting point of the last
successful search. This seems to occur more often than you might expect
on a very active file system (e.g. net news).

I believe it happens when you start using inodes near the end of the inode
table. For example, if a file is deleted and its inode is approximately
NICINOD from the end, and if this is the last inode re-allocated from the
cache, the next group of inodes found will be near the end. If many files
are then created and deleted in rapid fire, chances are good that the
situation will show up.

In article <3646@islenet.UUCP> richard@islenet.UUCP (Richard Foulk) writes:
>So the question is: do you have diffs for the fix?

I am a little uncomfortable about posting diffs, so I will try to do so
without giving too much detail.

=== in module ialloc() ===
<     fp->s_ninode = NICINOD;
<     ino = ...
<     for(adr = ... {
<         .
<         .   <-- major loop which searches for free inodes
<         .
<     }
---
>
> again: /* come back here if necessary to re-search from beginning */
>
>     fp->s_ninode = NICINOD;
>     ino = ...
>     for(adr = ... {
>         .
>         .   <-- major loop which searches for free inodes
>         .
>     }
>     /*
>      * If we didn't find any and we didn't start at the beginning,
>      * look again starting at the beginning
>      */
>     if (fp->s_ninode == NICINOD && fp->s_inode[0] != 0) {
>         fp->s_inode[0] = 0;
>         goto again;
>     }
---

I don't generally like using 'goto's, but it seemed the least intrusive
way to fix the problem (please, no flames).

In article <3646@islenet.UUCP> richard@islenet.UUCP (Richard Foulk) writes:
>Since only rnews seems to provoke this bug is there some sort of
>way to avoid the bug that comes to mind?

The only thing I can suggest is to have so many free inodes that you
rarely go near the end of the table. I'm not sure how much that will
help, though.
---
Jim Hanko     ...{mot|ism780|oliveb}!edge!hanko
Edge Computer, Scottsdale AZ
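The refill step with Jim's fix applied can be sketched as a standalone C function. It follows the structure of his posted diff (retry from inode 0 via `goto again` when nothing was found and the scan did not start at 0), but it runs on a plain allocation map rather than the real superblock. `refill_fixed()`, `used[]`, `scan_start` and `cache[]` are hypothetical stand-ins for the superblock fields (`s_inode[]`, `s_ninode`) and the disk scan, and NICINOD = 100 is assumed.

```c
#include <assert.h>
#include <stdbool.h>

#define NICINOD 100 /* free-inode cache size quoted in the thread */

/* Hypothetical refill step with the wrap-around fix applied.
 * used[] is the allocation map of an n-inode table, *scan_start is the
 * saved search point, cache[] receives up to NICINOD free inode numbers.
 * Returns how many were found; 0 now really means "out of inodes". */
static int refill_fixed(const bool *used, int n, int *scan_start,
                        int cache[NICINOD])
{
    int found;
    assert(*scan_start >= 0 && *scan_start < n);
again:
    found = 0;
    for (int i = *scan_start; i < n && found < NICINOD; i++)
        if (!used[i])
            cache[found++] = i;
    /* The fix: we found nothing, but we did not start at the
     * beginning, so search again from inode 0. */
    if (found == 0 && *scan_start != 0) {
        *scan_start = 0;
        goto again;
    }
    if (found == NICINOD)
        *scan_start = cache[NICINOD - 1]; /* resume here next time */
    else
        *scan_start = 0;                  /* partial or empty fill: restart at 0 */
    return found;
}
```

With one free inode low in the table, exactly NICINOD free at the top, and the scan starting at 850, the second refill now finds inode 10 instead of reporting "out of inodes"; a return of 0 only happens once the table really is full.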
hanko@edge.UUCP (Jim Hanko) (11/24/87)
In article <993@edge.UUCP> hanko@edge.UUCP (Jim Hanko) writes:
>In article <325@boole.acc.virginia.edu> slb@boole.acc.virginia.edu writes:
>> .... So, it seems to me that the problem could only arise when
>>there were exactly enough inodes left between the starting point and the
>>end to fill the cache (any less and you'd reset the starting point, any
>>more and there'd be one to find on the next search) the last time around.
>>But it doesn't seem as though this could arise often enough to account
>>for how often I see it. What am I missing? ...
>
> ( vigorous hand waving in lieu of explanation )

After further reflection, I believe I can explain more clearly.

1) The relevant features of the ialloc() algorithm are:
   - A cache of NICINOD (100) free inodes is maintained.
   - When the cache becomes empty, a linear scan through the inode table
     is performed to find free inodes.
   - When the cache becomes full (due to file deletions), a new
     (relatively random) scan point is established, based on the last
     freed inode.
   - The bug occurs when the scan takes exactly the last NICINOD free
     inodes in the table.

2) On a 'normal' file system, file creations and deletions are not well
   correlated, and occur with approximately equal frequency. Therefore,
   the cache usually contains enough free inodes, and the scan is rarely
   necessary. When it is, one short scan from a random point usually
   takes care of it. So the bug almost never shows up on this type of
   file system.

3) A 'net news' file system experiences repeated episodes where many new
   files are created at once (new articles arrive), intermixed with
   episodes where many files are deleted at once (articles expire, packed
   news files are deleted). If enough new files are created at once,
   relative to the number of free inodes, the likelihood becomes high
   that a scan from any random point in the table will reach the end.
   Normally, the scan will then resume at the beginning. However, on
   average, 1% of the time (i.e. one scan in NICINOD) the bug will strike
   instead. If it doesn't, the periods of file deletion will establish a
   new (somewhat) random starting point, creating a new opportunity for
   the bug to appear during the next file creation binge.

Therefore, if you don't have a source license and can't install the fix
I posted in an earlier article, I can only suggest that you keep a large
number of free inodes in your 'net news' file system. This will reduce,
but not eliminate, the probability of the bug affecting you.
---
Jim Hanko     ...{mot|ism780|oliveb}!edge!hanko
Edge Computer, Scottsdale AZ
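Jim's 1% figure can be written out as a short calculation. This is a simplified model, not something he posted: it assumes that no inodes past the scan point are freed during a creation binge, that the binge is long enough to use up every free inode past the scan point, and that the remainder r below is roughly uniform over 0..NICINOD-1. Here k is the number of free inodes between the scan point and the end of the table when the binge starts (k >= 1, since the scan point is set from a freed inode).

```latex
% k = free inodes between the scan point and the end of the table
k = q \cdot \mathrm{NICINOD} + r, \qquad 0 \le r < \mathrm{NICINOD}

% scans 1..q each fill the cache completely and advance the scan point;
% scan q+1 then sees only the r remaining free inodes:
\text{scan } q+1:\quad
\begin{cases}
  r > 0: & \text{partial fill, scan point reset to } 0 \text{ (no bug)} \\
  r = 0: & \text{nothing found, false ``out of inodes''}
\end{cases}

\Pr[\text{bug per binge}] = \Pr[r = 0] \approx \frac{1}{\mathrm{NICINOD}} = \frac{1}{100} = 1\%
```

The r = 0 case is exactly slb's "exactly enough inodes left between the starting point and the end to fill the cache".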
jc@minya.UUCP (11/26/87)
While we're working on the out-of-inodes problem, perhaps it should be
pointed out that there is also a similar out-of-blocks problem. On this
and quite a lot of other SysV machines, I've been able to make a file
system run out of free blocks in such a way that running fsck afterwards
finds a whole lot of free blocks (typically thousands). I don't have any
good evidence of what causes it, other than that it appears to be similar
to the inode problem: running a test program that rapidly creates and
unlinks small files will often produce the problem, especially if you run
N copies of the program in parallel. I'd work on it here, but I don't have
the source on this machine. [Well, it's a good excuse to be lazy! :-]
-- 
John Chambers <{adelie,ima,maynard,mit-eddie}!minya!{jc,root}> (617/484-6393)
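John does not post his test program, so the following is only a hypothetical stand-in for the kind of load he describes: a C function, `churn()`, that creates, writes and unlinks small files in a loop using the POSIX `open()`, `write()` and `unlink()` calls. The function name, the file-name scheme and the 4096-byte path buffer are assumptions. Whether this particular load reproduces the block-count problem on a given SysV release is not established by this sketch.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical stress loop: create, write and immediately unlink
 * `iterations` small files in `dir`. Returns the number of cycles that
 * succeeded; stops at the first failure (e.g. ENOSPC) so the caller can
 * inspect errno. */
static int churn(const char *dir, int iterations)
{
    static const char payload[] = "small file\n";
    char path[4096];
    int ok = 0;

    assert(dir != NULL && iterations >= 0);
    for (int i = 0; i < iterations; i++) {
        /* Include the pid so N parallel copies do not collide. */
        snprintf(path, sizeof path, "%s/churn.%ld.%d",
                 dir, (long)getpid(), i);
        int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
        if (fd < 0)
            break;
        ssize_t w = write(fd, payload, sizeof payload - 1);
        close(fd);
        /* unlink() is always attempted, even if the write failed. */
        if (unlink(path) != 0 || w != (ssize_t)(sizeof payload - 1))
            break;
        ok++;
    }
    return ok;
}
```

A driver would call `churn()` with a directory on the file system under test and a large iteration count, start N copies in parallel (e.g. from the shell with `&`), and then compare the free-block count before and after running fsck on the unmounted file system.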