[comp.sys.att] Weird 3b inode problem with news.

demasi@paisano.UUCP (Michael C. De Masi) (11/04/87)

Hello people,

I'm running usenet on a 3b2/400 Sys V r2.0.1 and I'm having
a strange problem with the news file system (news is on its
own file system to prevent it from interfering with other
data).  Soon after I first installed news, I found that I
had run out of inodes long before data blocks.  So, I 
backed up the news file system, unmounted it,
did a mkfs on the disk partition with the exact same
size as the original, only with 1600 more inodes, then
remounted the file system and restored the news data.

Everything went smoothly for a while, until the feed dried up
temporarily and the news volume dropped, thus emptying out
the file system somewhat.  When the feed returned to full
volume, I noticed that I was getting "out of inode" errors
when I knew full well there weren't nearly that many files
in the system.  So I unmounted the news file system, fsck'd
it, got a "free inode count in superblock (fix?)" message
back from fsck, told it to fix it, remounted the file system
and again everything went smoothly until the news volume
dropped again, and the same thing happened.

What I'm starting to wonder is whether or not I have accidentally
created some sort of sticking point in the free inode list that
can only be gotten around with an fsck?  Because of recent
fluctuations with my news feed, it has become a real hassle
to constantly have to fix the file system, so I was wondering
if anyone out there had ever had a problem like this or a
possible solution?  Is it something I did or some strange
interaction between news & Sys V?  Any ideas?

Awaiting your replies,
-- 
Michael C. De Masi - AT&T Communications (For whom I work and not speak)
3702 Pender Drive, Fairfax, Virginia 22030   Phone: 703-246-9555
UUCP:   seismo!decuac!grebyn!paisano!demasi
     "There are monkey boys on the premises."  Unknown red Lectroid.

slb@boole.acc.virginia.edu (sandy) (11/05/87)

We have had the same problem (frequently running out of inodes on
a file system when you know darn well there should be inodes left;
fsck always complains about a bad free inode count) with the file
system that holds our news and mail and print queues.  We have a few
3B15's, a few 3B5's and lots of 3B2's.  We don't run news on the
3B2's, and I don't think I have ever observed the problem there, but
all the other machines exhibit this behavior.  I don't see how 
remaking the partition with more inodes can cause this - rather I
think that there is some bug in the file system code and the high
activity that you see in a spool type partition somehow causes a
race condition to happen.  And boy, is it ever a drag - especially
when it happens to a machine that you count on to feed ten other 
machines.  You didn't say what release you run - we run 3.0 on the 
2's, 2.0 on the 5's and 2.1 on the 15's.  
-- 
sandy bryant
slb@virginia.edu
uunet!virginia!slb

brian@sdcsvax.UCSD.EDU (Brian Kantor) (11/05/87)

In article <283@paisano.UUCP> demasi@paisano.UUCP (Michael C. De Masi) writes:
>
>I'm running usenet on a 3b2/400 Sys V r2.0.1 and ...
>... I noticed that I was getting "out of inode" errors
>when I knew full well there weren't nearly that many files
>in the system.  So I unmounted the news file system, fsck'd
>it, got a "free inode count in superblock (fix?)" message
>back from fsck, told it to fix it, remounted the file system
>and again everything went smoothly until the news volume
>dropped again, and the same thing happened.

I've noticed similar behaviour here on our 3B15 (SysV 2.1.2) with the
news filesystem running out of inodes - a reboot always fixes it.  It
is almost as though the inodes weren't being properly freed after the
news is expired.  I don't think it's hung processes hanging on to the
inode after the directory entry is removed, since I often don't find any
news processes running.

I find it interesting that the reboot (without an fsck, since the
filesystem was marked as clean when the system went down) fixes the
problem.  If it were truly a buggered superblock, one would not expect
that.

Since news is the one system we run on this machine which creates
multiple links to files, I suspect it might be related to that.

My workaround is to simply schedule a reboot each Monday morning at 3 am.  
That way the system is fresh and clean when I get to work that
week.  Admittedly, that's fixing the symptom and not the problem.

	Brian Kantor    UCSD Office of Academic Computing
			Academic Network Operations Group UCSD B-028,
			La Jolla, CA 92093 USA

sverre@fesk.UUCP (Sverre Froyen) (11/06/87)

in article <283@paisano.UUCP>, demasi@paisano.UUCP (Michael C. De Masi) says:
> I'm running usenet on a 3b2/400 Sys V r2.0.1 and I'm having
> a strange problem with the news file system ....

(text deleted)

>                       ...  When the feed returned to full
> volume, I noticed that I was getting "out of inode" errors
> when I knew full well there weren't nearly that many files
> in the system.  So I unmounted the news file system, fsck'd
> it, got a "free inode count in superblock (fix?)" message
> back from fsck, told it to fix it, remounted the file system
> and again everything went smoothly until the news volume
> dropped again, and the same thing happened.

I have seen the same thing on an ICM3216 running SysV.2.2.
The inode count of the spool file system (where news reside)
will drop from 12000 to 0 within minutes (perhaps seconds) while
unpacking a compressed news batch (rnews -U). Recourse is to go
to single user mode and do an fsck on the file system. This will
restore all (12000) lost inodes.  This scenario happens about
once per month and I have not noticed a correlation with the
news volume.
-- 
Sverre Froyen
UUCP:   boulder!fesk!sverre, sunpeaks!seri!fesk!sverre
ARPA:   froyen@nmfecc.arpa
BITNET: froyen@csugold.bitnet

news@jpusa1.UUCP (usenet) (11/06/87)

In article <283@paisano.UUCP> demasi@paisano.UUCP (Michael C. De Masi) writes:
-I'm running usenet on a 3b2/400 Sys V r2.0.1 and I'm having
-a strange problem with the news file system (news is on its
-own file system to prevent it from interfering with other
-data).  Soon after I first installed news, I found that I
-had run out of inodes long before data blocks.

I've had similar behaviour when the disk partition fills up and you run out
of data blocks.  For some reason, when the data blocks become available again,
the inodes don't get returned to the freelist.  This is on a unisoft sys5 r0
box.  The cure, when it happens, is to fsck the disk.  Is this a generic bug in
sys5?  Anyway, try to avoid filling the partition and the problem will most
likely disappear.  I've hacked an rnews that checks for space on the disk
before spooling the incoming article.  It knows of a list of alternate
directories on other partitions to use when it gets dangerously low.
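A minimal sketch of the kind of check described above, written against the POSIX statvfs() call; a 1987 System V system would have used something like ustat() instead, so treat the interface as an assumption. Given this thread, it seems worth checking free inodes as well as free blocks. The function name pick_spool_dir and its thresholds are hypothetical, and this is not Stu's actual rnews code.

```c
#include <stddef.h>
#include <sys/statvfs.h>

/*
 * Hypothetical helper: return the first directory in dirs[] whose file
 * system has at least min_blocks free blocks AND min_inodes free inodes
 * available to ordinary users, or NULL if none qualifies.
 */
const char *pick_spool_dir(const char *const dirs[], size_t ndirs,
                           unsigned long min_blocks, unsigned long min_inodes)
{
    for (size_t i = 0; i < ndirs; i++) {
        struct statvfs sv;
        if (statvfs(dirs[i], &sv) != 0)
            continue;               /* missing or unreadable: try the next one */
        /* f_bavail: free blocks (in f_frsize units) for non-root users;
           f_favail: free inodes for non-root users. */
        if (sv.f_bavail >= min_blocks && sv.f_favail >= min_inodes)
            return dirs[i];
    }
    return NULL;                    /* nowhere safe to spool right now */
}
```

A caller would pass the primary spool directory first and the alternates after it, spooling into whichever directory comes back and holding the batch if the result is NULL.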
--
Stu Heiss {gargoyle,ihnp4}!jpusa1!stu

heiby@mcdchg.UUCP (Ron Heiby) (11/07/87)

I have seen the same thing twice in about 10 months of use of my
MC68020-based system running SVR3.  An fsck fixes the problem.
I think it's weird.
-- 
Ron Heiby, heiby@mcdchg.UUCP	Moderator: comp.newprod & comp.unix
"I know engineers.  They love to change things."  McCoy

larry@kitty.UUCP (Larry Lippman) (11/07/87)

In article <283@paisano.UUCP>, demasi@paisano.UUCP (Michael C. De Masi) writes:
> What I'm starting to wonder is whether or not I have accidentally
> created some sort of sticking point in the free inode list that
> can only be gotten around with an fsck?  Because of recent
> fluctuations with my news feed, it has become a real hassle
> to constantly have to fix the file system, so I was wondering
> if anyone out there had ever had a problem like this or a
> possible solution?  Is it something I did or some strange
> interaction between news & Sys V?  Any ideas?

	If it's any consolation, I have the same problem on `kitty', which
is also a 3B2.  A problem identical to yours occurs about once every
4 to 5 months.  I just reboot and fsck.  It is so infrequent, that I just
haven't felt like tracking it down.

<>  Larry Lippman @ Recognition Research Corp., Clarence, New York
<>  UUCP:  {allegra|ames|boulder|decvax|rutgers|watmath}!sunybcs!kitty!larry
<>  VOICE: 716/688-1231       {hplabs|ihnp4|mtune|seismo|utzoo}!/
<>  FAX:   716/741-9635 {G1,G2,G3 modes}   "Have you hugged your cat today?" 

djt@hotps.ATT.COM (Dave Trulli) (11/07/87)

I too have been seeing my /usr/spool file system running out of
inodes when news is coming in. An fsck will also fix the free
inode count. I am using a 3B15 2.1.1 and have heard of it happening
on a 3B2 and a 3B20 too. The problem occurs about once a week here.
I don't know of a way news could do this, so is it a bug in news or
a bug in the file system code???

-- 
UUCP:	ihnp4!hotps!djt			Dave Trulli  NN2Z
	djt@hotps.ATT.COM		AT&T Network Systems
PACKET:	nn2z@nn2z			Holmdel NJ.
					201-949-4774

rbl@nitrex.UUCP ( Dr. Robin Lake ) (11/08/87)

In article <4259@sdcsvax.UCSD.EDU> brian@sdcsvax.UCSD.EDU (Brian Kantor) writes:
>In article <283@paisano.UUCP> demasi@paisano.UUCP (Michael C. De Masi) writes:
>>
>>I'm running usenet on a 3b2/400 Sys V r2.0.1 and ...
>>... I noticed that I was getting "out of inode" errors
>>when I knew full well there weren't nearly that many files
>>in the system.  So I unmounted the news file system, fsck'd
>>  ,,,
>
>I've noticed similar behaviour here on our 3B15 (SysV 2.1.2) with the
>news filesystem running out of inodes - a reboot always fixes it.  It
> ...

A similar "thing" happens on our Motorola 6600 (aka Convergent MegaFrame)
running SV.  Now and then a news directory "locks up" and rnews can't put
anything into it.  We're running 2.10 news and were hoping to "cure" the
problem with 2.11 soon.

-- 
Rob Lake
{decvax,ihnp4!cbosgd}!mandrill!nitrex!rbl

richard@islenet.UUCP (Richard Foulk) (11/08/87)

> 
> I have seen the same thing on an ICM3216 running SysV.2.2.
> The inode count of the spool file system (where news reside)
> will drop from 12000 to 0 within minutes (perhaps seconds) while
> unpacking a compressed news batch (rnews -U). Recourse is to go
> to single user mode and do an fsck on the file system. This will
> restore all (12000) lost inodes.  This scenario happens about
> once per month and I have not noticed a correlation with the
> news volume.

I've encountered this problem on a couple of Dual Systems orphaned
machines.  I always figured it was UniSoft's or Dual's fault.

It seems to be dependent on the ratio of free blocks to free inodes
or something like that.  Whenever the problem comes back I often
have to do the umount/fsck/mount/unbatch cycle several times before
it will settle down and stop running out of inodes.  Then the problem
will often stay away for weeks or months.

I vaguely remember hearing something about some race condition in
the kernel allowing this to happen, but I thought that had been
fixed long ago.


-- 
Richard Foulk		...{dual,vortex,ihnp4}!islenet!richard
Honolulu, Hawaii

jc@minya.UUCP (John Chambers) (11/08/87)

In article <4259@sdcsvax.UCSD.EDU>, brian@sdcsvax.UCSD.EDU (Brian Kantor) writes:
> In article <283@paisano.UUCP> demasi@paisano.UUCP (Michael C. De Masi) writes:
> >
> >I'm running usenet on a 3b2/400 Sys V r2.0.1 and ...
> >... I noticed that I was getting "out of inode" errors
> >when I knew full well there weren't nearly that many files
> >in the system.  So I unmounted the news file system, fsck'd
> >it, got a "free inode count in superblock (fix?)" message
> >back from fsck, told it to fix it, remounted the file system
> >and again everything went smoothly until the news volume
> >dropped again, and the same thing happened.
> 
> I've noticed similar behaviour here on our 3B15 (SysV 2.1.2) with the
> news filesystem running out of inodes - a reboot always fixes it.

You folks are just discovering a common (possibly universal) Sys/V bug.
I've been able to produce this behavior on numerous machines that were 
clearly running different ports of Sys/V.  It seems to have little to
do with exactly what the software was doing.  The kernel just loses track
of inodes (and also blocks).  The situation with blocks is fairly easy
to understand:  If a block isn't in any file, and isn't on the free
list, the kernel can't find it.  All it takes is someone zeroing out
a buf[] pointer without first freeing the block.

For inodes, you'd think that it couldn't happen, since the kernel can
determine by examination which inodes are free, and they are a simple
vector.  But in Sys/V, free inodes are also in a linked list, so the
kernel is dependent on inodes being freed properly.  In this case, it
should be simple to write an "inode scavenger" to correct the problem
on the fly.  But you'd have to put it in the kernel, because the kernel
caches critical info (such as recently-allocated inodes, which would
appear "unallocated" on the disk because the in-memory copies haven't
been flushed).
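A rough user-space sketch of the "inode scavenger" idea, assuming a simplified superblock whose field names (s_ninode, s_inode[], s_tinode) loosely follow the S5 superblock; the cache size is an assumption. It rebuilds the free count and the free-inode cache purely from the mode words on the i-list (mode 0 meaning free), which is roughly the recount behind fsck's "free inode count" fix. As noted above, a real version would have to live in the kernel and also consult the in-core inode table; this sketch ignores that.

```c
#include <stddef.h>

#define NICINOD 100   /* size of the superblock free-inode cache (assumed) */

/* Simplified model of the superblock fields involved. */
struct sb_model {
    unsigned short s_ninode;           /* entries currently in s_inode[] */
    unsigned short s_inode[NICINOD];   /* cached free i-numbers */
    long           s_tinode;           /* total free inodes, as the kernel believes */
};

/*
 * Hypothetical scavenger: recompute the free count from the i-list and
 * refill the cache.  modes[ino] is the mode word of inode ino; i-numbers
 * start at 1, so modes[0] is ignored.  Returns the recomputed count.
 */
long scavenge_inodes(struct sb_model *sb, const unsigned short *modes,
                     size_t ninodes)
{
    long nfree = 0;

    sb->s_ninode = 0;
    for (size_t ino = 1; ino < ninodes; ino++) {
        if (modes[ino] != 0)
            continue;                  /* in use */
        nfree++;
        if (sb->s_ninode < NICINOD)
            sb->s_inode[sb->s_ninode++] = (unsigned short)ino;
    }
    sb->s_tinode = nfree;              /* overwrite whatever drifted value was there */
    return nfree;
}
```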

Anyhow, it may or may not be a consolation to know that many Sys/V
releases have the same problem.  Whether AT&T knows about it, I don't
know.  Maybe we should tell them that we know....

-- 
John Chambers <{adelie,ima,maynard,mit-eddie}!minya!{jc,root}> (617/484-6393)

ray@dsiramd.nz (Ray Brownrigg) (11/08/87)

In article <283@paisano.UUCP> demasi@paisano.UUCP (Michael C. De Masi) writes:
>I'm running usenet on a 3b2/400 Sys V r2.0.1 ...
> [inodes disappear, need fsck to recover]

I have been having exactly the same problem on two different 3b2/400's running
System V r3.0. On a third 3B2/400, on which I have restructured the /usr2 file
system to contain more inodes, the problem does not appear to occur (or
perhaps I have not noticed it because it does not run out of inodes any more).
As I recall the problem is not cured by a reboot, because an fsck is not
performed unless the system crashed.


-- 
Ray Brownrigg		UUCP: {utai!calgary,uunet}!vuwcomp!dsiramd!ray
Applied Maths Div, DSIR		ACSnet:	ray@dsiramd.nz[@munnari]
PO Box 1335			System:	OLIVETTI/AT&T 3B2/400B+, System V R3.0
Wellington, New Zealand			"UNX -rules -OK"

guy@gorodish.Sun.COM (Guy Harris) (11/09/87)

> But in Sys/V, free inodes are also in a linked list, so the
> kernel is dependent on inodes being freed properly.

In the V7 file system, which is used by S5, free inodes are not in any sort of
linked list.  There is a cache in the superblock that saves the i-numbers of a
small number of free inodes.  If this cache is emptied, the system has to make
a linear search through the i-list looking for an inode with a mode word of
zero.  There is an optimization in later versions of this code (including the
S5 version) that tries to remember the i-number of the first free inode, so
that it doesn't have to search the *entire* i-list.
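A small model of the scheme Guy describes, under stated assumptions: the cache is shrunk to 4 entries so the behaviour is easy to follow, and the "first free inode" hint is kept in a separate field (first_free); the sketch does not try to reproduce how any particular S5 kernel stores it. What it shows is that the hint is only safe if every free inode is either in the cache or at or above the hint, which is why ifree_model() lowers the hint when it cannot cache a freed inode. All names are hypothetical.

```c
#include <stddef.h>

#define NICINOD 4   /* tiny cache for illustration; S5's was larger */

struct fs_model {
    unsigned short *modes;   /* on-disk mode words; 0 means free */
    size_t ninodes;          /* valid i-numbers are 1 .. ninodes-1 */
    size_t first_free;       /* hint: every free inode not in cache[] is >= this */
    size_t ncache;           /* number of entries in cache[] */
    size_t cache[NICINOD];   /* cached free i-numbers */
};

/* Refill the empty cache by a linear scan of the i-list from the hint. */
static void refill(struct fs_model *fs)
{
    size_t ino;
    for (ino = fs->first_free; ino < fs->ninodes && fs->ncache < NICINOD; ino++)
        if (fs->modes[ino] == 0)
            fs->cache[fs->ncache++] = ino;
    fs->first_free = ino;    /* every free inode below here is now cached */
}

/* Allocate an inode: pop the cache, refilling it first if it is empty.
   Returns the i-number, or 0 when the scan finds nothing. */
size_t ialloc_model(struct fs_model *fs, unsigned short mode)
{
    size_t ino;
    if (fs->ncache == 0)
        refill(fs);
    if (fs->ncache == 0)
        return 0;            /* genuinely out of inodes */
    ino = fs->cache[--fs->ncache];
    fs->modes[ino] = mode;
    return ino;
}

/* Free an inode: cache it if there is room, otherwise lower the hint
   so the next scan cannot skip over it. */
void ifree_model(struct fs_model *fs, size_t ino)
{
    fs->modes[ino] = 0;
    if (fs->ncache < NICINOD)
        fs->cache[fs->ncache++] = ino;
    else if (ino < fs->first_free)
        fs->first_free = ino;
}
```

If some path freed an inode below the hint without recording it, a scan starting at the hint could come back empty even though free inodes exist; that is one way the symptom in this thread could arise, though the sketch does not show that it is the actual cause.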
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com

jeffl@berick.UUCP (Jeff Lawhorn) (11/09/87)

Posting-Front-End: GNU Emacs 18.41.6 of Sun Oct  4 1987 on berick (usg-unix-v)


I don't know what is happening with your 3b's losing inodes on the
news file system.  We have a 3b15 that has been running the 2.11
software since the day it came across the net, and we were running
2.10 for quite a while prior to that, and we've never seen the problem
of losing inodes until an fsck is done.

Maybe the problem is something particular to your sites (students :-).
At our site the 3b15 goes down once a month so that I can do a root
file system back up.  The only time we've had a problem with the
machine in the last 15 months was when we lost a drive.
-- 

Everything should be made as simple      Jeff Lawhorn
  as possible, but no simpler.           ...!sdcsvax!jack!berick!jeffl

df@nud.UUCP (Dale Farnsworth, NO7K) (11/09/87)

In article <283@paisano.UUCP> demasi@paisano.UUCP (Michael C. De Masi) writes:
->I'm running usenet on a 3b2/400 Sys V r2.0.1 and I'm having
->a strange problem with the news file system (news is on its
...
->Everything went smoothly for a while, until the feed dried up
->temporarily and the news volume dropped, thus emptying out
->the file system somewhat.  When the feed returned to full
->volume, I noticed that I was getting "out of inode" errors
->when I knew full well there weren't nearly that many files
->in the system.  So I unmounted the news file system, fsck'd
->it, got a "free inode count in superblock (fix?)" message
->back from fsck, told it to fix it, remounted the file system
->and again everything went smoothly until the news volume
->dropped again, and the same thing happened.

I have seen the same thing on my 68020 system running System V R3
code.  Somehow, the free inode count goes to 0 though there are
thousands of free inodes.  It happens every month or so.
I haven't noticed the correlation with news volume, but there may
be one.  I would be very interested in a fix.

-Dale

jfh@killer.UUCP (11/09/87)

In article <33319@sun.uucp>, guy@gorodish.Sun.COM (Guy Harris) writes:
> > But in Sys/V, free inodes are also in a linked list, so the
> > kernel is dependent on inodes being freed properly.
> 
> In the V7 file system, which is used by S5, free inodes are not in any sort of
> linked list.  There is a cache in the superblock that saves the i-numbers of a
> small number of free inodes.  If this cache is emptied, the system has to make
> a linear search through the i-list looking for an inode with a mode word of
> zero.

> 	Guy Harris

Guy - the difference between Version 7 and later versions ( > System III )
is that the free inode count is maintained in the superblock.  In Version 7 the
free inode count, which I seem to remember had an entry in the superblock,
was not updated.  So, when an I-node was allocated, the kernel had to search
the entire I-list for a free inode (assuming the superblock cache was empty)
without knowing if an I-node would be found.  Now, the kernel `knows' how
many free inodes are out there without even looking.

- John.
-- 
John F. Haugh II		HECI Exploration Co. Inc.
UUCP:	...!ihnp4!killer!jfh	11910 Greenville Ave, Suite 600
"Don't Have an Oil Well?"	Dallas, TX. 75243
" ... Then Buy One!"		(214) 231-0993

scl@virginia.acc.virginia.edu (Steve Losen) (11/10/87)

In article <156@fesk.UUCP> sverre@fesk.UUCP (Sverre Froyen) writes:
>
>I have seen the same thing on an ICM3216 running SysV.2.2.
>The inode count of the spool file system (where news reside)
>will drop from 12000 to 0 within minutes (perhaps seconds) while
>unpacking a compressed news batch (rnews -U). Recourse is to go
...
>-- 
>Sverre Froyen
>UUCP:   boulder!fesk!sverre, sunpeaks!seri!fesk!sverre
>ARPA:   froyen@nmfecc.arpa
>BITNET: froyen@csugold.bitnet

We have 10 3b15's running news and usually at least one of them has
a hosed /usr/spool.  We fix the problem by running fuser -k and then
unmounting /usr/spool to fsck it.  This rarely kills off user processes
but always blows away cron and lpsched.  No problem, we just restart
them from /etc/rc.d.

The fsck recovers from 12,000 to 13,000 free inodes.  This looks suspiciously
like the kernel is stuffing the free inode count into a short int or
passing it to a routine expecting a short, or using 16-bit arithmetic operators.
If the inode count > 2**15, putting it in a short makes it negative!
Has anyone found such a bug in the kernel?  Could it be a more subtle problem?
I once used a FORTRAN compiler that stored small integer constants in shorts
and used 16-bit arithmetic whenever all the operands were short.  Needless to
say, I was quite surprised when the statement

i = 10 * 4000

assigned a negative number to i (where i was declared INTEGER*4, i.e., long).

At any rate, my first guess is that this bug is caused by an erroneous use
of 16-bit operators on a 32-bit quantity.  The problem could be in the C code
or in the compiler.
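A sketch of the arithmetic only, assuming two's-complement 16-bit storage: as_int16() reports what a value becomes if only its low 16 bits survive in a signed short. It masks with unsigned arithmetic because converting an out-of-range value directly to a signed short is implementation-defined in C. Note that the 12,000 to 13,000 inodes fsck recovers are themselves below 2**15 (32768), so for this theory to hold the truncation would have to hit the total count or some intermediate value, not the number of lost inodes.

```c
/* Value of v after keeping only its low 16 bits as a two's-complement short. */
long as_int16(long v)
{
    unsigned long low = (unsigned long)v & 0xFFFFul;   /* low 16 bits */
    return low >= 0x8000ul ? (long)low - 0x10000L : (long)low;
}
```

For example, as_int16(10L * 4000L) gives -25536, the same kind of surprise as the FORTRAN statement above.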
-- 
Steve Losen
University of Virginia Academic Computing Center

sverre@fesk.UUCP (Sverre Froyen) (11/10/87)

The discussion is on disappearing inodes from the news file system.
The inodes mysteriously disappear and can only be restored by doing
an fsck.

in article <548@jpusa1.UUCP>, news@jpusa1.UUCP (usenet) says:
> I've had similar behaviour when the disk partition fills up and you run out
> of data blocks.  For some reason, when the data blocks become available again,
> the inodes don't get returned to the freelist.  This is on a unisoft sys5 r0
> box.  The cure, when it happens, is to fsck the disk.  Is this a generic bug in
> sys5?....

On this machine: ICM3216 sysV.2.2 news_2.11.8 (I have not seen it
yet under patchlevel 11 or 12, but those have only been running for
a couple of weeks) the file system is not anywhere close to being full
when the inodes vanish (at least 10Mb to spare). Thus the bug is not
related to disk blocks being unavailable. This is verified with `df'
which reports, say, 15Mb free space and 0 free inodes.

Does anybody really know what causes this? Does it have to be a kernel
bug (which would be my guess) or could it be a fault in the news software?

-- 
Sverre Froyen
UUCP:   boulder!fesk!sverre, sunpeaks!seri!fesk!sverre
ARPA:   froyen@nmfecc.arpa
BITNET: froyen@csugold.bitnet

showard@uccba.UUCP (Steve Howard) (11/11/87)

I have also had this problem with a 3B2 Sys V Rel 2.1.  I found that it would
happen semi-regularly on Monday & Thursday mornings.  The cron log (which was
usually around 1 MEG (We've got a busy cron :-)) would get /dev/null copied to
it every Mon. & Thurs. morning.  It appears that if we were uncompressing news
and the cronlog got /dev/null cp'd to it at the same time--wham!!!  No more 
inodes!!!  I took the section out of the root crontab that messed with the     
cronlog and everything has lived happily ever after.

Is this caused by the cron writing to the end of a large file as it is
simultaneously deleting it?  Does rnews cause a problem?  Is it purely
coincidence?  Probably, but it hasn't happened on my system since I removed
the section of the crontab that deletes the cronlog.
-- 
Steve Howard      UUCP:  {pyramid,philabs!phri,decuac,mit-eddie}!uccba!showard
U.C. College of Business Administration  USPS:  M.L. 130, Cincinnati, OH 45221

jc@minya.UUCP (John Chambers) (11/11/87)

In article <33319@sun.uucp>, guy@gorodish.Sun.COM (Guy Harris) writes:
> > But in Sys/V, free inodes are also in a linked list, so the
> > kernel is dependent on inodes being freed properly.
> 
> In the V7 file system, which is used by S5, free inodes are not in any sort of
> linked list.  There is a cache in the superblock that saves the i-numbers of a
> small number of free inodes.  

Gee, when I look in /usr/include/sys/inode.h on this 5.2 system, I see:

	struct	inode
	{
		struct inode *i_forw;	/* hash chain forw */
		struct inode *i_back;	/* hash chain back */
		char	i_flag;
		cnt_t	i_count;	/* reference count */
		dev_t	i_dev;		/* device where inode resides */
		ino_t	i_number;	/* i number, 1-to-1 with device address */
	...

It sure looks like someone is doing linked lists from a hash table.  This
would explain how things could get lost, and why fsck would find them.  It
would also explain why fsck makes comments about inode lists.

Or am I misinterpreting something?

-- 
John Chambers <{adelie,ima,maynard,mit-eddie}!minya!{jc,root}> (617/484-6393)

sewilco@datapg.DataPg.MN.ORG (Scot E. Wilcoxon) (11/12/87)

Some people have noticed a connection between the inode problem and
rnews running around the time when cronlog is cleared.

There is one thing which is unusual about rnews and cronlog:  rnews
can generate a lot of error messages to stderr, which can end up in
cronlog.  Most programs generate short cronlog messages, while rnews
is likely to run for several minutes and can easily generate several K
of unbuffered error messages.

If cronlog is unlinked while rnews is running, those error messages
will continue being placed in the phantom file.  If there's a bug,
it may be with this combination of long [unbuffered] output to a
phantom file.  
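A small POSIX sketch of the "phantom file" situation: open a file, unlink it while the descriptor is still open, and keep writing. The inode (now with a link count of 0) and its blocks stay allocated until the last close. The path and message text are placeholders, and this only demonstrates the mechanism; it does not reproduce the inode loss. Note also that `cp /dev/null cronlog`, as in Steve's crontab, truncates the existing file in place rather than unlinking it, so this models the unlink case only.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Create path, unlink it while still open, then write nwrites messages.
 * Returns the size the unlinked file reached (or -1 on error) and stores
 * its link count in *links.
 */
long write_to_unlinked(const char *path, int nwrites, long *links)
{
    static const char msg[] = "rnews: placeholder error message\n";
    struct stat st;
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);

    if (fd < 0)
        return -1;
    if (unlink(path) != 0) {           /* like cronlog being removed */
        close(fd);
        return -1;
    }
    for (int i = 0; i < nwrites; i++) {
        if (write(fd, msg, sizeof msg - 1) != (ssize_t)(sizeof msg - 1)) {
            close(fd);
            return -1;
        }
    }
    if (fstat(fd, &st) != 0) {
        close(fd);
        return -1;
    }
    *links = (long)st.st_nlink;        /* 0: no directory entry left */
    close(fd);                         /* only now are inode and blocks freed */
    return (long)st.st_size;
}
```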

Testing will have to be done by someone with a system on which they
don't mind blowing away the inodes.

The workaround is to throw away the error messages by putting in the
rnews crontab entry
	>/dev/null 2>&1

The error messages will be placed in LIBDIR/errlog, if it exists.
-- 

Scot E. Wilcoxon	sewilco@DataPg.MN.ORG	{ems,meccts}!datapg!sewilco
Data Progress		Minneapolis, MN, USA	+1 612-825-2607

metro@asi.UUCP (Metro T. Sauper) (11/12/87)

In article <283@paisano.UUCP>, demasi@paisano.UUCP (Michael C. De Masi) writes:
> .....
> volume, I noticed that I was getting "out of inode" errors
> when I knew full well there weren't nearly that many files
> in the system.

This is happening at my system also.  Please help!

-- 
Metro T. Sauper, Jr.                              Assessment Systems, Inc.
Director, Remote Systems Development              210 South Fourth Street
(215) 592-8900                 ..!asi!metro       Philadelphia, PA 19106

pgf@mtung.UUCP (11/13/87)

	Wow!  I haven't seen the net in this much agreement on a single 
	subject in years!  It sounds like there might really be a bug 
	in the SysV filesystem!  Even rec.bicycles doesn't get along this 
	well!

	    :-)

-- 
			Paul Fox, AT&T Information Systems, Middletown NJ.
			  [ihnp4|vax135]!mtung!pgf (201)957-2698

hanko@edge.UUCP (Jim Hanko) (11/13/87)

In article <3626@islenet.UUCP> richard@islenet.UUCP (Richard Foulk) writes:
>> 
>> I have seen the same thing on an ICM3216 running SysV.2.2.
>> The inode count of the spool file system (where news reside)
>> will drop from 12000 to 0 within minutes (perhaps seconds) while
>> unpacking a compressed news batch (rnews -U). Recourse is to go
>> to single user mode and do an fsck on the file system. This will
>> restore all (12000) lost inodes.  This scenario happens about
>> once per month and I have not noticed a correlation with the
>> news volume.
>
>I've encountered this problem on a couple of Dual Systems orphaned
>machines.  I always figured it was UniSoft's or Dual's fault.
>

I ran into the same problem on our news file system and tracked it down
to a generic System V bug in the ialloc() module.  The problem occurs
because ialloc() scans from the last inode allocated to the end of the
inode table looking for free inodes.  If none are found (e.g. if the
last allocated inode was near the end of the table and all subsequent
ones are in use), then "out of inodes" is reported.  It DOES NOT go
back to search for free inodes from the beginning.  Therefore, this
error can occur even when many free inodes are available. 

The fix involves checking whether the search began at inode 0 when no free
inodes were found.  If it didn't, then re-start the search at 0. If it
did, THEN print "out of inodes" and exit. 
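A simplified model of the search Jim describes, assuming the i-list is represented as an array of mode words (0 = free). find_free_nowrap() scans from the rover to the end only; find_free_wrap() adds the suggested retry from the beginning. The model starts at i-number 1 rather than 0, since i-number 0 is not a valid inode in this file system. The function names are hypothetical and this is not the actual ialloc() source.

```c
#include <stddef.h>

/* modes[i] is the mode word of inode i (0 = free); valid i-numbers are
   1 .. n-1.  Both functions return a free i-number, or 0 for
   "out of inodes". */

/* As described: scan from the rover to the end of the i-list only. */
size_t find_free_nowrap(const unsigned short *modes, size_t n, size_t start)
{
    for (size_t ino = start; ino < n; ino++)
        if (modes[ino] == 0)
            return ino;
    return 0;                          /* may be reported wrongly */
}

/* The suggested fix: if the scan did not begin at the start of the
   i-list, retry once from the beginning before giving up. */
size_t find_free_wrap(const unsigned short *modes, size_t n, size_t start)
{
    size_t ino = find_free_nowrap(modes, n, start);
    if (ino == 0 && start > 1)
        ino = find_free_nowrap(modes, n, 1);
    return ino;
}
```

With inodes 1 and 2 free and the rover at 4, the first version reports "out of inodes" while the second returns 1.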

This problem rarely shows up on "normal" file systems, but the high level
of activity in net file systems seems to aggravate it. 

---
Jim Hanko		...{mot|ism780|oliveb}!edge!hanko
Edge Computer,
Scottsdale AZ
-- 

emigh@ncsugn.ncsu.edu (Ted H. Emigh) (11/15/87)

In article <161@datapg.DataPg.MN.ORG> sewilco@datapg.DataPg.MN.ORG (Scot E. Wilcoxon) writes:
>Some people have noticed a connection between the inode problem and
>rnews running around the time when cronlog is cleared.
This is not the situation at ncsugn (3B2/400 with SVR3.0).  Here,
cronlog is never cleared automatically -- in fact, until last week,
it had not been cleared since March.  In that time, we have had two
losses of inodes.
-- 
Ted H. Emigh, Dept. Genetics and Statistics, NCSU, Raleigh, NC
uucp:	mcnc!ncsuvx!ncsugn!emigh	internet:  emigh%ncsugn.ncsu.edu
BITNET: NEMIGH@TUCC                  @ncsuvx.ncsu.edu:emigh@ncsugn.ncsu.edu

allbery@ncoast.UUCP (Brandon Allbery) (11/16/87)

As quoted from <359@minya.UUCP> by jc@minya.UUCP (John Chambers):
+---------------
| Gee, when I look in /usr/include/sys/inode.h on this 5.2 system, I see:
| 
| 	struct	inode
| 	{
| 		struct inode *i_forw;	/* hash chain forw */
| 		struct inode *i_back;	/* hash chain back */
| 		char	i_flag;
| 
| It sure looks like someone is doing linked lists from a hash table.  This
| would explain how things could get lost, and why fsck would find them.  It
| would also explain why fsck makes comments about inode lists.
+---------------

Sorry.  This is the in-memory copy of an inode; the linked list in question
is a linked list of inodes currently in memory.  Why would they be in memory?
For speed.  Why do they need speed?  Because they represent:

	* root directories of filesystems
	* "chroot" root directories
	* current directories
	* mount points
	* open files
	* saved-text and/or demand-paged executables currently in use

all of which are referenced constantly.
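A sketch of what those i_forw/i_back chains are for, assuming a fixed number of hash buckets (NHINO) and a made-up hash on (device, i-number); both are assumptions, and only the fields from the quoted header that matter here are kept. Each bucket is a circular doubly linked list of in-core inodes, used to find the memory copy of an inode that is already loaded. Nothing in these chains records which inodes are free on disk.

```c
#include <stddef.h>

#define NHINO 8                        /* number of hash buckets (assumed) */

struct inode {
    struct inode  *i_forw;             /* hash chain forw */
    struct inode  *i_back;             /* hash chain back */
    int            i_count;            /* reference count */
    int            i_dev;              /* device where inode resides */
    unsigned short i_number;           /* i-number on that device */
};

/* Each bucket head is a dummy node of a circular doubly linked list. */
static struct inode hinode[NHINO];

static struct inode *bucket(int dev, unsigned short ino)
{
    return &hinode[((unsigned)dev + ino) % NHINO];   /* assumed hash */
}

void ihash_init(void)
{
    for (int i = 0; i < NHINO; i++)
        hinode[i].i_forw = hinode[i].i_back = &hinode[i];
}

/* Link an in-core inode onto the chain for its (dev, i-number). */
void ihash_insert(struct inode *ip)
{
    struct inode *h = bucket(ip->i_dev, ip->i_number);
    ip->i_forw = h->i_forw;
    ip->i_back = h;
    h->i_forw->i_back = ip;
    h->i_forw = ip;
}

/* Unlink it, e.g. when the in-core slot is reused for another inode. */
void ihash_remove(struct inode *ip)
{
    ip->i_back->i_forw = ip->i_forw;
    ip->i_forw->i_back = ip->i_back;
    ip->i_forw = ip->i_back = ip;
}

/* Find the in-core copy of (dev, ino), or NULL if it is not in memory. */
struct inode *ihash_find(int dev, unsigned short ino)
{
    struct inode *h = bucket(dev, ino);
    for (struct inode *ip = h->i_forw; ip != h; ip = ip->i_forw)
        if (ip->i_dev == dev && ip->i_number == ino)
            return ip;
    return NULL;
}
```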
-- 
Brandon S. Allbery		      necntc!ncoast!allbery@harvard.harvard.edu
{hoptoad,harvard!necntc,{sun,cbosgd}!mandrill!hal,uunet!hnsurg3}!ncoast!allbery
			Moderator of comp.sources.misc

richard@islenet.UUCP (Richard Foulk) (11/18/87)

In article <986@edge.UUCP> hanko@edge.UUCP (Jim Hanko) writes:
> [...]
> The fix involves checking whether the search began at inode 0 when no free
> inodes were found.  If it didn't, then re-start the search at 0. If it
> did, THEN print "out of inodes" and exit. 
> 
> This problem rarely shows up on "normal" file systems, but the high level
> of activity in net file systems seems to aggravate it. 

Great!  Looks like we may be closing in on a solution here.  Maybe.

So the question is: do you have diffs for the fix?

Since quite a number of people with a few different versions of
System V have reported encountering this problem, it seems that
the ialloc routine you mentioned probably hasn't changed across
versions of unix.  Does that seem like a reasonable assumption?

Since only rnews seems to provoke this bug, is there some sort of
way to avoid the bug that comes to mind?

I've just finished running fsck on my news file system for about
the 8th or 10th time today -- all for one day's batch of news.

Any insights into a solution or work-around are appreciated.

Thanks.


-- 
Richard Foulk		...{dual,vortex,ihnp4}!islenet!richard
Honolulu, Hawaii

henry@utzoo.UUCP (Henry Spencer) (11/18/87)

> ...the difference between Version 7 and later versions ( > System III )
> is the free inode count is maintained in the superblock...
> ... Now, the kernel `knows' how
> many free inodes are out there without even looking.

Given that it doesn't know *where* they are, how is this useful?  The only
situation in which knowing the count is significant is when a filesystem
is out of inodes, or so close to it that the search for more can be cut
short by the count.  Now the $64 question:  how frequent is this?  Not very,
in my experience.  In fact, I'm not sure I've ever seen it.  I question the
value of an "optimization" for such a rare case that introduces bugs in more
common situations, which is obviously the case here.
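A minimal model of the point above, assuming an i-list represented as an array of mode words and a separate counter standing in for the superblock's free count; all names are hypothetical. alloc_trusting_count() gives up as soon as the count says zero, while alloc_ignoring_count() just searches. If the count drifts to zero while free inodes remain, only the first reports "out of inodes", and correcting the count (as fsck offers to do) makes it work again.

```c
#include <stddef.h>

struct fs_count_model {
    const unsigned short *modes;   /* 0 = free; i-numbers 1 .. n-1 */
    size_t n;
    long tinode;                   /* the kernel's idea of how many are free */
};

/* Ignores the count (kept only for human contemplation) and searches. */
size_t alloc_ignoring_count(const struct fs_count_model *fs)
{
    for (size_t ino = 1; ino < fs->n; ino++)
        if (fs->modes[ino] == 0)
            return ino;
    return 0;                      /* really out of inodes */
}

/* Trusts the count: if it says zero, give up without looking. */
size_t alloc_trusting_count(const struct fs_count_model *fs)
{
    if (fs->tinode <= 0)
        return 0;                  /* "out of inodes", possibly wrongly */
    return alloc_ignoring_count(fs);
}
```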

P.S. My kernel maintains the inode count, but only for human contemplation;
	the kernel itself pays no attention to it.

P.P.S. Whatever the bug is, it must be something that AT&T added since V7,
	since my system never loses inodes.
-- 
Those who do not understand Unix are |  Henry Spencer @ U of Toronto Zoology
condemned to reinvent it, poorly.    | {allegra,ihnp4,decvax,utai}!utzoo!henry

slb@boole.acc.virginia.edu (sandy) (11/19/87)

In article <986@edge.UUCP> hanko@edge.UUCP (Jim Hanko) writes:
>In article <3626@islenet.UUCP> richard@islenet.UUCP (Richard Foulk) writes:
  ( description of problem of appearing to run out of inodes when there 
   are actually many free inodes. )
>I ran into the same probem on our news file system and tracked it down
>to a generic System V bug in the ialloc() module.  The problem occurs
>because ialloc() scans from the last inode allocated to the end of the
>inode table looking for free inodes.  If none are found (e.g. if the
>last allocated inode was near the end of the table and all subsequent
>ones are in use), then "out of inodes" is reported.  It DOES NOT go
>back to search for free inodes from the beginning.  Therefore, this
>error can occur even when many free inodes are available. 
>
>The fix involves checking whether the search began at inode 0 when no free
>inodes were found.  If it didn't, then re-start the search at 0. If it
>did, THEN print "out of inodes" and exit. 
>
>This problem rarely shows up on "normal" file systems, but the high level
>of activity in net file systems seems to aggravate it. 

This sounds good - we have this problem frequently, so I checked out
the source that we have (SVR2 and SVR3).  It's true that whenever you
have to replenish the free inode cache, you search through the inode
list starting wherever you left off last time.  And it's also true that
you only search to the bottom of the list - you don't wrap around.  But
there is also code there to reset the starting point to zero whenever
you managed to find a few free inodes, but not enough to totally fill
the cache.  So, it seems to me that the problem could only arise when
there were exactly enough inodes left between the starting point and the
end to fill the cache (any less and you'd reset the starting point, any
more and there'd be one to find on the next search) the last time around.
But it doesn't seem as though this could arise often enough to account
for how often I see it.  What am I missing?  (if it's obvious, please
be kind ...)
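
A toy model may help with the condition sandy describes. The sketch below assumes only the behaviour reported in this thread:

- a NICINOD (100) entry cache;
- a refill scan that runs from the saved starting point to the end of the table without wrapping;
- a reset to zero after a partial fill;
- no change to the starting point when the scan finds nothing (per Jim Hanko and Marc Mengel).

The names struct fs, refill() and NINODE are hypothetical, and this is not the SVR2/SVR3 source.

```c
#include <assert.h>
#include <stdbool.h>

#define NICINOD 100   /* cache size given in this thread */
#define NINODE  1000  /* hypothetical inode table size for the model */

struct fs {
    bool in_use[NINODE];  /* true if inode i is allocated */
    int  cache[NICINOD];  /* free inodes waiting to be handed out */
    int  ncache;          /* entries currently in the cache */
    int  scan_start;      /* where the next refill scan begins */
};

/* Refill the cache by scanning from scan_start to the END of the table
 * only (no wraparound).  Returns the number of free inodes found. */
static int refill(struct fs *f)
{
    assert(f->scan_start >= 0 && f->scan_start < NINODE);
    int found = 0, last = f->scan_start;
    for (int ino = f->scan_start; ino < NINODE && found < NICINOD; ino++)
        if (!f->in_use[ino]) {
            f->cache[found++] = ino;
            last = ino;
        }
    if (found == NICINOD)
        f->scan_start = last;   /* full: next scan resumes here */
    else if (found > 0)
        f->scan_start = 0;      /* partial fill: restart at 0 next time */
    /* found == 0: scan_start is left alone, which is the trap */
    f->ncache = found;
    return found;
}
```

For example, start with inodes 10-19 and 900-999 free and the start point at 500. The first refill() takes exactly the 100 tail inodes and parks the start point on 999. Once those are handed out, every later refill() finds nothing, even though inodes 10-19 are still free. If only 50 free inodes sit in the tail, the partial fill resets the start point to 0 instead.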

And another thing - how come an fsck fixes it?  I can see how it
resets the free inode count so you no longer think you're out of
inodes, but it doesn't seem to reset the starting point for the
search (at least a brief search through fsck.c turns up no obvious
references to that field).  Wouldn't you just have the same problem
the next time you alloc'ed an inode?  Does mount reset this?

-- 
sandy bryant
slb@virginia.edu
uunet!virginia!slb

slb@boole.acc.virginia.edu (sandy) (11/19/87)

In article <325@boole.acc.virginia.edu> slb@boole.acc.virginia.edu writes:
>And another thing - how come an fsck fixes it?  I can see how it
>resets the free inode count so you no longer think you're out of
>inodes, but it doesn't seem to reset the starting point for the
>search (at least a brief search through fsck.c turns up no obvious
>references to that field).  Wouldn't you just have the same problem
>the next time you alloc'ed an inode?  Does mount reset this?

I just looked and mount does reset it.  That means that if you can't
fix the source, you might be able to ward off the evil spirits by 
just unmounting and remounting the file system (i.e. you don't have
to reboot or even fsck).
-- 
sandy bryant
slb@virginia.edu
uunet!virginia!slb

mmengel@cuuxb.ATT.COM (Marc W. Mengel) (11/20/87)

-

In article <325@boole.acc.virginia.edu> slb@boole.acc.virginia.edu writes:
$In article <986@edge.UUCP> hanko@edge.UUCP (Jim Hanko) writes:
$>In article <3626@islenet.UUCP> richard@islenet.UUCP (Richard Foulk) writes:
...
$>I ran into the same problem on our news file system and tracked it down
$>to a generic System V bug in the ialloc() module.  The problem occurs
$>because ialloc() scans from the last inode allocated to the end of the
$>inode table looking for free inodes.  If none are found (e.g. if the
$>last allocated inode was near the end of the table and all subsequent
$>then "out of inodes" gets printed
...
$This sounds good - we have this problem frequently, so I checked out
$the source that we have (SVR2 and SVR3).  It's true that whenever you
$have to replentish the free inode cache, you search through the inode
$list starting wherever you left off last time.  And it's also true that
$you only search to the bottom of the list - you don't wrap around.  But
$there is also code there to reset the starting point to zero whenever
$you managed to find a few free inodes, but not enough to totally fill
$the cache.  So, it seems to me that the problem could only arise when
$there were exactly enough inodes left between the starting point and the
$end to fill the cache (any less and you'd reset the starting point, any
$more and there'd be one to find on the next search) the last time around.
$But it doesn't seem as though this could arise often enough to account
$for how often I see it.  What am I missing?  (if it's obvious, please
$be kind ...)
$
$And another thing - how come an fsck fixes it?
...

fsck does *NOT* fix it; it refills your free inode cache but does not
affect that pointer.  This means that your search pointer still points to
the end of the inode table, and every time your inode cache runs out,
you get the "out of space" error.  Fsck refills the inode cache, but
the search pointer never moves from the end of the inode table.
So you see, once you hit the condition of exactly cache-size free inodes
lying after the search pointer, you falsely run out of inodes.
$-- 
$sandy bryant
$slb@virginia.edu
$uunet!virginia!slb

-- 
 Marc Mengel	

 attmail!mmengel
 ...!{moss|lll-crg|mtune|ihnp4}!cuuxb!mmengel

hanko@edge.UUCP (Jim Hanko) (11/24/87)

I must apologize for not responding sooner, but I was put out of action for
a week due to an auto accident. Anyway (some history):

In article <3626@islenet.UUCP> richard@islenet.UUCP (Richard Foulk) writes:
>  ( description of problem of appearing to run out of inodes when there 
>   are actually many free inodes. )

In article <986@edge.UUCP> hanko@edge.UUCP (Jim Hanko) [that's me] writes:
> (description of fix)

In article <325@boole.acc.virginia.edu> slb@boole.acc.virginia.edu writes:
> ....  So, it seems to me that the problem could only arise when
>there were exactly enough inodes left between the starting point and the
>end to fill the cache (any less and you'd reset the starting point, any
>more and there'd be one to find on the next search) the last time around.
>But it doesn't seem as though this could arise often enough to account
>for how often I see it.  What am I missing?  (if it's obvious, please
>be kind ...)

It has been almost a year since I fixed this bug, so I had forgotten some
of the details of the problem.  It is true that this will only occur when
exactly NICINOD free inodes were left from the starting point of the 
last successful search.  This seems to occur more often than you might expect
on a file system that is very active (e.g. net news). I believe it happens
when you start using inodes near the end of the inode table.  For example,
if a file is deleted and its inode is approximately NICINOD from the end,
and if this is the last inode re-allocated from the cache, the next group
of inodes found will be near the end.  If many files are then created and 
deleted in rapid fire, chances are good that the situation will show up.

In article <3646@islenet.UUCP> richard@islenet.UUCP (Richard Foulk) writes:
>So the question is: do you have diffs for the fix?

I am a little uncomfortable about posting diffs, so I will try to do so without
giving too much detail.

=== in module ialloc() ===
<	fp->s_ninode = NICINOD;
<	ino = ...
<	for(adr = ... {
<		.
<		.	<-- major loop which searches for free inodes
<		.
<	}
---
>
>again:	/* come back here if necessary to re-search from beginning */
>
>	fp->s_ninode = NICINOD;
>	ino = ...
>	for(adr = ... {
>		.
>		.	<-- major loop which searches for free inodes
>		.
>	}
>	/*
>	 *	If we didn't find any and we didn't start at the beginning,
>	 *	look again starting at the beginning
>	 */
>	if (fp->s_ninode == NICINOD && fp->s_inode[0] != 0) {
>		fp->s_inode[0] = 0;
>		goto again;
>	}
---
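
For readers without source, here is a self-contained toy version of the same fix. It uses the hypothetical model names struct fs, refill_fixed() and ialloc_model(). These are not kernel identifiers, and the data layout is an assumption. As in the diff, when a scan that did not start at inode 0 finds nothing, the start point is reset to 0 and the scan is repeated once. A scan that already started at 0 and finds nothing is a genuine "out of inodes".

```c
#include <assert.h>
#include <stdbool.h>

#define NICINOD 100   /* cache size given in this thread */
#define NINODE  1000  /* hypothetical inode table size for the model */

struct fs {
    bool in_use[NINODE];  /* true if inode i is allocated */
    int  cache[NICINOD];  /* free inodes waiting to be handed out */
    int  ncache;          /* entries currently in the cache */
    int  scan_start;      /* where the next refill scan begins */
};

static int refill_fixed(struct fs *f)
{
    int found, last;
again:  /* come back here if necessary to re-search from beginning */
    assert(f->scan_start >= 0 && f->scan_start < NINODE);
    found = 0;
    last = f->scan_start;
    for (int ino = f->scan_start; ino < NINODE && found < NICINOD; ino++)
        if (!f->in_use[ino]) {
            f->cache[found++] = ino;
            last = ino;
        }
    /* Found none and did not start at the beginning: look again from 0. */
    if (found == 0 && f->scan_start != 0) {
        f->scan_start = 0;
        goto again;
    }
    if (found == NICINOD)
        f->scan_start = last;   /* full: resume here next time */
    else if (found > 0)
        f->scan_start = 0;      /* partial: restart at 0 next time */
    f->ncache = found;
    return found;
}

/* Hand out a cached inode, refilling only when the cache is empty. */
static int ialloc_model(struct fs *f)
{
    if (f->ncache == 0 && refill_fixed(f) == 0)
        return -1;              /* genuinely out of inodes */
    int ino = f->cache[--f->ncache];
    f->in_use[ino] = true;
    return ino;
}
```

Suppose inodes 10-19 and 900-999 are free and the start point is at 500. The first 100 allocations come from the tail. The 101st triggers the retry from 0 and returns inode 19. Only after 10-19 are used up does ialloc_model() return -1. The retry cannot loop forever, because the second pass always starts at 0.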

I don't generally like using 'goto's, but it seemed the least intrusive
way to fix the problem (please, no flames).

In article <3646@islenet.UUCP> richard@islenet.UUCP (Richard Foulk) writes:
>Since only rnews seems to provoke this bug is there some sort of
>way to avoid the bug that comes to mind?

The only thing I can suggest is to have so many free inodes that you rarely
go near the end of the table.  I'm not sure how much that will help, though.

---
Jim Hanko		...{mot|ism780|oliveb}!edge!hanko
Edge Computer,
Scottsdale AZ
-- 

hanko@edge.UUCP (Jim Hanko) (11/24/87)

In article <993@edge.UUCP> hanko@edge.UUCP (Jim Hanko) writes:
>In article <325@boole.acc.virginia.edu> slb@boole.acc.virginia.edu writes:
>> ....  So, it seems to me that the problem could only arise when
>>there were exactly enough inodes left between the starting point and the
>>end to fill the cache (any less and you'd reset the starting point, any
>>more and there'd be one to find on the next search) the last time around.
>>But it doesn't seem as though this could arise often enough to account
>>for how often I see it.  What am I missing? ...
>
> ( vigorous hand waving in lieu of explanation )

After further reflection, I believe I can explain more clearly.

1) The relevant features of the ialloc() algorithm are:
	- A cache of NICINOD (100) free inodes is maintained.
	- When the cache becomes empty, a linear scan through the inode
	  table is performed to find free inodes.
	- When the cache becomes full (due to file deletions), a new 
	  (relatively random) scan point is established, based on the
	  last freed inode.
	- The bug occurs when the scan takes exactly the last NICINOD
	  free inodes in the table.
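
The four features above can be put together in a small simulation. This is a sketch based only on the thread's description. NICINOD = 100 comes from the list. NINODE, struct fs, refill(), ialloc_model() and ifree_model() are hypothetical. The freeing rule (a full cache makes the freed inode the new scan point) is taken from the third bullet rather than from kernel source.

```c
#include <assert.h>
#include <stdbool.h>

#define NICINOD 100   /* cache size from the list above */
#define NINODE  1000  /* hypothetical inode table size for the model */

struct fs {
    bool in_use[NINODE];  /* true if inode i is allocated */
    int  cache[NICINOD];  /* free inodes waiting to be handed out */
    int  ncache;          /* entries currently in the cache */
    int  scan_start;      /* where the next refill scan begins */
};

/* Linear scan from scan_start to the end of the table (no wraparound). */
static int refill(struct fs *f)
{
    int found = 0, last = f->scan_start;
    for (int ino = f->scan_start; ino < NINODE && found < NICINOD; ino++)
        if (!f->in_use[ino]) {
            f->cache[found++] = ino;
            last = ino;
        }
    if (found == NICINOD)
        f->scan_start = last;   /* full: resume here next time */
    else if (found > 0)
        f->scan_start = 0;      /* partial: restart at 0 next time */
    f->ncache = found;          /* found == 0 leaves scan_start unchanged */
    return found;
}

/* Take an inode from the cache, scanning only when the cache is empty. */
static int ialloc_model(struct fs *f)
{
    if (f->ncache == 0 && refill(f) == 0)
        return -1;              /* "out of inodes" */
    int ino = f->cache[--f->ncache];
    f->in_use[ino] = true;
    return ino;
}

/* Free an inode: cache it if there is room, otherwise make it the
 * new scan point. */
static void ifree_model(struct fs *f, int ino)
{
    assert(ino >= 0 && ino < NINODE && f->in_use[ino]);
    f->in_use[ino] = false;
    if (f->ncache < NICINOD)
        f->cache[f->ncache++] = ino;
    else
        f->scan_start = ino;
}
```

Here is one sequence that triggers the bug. Start from a full table and free inodes 10-109, which fills the cache. Then free 200-209, and then 999 down to 900. The last free leaves the scan point at 900, so after 200 allocations the scan has taken exactly the last NICINOD free inodes. The next ialloc_model() then returns -1 while inodes 200-209 are still free.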

2) On a 'normal' file system, file creations and deletions are not well
   correlated, and occur with approximately equal frequency.  Therefore,
   the cache usually contains sufficient free inodes, and the scan is
   rarely necessary.  When it is, one short scan from a random point
   usually takes care of it.  As a result, the bug almost never
   shows up in this type of file system. 

3) A 'net news' file system experiences repeated episodes where many
   new files are created at once (new articles arrive), intermixed with
   episodes where many files are deleted at once (articles expire,
   packed news files deleted).  If enough new files are created at once,
   relative to the number of free inodes, the likelihood becomes high
   that a scan from any random point in the table will reach the end. 
   Normally, it will then resume at the beginning.  However, on average,
   1% of the time (i.e. 1 in NICINOD scans) the bug will strike instead. 
   If it doesn't, the periods of file deletion will establish a new
   (somewhat) random starting point, creating a new opportunity for the
   bug to appear during the next file creation binge. 

Therefore, if you don't have a source license and can't install the fix I
posted in an earlier article, I can only suggest that you keep a large 
number of free inodes in your 'net news' file system.  This will reduce, but
not eliminate, the probability of the bug affecting you.

---
Jim Hanko		...{mot|ism780|oliveb}!edge!hanko
Edge Computer,
Scottsdale AZ
-- 

jc@minya.UUCP (11/26/87)

While we're working on the out-of-inodes problem, perhaps it should be 
pointed out that there is also a similar out-of-blocks problem.  On this
and quite a lot of other SysV machines, I've been able to cause a file
system to run out of free blocks in such a way that running fsck then finds
a whole lot of free blocks (typically thousands).  I don't have any good evidence
of what causes it, other than that it appears to be similar to the inode
problem:  Running a test program that rapidly creates and unlinks small
files will often produce the problem, especially if you run N copies of
the program in parallel.  I'd work on it here, but I don't have the source
on this machine.  [Well, it's a good excuse to be lazy! :-]
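
For anyone who wants to try this, below is a rough sketch of the kind of test program John describes. Each process loops: it creates a small file, writes one byte so that a data block is allocated, and unlinks it. Several such processes run in parallel. It is not John's program. churn() and churn_parallel() are hypothetical names, and the code uses only standard POSIX calls (open with O_CREAT|O_EXCL, write, unlink, fork, wait).

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Create, write one byte to, and unlink `count` small files in `dir`.
 * Returns 0 on success, -1 on the first failure. */
static int churn(const char *dir, int count)
{
    char path[4096];
    assert(dir != NULL && count >= 0);
    for (int i = 0; i < count; i++) {
        snprintf(path, sizeof path, "%s/t%ld.%d", dir, (long)getpid(), i);
        int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0644);
        if (fd < 0) {
            perror(path);               /* e.g. ENOSPC */
            return -1;
        }
        if (write(fd, "x", 1) != 1) {   /* forces a data block allocation */
            perror("write");
            close(fd);
            unlink(path);
            return -1;
        }
        close(fd);
        if (unlink(path) != 0) {
            perror("unlink");
            return -1;
        }
    }
    return 0;
}

/* Run `nprocs` copies of churn() in parallel and return how many
 * failed (a failed fork also counts as a failure). */
static int churn_parallel(const char *dir, int nprocs, int count)
{
    int failures = 0, status;
    assert(nprocs > 0);
    for (int p = 0; p < nprocs; p++) {
        pid_t pid = fork();
        if (pid < 0) {
            failures++;                 /* stop forking, reap the rest */
            break;
        }
        if (pid == 0)
            _exit(churn(dir, count) == 0 ? 0 : 1);
    }
    while (wait(&status) > 0)
        if (!WIFEXITED(status) || WEXITSTATUS(status) != 0)
            failures++;
    return failures;
}
```

Point it at a scratch directory on the file system under test (for example, one created with mkdtemp()) and call churn_parallel(dir, N, count) from main(). A nonzero return means at least one process hit an error, such as ENOSPC, which can then be compared against what fsck reports.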

-- 
John Chambers <{adelie,ima,maynard,mit-eddie}!minya!{jc,root}> (617/484-6393)