[comp.unix.sysv386] The INfamous inode bug

pete@fidata.fi (Petri Helenius) (12/25/90)

  Is the famous inode-lost bug finally fixed in ISC 2.2.1 release? Or is it
still "a known bug"?

--
--------------------------------------------------------------
 Petri Helenius, Fimeko-Data Oy      Phone     +358-0-458 2421
----------------------------------!  Telefax   +358-0-458 2425
 Looking for Unix(r) FAX-system ? !  Internet  pete@fidata.fi
 Mail queries to unifax@fidata.fi !  FidoNet   2:504/23.0
--------------------------------------------------------------

pizzi@esacs.UUCP (Riccardo Pizzi) (01/03/91)

In article <PETE.90Dec25124943@fidata.fi> pete@fidata.fi writes:
>
>  Is the famous inode-lost bug finally fixed in ISC 2.2.1 release? Or is it
>still "a known bug"?

Can someone please tell me about this bug? I am running 2.2.0 and would like
to know if there is something I should be aware of. I have never experienced
bugs with the file system until now.

Rick
--
Riccardo Pizzi @ ESA Software, Rimini, ITALY
e-mail: pizzi%esacs@relay.EU.net -or- root@xtc.sublink.org
<< Object Oriented is an Opaque Disease >>

pete@fidata.fi (Petri Helenius) (01/04/91)

  The "in"famous inode bug causes the OS to "lose" inodes (i.e. your free
inode count decreases without inodes being allocated) under rather heavy
inode deallocation/allocation. It shows up especially when processing
USENET news. Sooner or later you'll be bitten by this problem.
The cure is to umount, fsck and mount the filesystem again.
--
--------------------------------------------------------------
 Petri Helenius, Fimeko-Data Oy      Phone     +358-0-458 2421
----------------------------------!  Telefax   +358-0-458 2425
 Looking for Unix(r) FAX-system ? !  Internet  pete@fidata.fi
 Mail queries to unifax@fidata.fi !  FidoNet   2:504/23.0
--------------------------------------------------------------

john@newave.UUCP (John A. Weeks III) (01/05/91)

In article <43@esacs.UUCP> pizzi@esacs.UUCP (Riccardo Pizzi) writes:
>In article <PETE.90Dec25124943@fidata.fi> pete@fidata.fi writes:
> > Is the famous inode-lost bug finally fixed in ISC 2.2.1 release? Or is it
> > still "a known bug"?

> Can someone please tell me about this bug? I am running 2.2.0 and would like
> to know if there is something I should be aware of. I have never experienced
> bugs with the file system until now.

Some versions of UNIX, even AT&T code, had a bug that caused the number
of free inodes to suddenly become zero even though there were many unused.
Running fsck would restore the correct free inode count.  Heavy USENET
activity often makes the problem show up in the partition that holds the
news spool directory.  By the time one figures out that there is a problem,
it is too late--bytes fall on the floor because UNIX thinks the file system
is full.  This problem exists in ISC version 2.0.2.

-john-

-- 
===============================================================================
John A. Weeks III               (612) 942-6969               john@newave.mn.org
NeWave Communications                 ...uunet!rosevax!tcnet!wd0gol!newave!john
===============================================================================
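
The symptom John describes (a free-inode count that reads zero while plenty
of inodes are unused, and an fsck that puts it right) can be illustrated with
a toy model. This is only a sketch in Python, not the System V code: the
class, its fields, and the way the counter is forced to zero are all
hypothetical, chosen to show why a recount from the inode table repairs a
stale summary counter.

```python
class ToyFilesystem:
    """Toy model: an inode table plus a separately stored free-inode counter."""

    def __init__(self, n_inodes):
        self.in_use = [False] * n_inodes  # ground truth: the inode table
        self.free_count = n_inodes        # summary counter (hypothetical field)

    def alloc(self):
        # The allocator trusts the counter, not the table.
        if self.free_count == 0:
            raise OSError("no free inodes")
        ino = self.in_use.index(False)
        self.in_use[ino] = True
        self.free_count -= 1
        return ino

    def free(self, ino):
        self.in_use[ino] = False
        self.free_count += 1

    def recount(self):
        # What an fsck-style pass does in this model: rebuild the counter
        # from the table itself.
        self.free_count = self.in_use.count(False)
        return self.free_count


if __name__ == "__main__":
    fs = ToyFilesystem(100)
    for _ in range(3):
        fs.alloc()
    fs.free_count = 0  # stand-in for the bug: counter wrongly drops to zero
    try:
        fs.alloc()
    except OSError as err:
        print("allocation refused:", err)
    print("free inodes after recount:", fs.recount())
```

In this model every allocation fails once the counter is wrong, even though
97 inodes are still free, which matches the "file system is full" behaviour
reported in the thread.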

richard@pegasus.com (Richard Foulk) (01/06/91)

>  The "in"famous inode bug causes the OS to "lose" inodes (i.e. your free
>inode count decreases without inodes being allocated) under rather heavy
>inode deallocation/allocation. It shows up especially when processing
>USENET news. Sooner or later you'll be bitten by this problem.
>The cure is to umount, fsck and mount the filesystem again.

As I recall it requires a certain sequence of events to make this bug
awaken -- Cnews doesn't seem to tickle it at all.



-- 
Richard Foulk		richard@pegasus.com

pizzi@esacs.UUCP (Riccardo Pizzi) (01/07/91)

>  The "in"famous inode bug causes the OS to "lose" inodes (i.e. your free
>inode count decreases without inodes being allocated) under rather heavy
>inode deallocation/allocation. It shows up especially when processing
>USENET news. Sooner or later you'll be bitten by this problem.
>The cure is to umount, fsck and mount the filesystem again.

Hmmm... I've been running ISC 2.2 for 3 months now (I ran 2.0.2 for about
one year without problems) and I'm spooling a *lot* of news every night
(about 1.5-2 Mb).
I didn't notice anything strange with my filesystem.
Are you sure it was not a problem with your hw?

Rick
--
Riccardo Pizzi @ ESA Software, Rimini, ITALY
e-mail: pizzi%esacs@relay.EU.NET -or- root@xtc.SUBLINK.ORG
<< Object Oriented is an Opaque Disease >>

jgd@Dixie.Com (John G. DeArmond) (01/08/91)

pizzi@esacs.UUCP (Riccardo Pizzi) writes:

>Hmmm... I've been running ISC 2.2 for 3 months now (I ran 2.0.2 for about
>one year without problems) and I'm spooling a *lot* of news every night
>(about 1.5-2 Mb).
>I didn't notice anything strange with my filesystem.
>Are you sure it was not a problem with your hw?

THAT's not a lot of news, THIS is a lot of news (Whips out his traffic
report) :-)  We averaged 10 mb/day throughout the month of December
up to the Christmas break.  We usually run from 4 to 6 mb/day.  The Inode
bug gets us about twice a week.  I have a cron job that runs every 10 minutes
that looks at the inode count and when it hits the emergency level (10,000
on our system) it fusers everybody off the news partition and runs fsck.
I get a printout of the activity on our logging printer so I can see
when it happens.  When the volume was up around 10 mb/day, it got us almost
every day.  There is a binary patch available for 2.2, though I've
not yet applied it.

John

-- 
John De Armond, WD4OQC        | "Purveyors of speed to the Trade"  (tm)
Rapid Deployment System, Inc. |  Home of the Nidgets (tm)
Marietta, Ga                  | "To be engaged in opposing wrong offers but 
{emory,uunet}!rsiatl!jgd      |  a slender guarantee of being right."
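
The watchdog John describes (poll the free-inode count every few minutes and,
at an emergency level, clear users off the news partition and run fsck) might
be sketched roughly as follows. This is an illustration in Python, not his
actual cron job. The mount point and device name are hypothetical, reading
the "emergency level" as a low-water mark on free inodes is an assumption,
and the repair commands are only listed, never executed.

```python
import os

# Hypothetical values: the thread gives only the 10,000 threshold.
NEWS_MOUNT = "/usr/spool/news"  # assumed mount point of the news partition
NEWS_DEVICE = "/dev/dsk/news"   # placeholder device name
EMERGENCY_LEVEL = 10000         # free-inode level that triggers a repair


def free_inodes(path):
    """Return the free-inode count reported for the filesystem holding path."""
    return os.statvfs(path).f_ffree


def needs_repair(free_count, level=EMERGENCY_LEVEL):
    """Assumption: the emergency level is hit when free inodes <= level."""
    return free_count <= level


def repair_commands(mount=NEWS_MOUNT, device=NEWS_DEVICE):
    """Commands an operator might run, in order: kill users, unmount, check, remount."""
    return [
        ["fuser", "-k", device],
        ["umount", device],
        ["fsck", device],
        ["mount", device, mount],
    ]


if __name__ == "__main__":
    count = free_inodes("/")  # "/" used here so the sketch runs anywhere
    print("free inodes:", count)
    if needs_repair(count):
        for cmd in repair_commands():
            print("would run:", " ".join(cmd))
```

Run from cron every 10 minutes, a script along these lines would log each
repair, much like the printout John gets on his logging printer.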

dorman@chiton.ucsd.edu (Leroy M. Dorman) (01/08/91)

In article <5685@rsiatl.Dixie.Com>, jgd@Dixie.Com (John G. DeArmond) writes:
> pizzi@esacs.UUCP (Riccardo Pizzi) writes:
> 
> >Hmmm... I've been running ISC 2.2 for 3 months now (I ran 2.0.2 for about
> >one year without problems) and I'm spooling a *lot* of news every night
> >(about 1.5-2 Mb).
> >I didn't notice anything strange with my filesystem.
> >Are you sure it was not a problem with your hw?
> 
> ...stuff deleted here....
> -- 
> John De Armond, WD4OQC        | "Purveyors of speed to the Trade"  (tm)

There are other things which can look like the "inode bug".
I was pretty sure I was having this problem with SysV3.0
on a Multibus I machine and anxiously awaited SYSV3.2.
The symptoms were that the filesystem would become
corrupted at times of heavy disk activity (like compiling
Unix from the source code) leading to loss of many files
and, occasionally, irreparable damage to the filesystem.
The problem was ultimately cured by removing the Maxtor
SCSI harddisk from the peripheral bay of the Intel 320
and putting it in a separate box.  The problems went away.
-- 
LeRoy M. Dorman		Scripps Institution of Oceanography, A-015	
University of California, San Diego	La Jolla, CA 92093-0215	
(619) 534-2406	 omnet:   mpl.sio  .OR. sio.obs fax:	(619) 534-6849
internet:	ldorman@ucsd.edu .OR. ucsd.edu!siolmd!dorman

scjones@thor.UUCP (Larry Jones) (01/08/91)

In article <43@esacs.UUCP>, pizzi@esacs.UUCP (Riccardo Pizzi) writes:
> [ asks about the infamous system V lost inode bug ]

Since the very early days of System V, the file system code has
contained a bug that will cause the file system to completely
lose track of some free inodes if exactly the right pattern of
allocating and freeing occurs.  The chances of hitting exactly
the right pattern are very small, so it doesn't occur very
often.  However, people running B News seem to hit it quite
often for some basically inexplicable reason.

Interactive 2.2 contains some code which was intended to fix
this bug, but it is not quite correct and the bug still occurs.
I have previously posted a patch which will fix ISC 2.2 (I
don't know if it is needed for 2.2.1 or not -- perhaps someone
who has it could check and let me know).  I also have some text
files which describe the problem in great detail.  I will be
happy to send either or both to anyone who wants them or, if
there is sufficient demand, post them.

(I also have fixes for ISC 2.0.2 and Microport System V/AT if
anyone wants them!)
----
Larry Jones                         UUCP: uunet!sdrc!thor!scjones
SDRC                                      scjones@thor.UUCP
2000 Eastman Dr.                    BIX:  ltl
Milford, OH  45150-2789             AT&T: (513) 576-2070
I'm not a vegetarian!  I'm a dessertarian. -- Calvin

fortin@zap.uucp (Denis Fortin) (01/08/91)

In article <652@chiton.ucsd.edu> dorman@chiton.ucsd.edu (Leroy M. Dorman) writes:
>In article <5685@rsiatl.Dixie.Com>, jgd@Dixie.Com (John G. DeArmond) writes:
>> pizzi@esacs.UUCP (Riccardo Pizzi) writes:
>> >Hmmm... I've been running ISC 2.2 for 3 months now [...] I didn't notice
>> >anything strange with my filesystem. Are you sure it was not a problem
>> >with your hw?
>> 
>> [inode bug bites us a couple of times a week]
>> John De Armond, WD4OQC        | "Purveyors of speed to the Trade"  (tm)
>
>There are other things which can look like the "inode bug". [...]

Nah...  It's *the bug*!  I used to be bitten by the "infamous inode
bug" on System V/AT about once a week, so after I upgraded to ISC
2.0.2 and I realized that that old "friend" was still in there, I
applied the binary patch and have been living happily ever since...

-- 
Denis Fortin, DMR Group Inc, (514) 877-3301 (All of these opinions are my own)
fortin@zap.uucp   uunet!sobeco!zap!fortin   fortin%zap@larry.mcrcim.mcgill.edu

pizzi@esacs.UUCP (Riccardo Pizzi) (01/08/91)

In article <5685@rsiatl.Dixie.Com> jgd@Dixie.Com (John G. DeArmond) writes:

>THAT's not a lot of news, THIS is a lot of news (Whips out his traffic
>report) :-)  We averaged 10 mb/day throughout the month of December
>up to the Christmas break.  We usually run from 4 to 6 mb/day.  The Inode

Well, I was referring to my home system. 2 Mb/day is enough for a home
system with no TCP connections (only the Traily), isn't it?
I am using Cnews and heard from someone on the net that this package
doesn't cause the bug to show up. Still, I think that even if I don't spool
10 Mb/day, spooling as little :-) as 2 Mb/day should cause the
inode count to decrease. Consider that I do not run fsck except after
a power failure (in fact, the startup procedure does it for me).
I never shut down the system, either, so little by little the count should
reach 0. But that does not happen. Any clues?

>I get a printout of the activity on our logging printer so I can see
>when it happens.  When the volume was up around 10 mb/day, it got us almost
>every day.  There is a binary patch available for 2.2, though I've
>not yet applied it.

What news package are you running? Maybe this makes a difference.
I would like to hear from ISC support about the bug status (I run 2.2).

Rick
--
Riccardo Pizzi @ ESA Software, Rimini, ITALY
e-mail: pizzi%esacs@relay.EU.NET -or- root@xtc.SUBLINK.ORG
<< Object Oriented is an Opaque Disease >>

bill@unixland.uucp (Bill Heiser) (01/09/91)

In article <5685@rsiatl.Dixie.Com> jgd@Dixie.Com (John G. DeArmond) writes:
>
>THAT's not a lot of news, THIS is a lot of news (Whips out his traffic
>report) :-)  We averaged 10 mb/day throughout the month of December
>up to the Christmas break.  We usually run from 4 to 6 mb/day.  The Inode

I seem to average from 8 to 12mb per day, and haven't seen this problem
(YET) on ESIX.  Does anyone know if this problem exists in ESIX, or if
it is specific to ISC?  I'm using the Esix FFS.


-- 
home:	...!{uunet,bloom-beacon,esegue}!world!unixland!bill
	bill@unixland.uucp    Public Access Unix - Esix SYSVR3
508-655-3848(12/24)  508-651-8723(12/24/96-HST)  508-651-8733(12/24/96-PEP-V32)
other:	heiser@world.std.com

bill@unixland.uucp (Bill Heiser) (01/09/91)

In article <1991Jan08.140455.27471@nstar.rn.com> larry@nstar.rn.com (Larry Snyder) writes:
>
>we bring in a backbone newsfeed of 1180 newsgroups - including
>regional newsgroups (ne, ca, sa, ucla, mi, fl, ont, can, eunet, etc)
>and traffic here was running around 10 megs per day before the
>holidays (yesterday's traffic was 14 megs).  

This brings up an interesting news-feed phenomenon -- I only get around
750 newsgroups -- and my traffic is usually (don't have the figures in
front of me) somewhere around 8mb.  10-12mb is not at all uncommon.  
Right after the holiday break, one day was 15.5mb.

It's interesting to see so much of a variation in what people are reporting
for traffic.


-- 
home:	...!{uunet,bloom-beacon,esegue}!world!unixland!bill
	bill@unixland.uucp    Public Access Unix - Esix SYSVR3
508-655-3848(12/24)  508-651-8723(12/24/96-HST)  508-651-8733(12/24/96-PEP-V32)
other:	heiser@world.std.com

larry@nstar.rn.com (Larry Snyder) (01/09/91)

bill@unixland.uucp (Bill Heiser) writes:

>750 newsgroups -- and my traffic is usually (don't have the figures in
>front of me) somewhere around 8mb.  10-12mb is not at all uncommon.  
>Right after the holiday break, one day was 15.5mb.

>It's interesting to see so much of a variation in what people are reporting
>for traffic.

I guess the traffic depends on the quantity of newsgroups - as well
as how far one is down the line (which would make for a longer Path: line).

When we were getting news from iuvax->news.nd.edu we were only picking
up 10 or so different base groups - but since we've switched to the University
of Michigan - we now get around 30.  I guess some backbone sites don't
get all the newsgroups that are available.  We even get a regional newsgroup
for Stuttgart, West Germany.

Traffic yesterday was 13.5 megs -

-- 
   Larry Snyder, NSTAR Public Access Unix 219-289-0282 (HST/PEP/V.32/v.42bis)
                        regional UUCP mapping coordinator 
  {larry@nstar.rn.com, ..!uunet!nstar!larry, larry%nstar@iuvax.cs.indiana.edu}

chris@alderan.uucp (Christoph Splittgerber) (01/10/91)

In article <102@thor.UUCP> scjones@thor.UUCP (Larry Jones) writes:


|>I have previously posted a patch which will fix ISC 2.2 (I
|>don't know if it is needed for 2.2.1 or not -- perhaps someone
|>who has it could check and let me know).  I also have some text
|>files which describe the problem in great detail.  I will be
|>happy to send either or both to anyone who wants them or, if
|>there is sufficient demand, post them.

After all I've been reading in this newsgroup about this bug, I think
you should post it.
 
Chris
-- 
----------------------------------------------------------------------------
Replies-To:  chris@alderan.uucp        UUCP: uunet!mcsun!unido!alderan!chris 
Phone:       +49 711 344375            Fax:  +49 711 3460684

det@hawkmoon.MN.ORG (Derek E. Terveer) (01/10/91)

pizzi@esacs.UUCP (Riccardo Pizzi) writes:

>In article <5685@rsiatl.Dixie.Com> jgd@Dixie.Com (John G. DeArmond) writes:

>Well, I was referring to my home system. 2 Mb/day is enough for a home
>system with no TCP connections (only the Traily), isn't it?

No.  I have a relatively small home system, but I get up to 14MB/day (on heavy
days).  I think that the difference is that I have a Trailblazer, so I can
physically accept that high a rate (at 2400 baud, averaging ~220 cps, the most
you can get in one day (24 hours) is about 18MB).
-- 
Derek "Tigger" Terveer	det@hawkmoon.MN.ORG - MNFHA, NCS - UMN Women's Lax, MWD
I am the way and the truth and the light, I know all the answers; don't need
your advice.  -- "I am the way and the truth and the light" -- The Legendary Pink Dots
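
Derek's ceiling for a 2400 baud link can be checked with a line of
arithmetic. The short Python sketch below uses his ~220 cps average. The
function names are illustrative, and treating "MB" as 1024*1024 bytes is an
assumption made to match his figure.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def max_bytes_per_day(chars_per_second):
    """Bytes moved in 24 hours by a link running flat out at the given rate."""
    return chars_per_second * SECONDS_PER_DAY


def to_binary_megabytes(nbytes):
    """Convert bytes to megabytes of 1024*1024 bytes (assumed unit)."""
    return nbytes / (1024 * 1024)


if __name__ == "__main__":
    total = max_bytes_per_day(220)
    # prints: 19008000 18.1
    print(total, round(to_binary_megabytes(total), 1))
```

So a 2400 baud line tops out at roughly 18MB a day, and a 14MB feed would
keep it busy for most of the day.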

support@bomber.ism.isc.com (Support Account) (01/11/91)

In article <53@esacs.UUCP> pizzi@esacs.UUCP (Riccardo Pizzi) writes:
>What news package are you running? Maybe this makes a difference.
>I would like to hear from ISC support about the bug status (I run 2.2).
>

Code was added/fixed in V2.2 of Interactive Unix which resolved
all the instances of the inode problem Interactive could test
internally. Recently, another situation which generates the bug
was discovered, which was not tested for, and which is being
fixed. There is no date when this fix will be available, but
it is possible it will be in Interactive Unix V2.3.


...

steve@nuchat.sccsi.com (Steve Nuchia) (01/12/91)

In article <1991Jan10.161022.3360@ism.isc.com> support@bomber.ism.isc.com (Support Account) writes:
>Code was added/fixed in V2.2 of Interactive Unix which resolved
>all the instances of the inode problem Interactive could test
>internally. Recently, another situation which generates the bug
>was discovered, which was not tested for, and which is being fixed.

Wouldn't it have been easier to have proved the revised code the
first time?  Simple little resource managers like that can be proved
in a half hour or so, and then you don't embarrass yourself.

(No, I don't always prove my code the first time.  But if there is
a second time you can bet there won't be a third time.)

Testing for correctness.  Sheesh.
-- 
Steve Nuchia	      South Coast Computing Services      (713) 964-2462
	"Could we find tools that would teach their own use,
	 we should have discovered something truly beyond price."
		Socrates, in Plato's Republic

scjones@thor.UUCP (Larry Jones) (01/13/91)

In article <1991Jan12.025119.27665@nuchat.sccsi.com>, steve@nuchat.sccsi.com (Steve Nuchia) writes:
> In article <1991Jan10.161022.3360@ism.isc.com> support@bomber.ism.isc.com (Support Account) writes:
> >Code was added/fixed in V2.2 of Interactive Unix which resolved
> >all the instances of the inode problem Interactive could test
> >internally. Recently, another situation which generates the bug
> >was discovered, which was not tested for, and which is being fixed.
> 
> Wouldn't it have been easier to have proved the revised code the
> first time?  Simple little resource managers like that can be proved
> in a half hour or so, and then you don't embarrass yourself.

You have a valid point, but at least Interactive TRIED to fix the
problem.  AT&T has known about this bug for heaven-only-knows how
long and they've just finally gotten around to fixing it for R4.
And who knows if their fix is really right or not?!?
----
Larry Jones, SDRC, 2000 Eastman Dr., Milford, OH  45150-2789  513-576-2070
Domain: scjones@thor.UUCP  Path: uunet!sdrc!thor!scjones
My life needs a rewind/erase button. -- Calvin

woods@eci386.uucp (Greg A. Woods) (01/16/91)

In article <114@thor.UUCP> scjones@thor.UUCP (Larry Jones) writes:
> You have a valid point, but at least Interactive TRIED to fix the
> problem.  AT&T has known about this bug for heaven-only-knows how
> long and they've just finally gotten around to fixing it for R4.
> And who knows if their fix is really right or not?!?

Then again, those of us who do the old resource utilization
calculations and predictions never exercise the bug in the first
place.  Even with a 100 Mb news spool I've never run out of inodes,
and thus I don't know if any system I use has ever had the bug!

Careful tuning of the news software will also usually prevent you from
even running out of blocks.  Further checks can prevent you from ever
running out of inodes too.
-- 
							Greg A. Woods
woods@{eci386,gate,robohack,ontmoh,tmsoft}.UUCP		ECI and UniForum Canada
+1-416-443-1734 [h]  +1-416-595-5425 [w]  VE3TCP	Toronto, Ontario CANADA
Political speech and writing are largely the defense of the indefensible-ORWELL

jon@hitachi.uucp (Jon Ryshpan) (01/17/91)

[ISC Support Account] writes:

>Code was added/fixed in V2.2 of Interactive Unix which resolved
>all the instances of the inode problem Interactive could test
>internally.  Recently, another situation which generates the bug
>was discovered, which was not tested for, and which is being
>fixed.  There is no date when this fix will be available, but
>it is possible it will be in Interactive Unix V2.3.

Maybe you could post a fix for the reckless to try out.  Then when you
release the product, you can be fairly sure that it works.  I think
this is called beta testing.

Jonathan Ryshpan		<...!uunet!hitachi!jon>

scjones@thor.UUCP (Larry Jones) (01/18/91)

In article <1991Jan15.201410.18885@eci386.uucp>, woods@eci386.uucp (Greg A. Woods) writes:
> Then again, those of us who do the old resource utilization
> calculations and predictions never exercise the bug in the first
> place.  Even with a 100 Mb news spool I've never run out of inodes,
> and thus I don't know if any system I use has ever had the bug!

You've misunderstood -- no amount of calculation and checking can
protect you from the inode bug.  The problem is that, under certain
conditions, the system CLAIMS that there are no free inodes when,
in fact, there can be arbitrarily many.  When the bug hits, you can
instantly go from having hundreds (or even thousands) of free inodes
to zero.  The necessary conditions are a certain pattern of
allocations and frees -- it is sufficiently obscure that I'm a bit
surprised that anyone ever hits it, but they do.  And some hit it
quite often.

I've never had the problem myself, but I believe in preventive
measures.  When Bill Wells posted the patch for Microport
System V/386, I looked at the code and the patch and decided
that it was a reasonable fix and applied it.  When I switched
to ISC 2.0.2 and discovered it had the same problem, I again
applied Bill's patch.  When I got the update to 2.2 and found
out that ISC's fix didn't quite fix the problem, I again
looked at the code, developed a modified patch, and applied
it.

So, for those who have expressed reservations about applying binary
patches obtained from the network (a healthy dose of paranoia is
good for any system manager) -- I can personally vouch for the
Microport 386 and ISC 2.0.2 and 2.2 patches.  Of course, you
don't know me, but you do have my name, address, phone number,
and net address! :-)

Of course, it doesn't matter to me whether you apply the patches
or not.  That's a decision that you will have to make for yourself
based on your perception of the severity of the problem, the
likelihood of it occurring, the consequences of its occurring,
the likelihood of there being some problem with the patch, and
the potential consequences of a problem with a patch.  For what
it's worth, my evaluation of my situation is that the severity
of the problem is fairly low -- the chance of it happening is
very small and the consequences are not great since my system
is basically single user and my news comes via NFS from another
machine; I don't have a direct feed.  Running fsck would take
a while, but would fix the file system.  On the other hand, the
patch is fairly small and the affected routine (s5ialloc) is
also fairly small.  I was able to understand the affected code
and the patch fairly easily and convince myself that it was a
reasonable solution to the problem, so I decided to install it.
----
Larry Jones, SDRC, 2000 Eastman Dr., Milford, OH  45150-2789  513-576-2070
Domain: scjones@thor.UUCP  Path: uunet!sdrc!thor!scjones
I just can't identify with that kind of work ethic. -- Calvin

randyt@asdnet (Randy Terbush) (01/20/91)

In <128@thor.UUCP> scjones@thor.UUCP (Larry Jones) writes:

>So, for those who have expressed reservations about applying binary
>patches obtained from the network (a healthy dose of paranoia is
>good for any system manager)

I can vouch for the safety of this patch.  I first applied it after
Larry's original posting several months ago.  I have had no
problems since applying it.

Of course, it is always possible that Larry has set up the patch to
remove all filesystems on the 1 year anniversary of his posting. :-)

-- 
Randy Terbush ------------------ Mammoth Lakes, CA - Voice +1 619 934 0340
asdnet!randyt ------------------------------------------------------------
Advanced System Design --------------- UNIX Workstation Design and Support