[comp.unix.wizards] Alzheimer's Syndrome

mayer@cooper.cooper.EDU (Mayer Ilovitz ) (09/30/88)

	Because of all the postings which have appeared over the last few weeks
in various newsgroups concerning mysteriously disappearing inodes
which reappear after performing an fsck on the afflicted file system,
I am reposting this article which will (hopefully) answer many of the
questions. ( this was originally posted the week of Thanksgiving Day 1987.)

Since the original posting, I have heard a rumor that the inode allocation/
free bug which is responsible for these problems will be fixed in System V 4.0.

						Mayer Ilovitz
						mayer@cooper.cooper.edu
						cmcl2!cooper!mayer	

-----------------------snip, snip, snip-------------------------------------

	This document contains what I believe is a complete analysis of
the System V inode allocation system and the problem that everyone is having.
I have included a test procedure which should detect the problem on a UNIX
system and included a program that will help you perform the tests. Also I 
have some suggestions on properly fixing the bug.

	To begin with, let me describe what I have available to me.
We have a number of pretty standard Unix-PCs running System V 3.5 and
System V 3.0 . We have four 3b2-400's running System V 3.0 Version 2. These
machines each have a floppy diskette system. We also have an OLD 3b5 running
System V Release 2 Version 2. I have access to the source for the Unix on the
3b5 and the 3b2 systems. Our 3b5 runs our newsfeed using the Bnews 2.11 package.
This system has suffered the inode problems that everyone has been mentioning
on the net for the last few weeks. Since this system has no expendable file
systems, I ran the tests on the 3b2 and the Unix-PC . Both of these systems
showed the same error. From this, I suspect that all versions of ATT System V
unix have the problem. Furthermore, this problem may very well be in any 
ATT System-V compatible version of Unix and may well have been present in
System-III Unix. I therefore suggest, just to be on the safe side, that you
run the test described below.

	The analysis and test was based on the source from the 3b5. A cursory
examination of the source to the 3b2 showed the code to be essentially the
same in the critical area though there are what appear to be minor cosmetic
changes. For those of you with access to the souces, The file that needs
changing is called alloc.c or s5alloc.c . If you don't have a file by this
name, look for a file that closely matches one of these names. The function
that is causing the problem is called s5ialloc() or ialloc() .

	Ialloc() and ifree() are the low level inode allocation control system.
When an inode is needed, a call to ialloc() is made. When a file/directory is
deleted, ifree() is used  to release the inode.  These 2 functions use certain
parameters that are kept in the superblock of every file system. tinode is the
total number of free inodes in a file system. To speed up inode allocation and
freeing, the superblock maintains a table of free inodes. This table is called
inode[]. The size of this table is given by the #defined value NICINOD and is
usually 100. ninode specifies the number of free inodes available in inode[].

	When ifree() releases an inode, it first checks to see
if the inode table is full. If it isn't, the inode is added to the top of the
table and ninode is adjusted. If the table is full and the inode being released
is less than the inode stored in inode[0], the newly released inode is put into
inode[0]. In this way, the allocation system knows where in the i-list a group
of free inodes are likely to be.

	When ialloc() is called, it tries to give the requesting process an
inode from inode[]. If none are available, ialloc() searches the i-list for
more free inodes to reload inode[]. ialloc() will start this search begining
at the location of the last allocated inode as indicated by inode[] and 
ninode.  The search continues untill NICINOD inodes are located or the end of
the i-list is reached. inode[] will be reloaded from the top of the table
working down to inode[0]. A mark is put in inode[] if less than 100 nodes were
found. The next time inode[] runs out of nodes, this mark tells it to search
the i-list from the very begining. If  NO inodes were found during the search,
ninode is SET TO 0 and the out of inodes error is printed on the system console.

	The problem that everyone is having is caused by the following
situation. At the last reloading of inode[] exactly NICINOD inodes were found.
Therefore, the inode at inode[0] is where the next search for inodes will begin.
As the system runs, more inodes are allocated and freed. Eventually, the last
free inode in inode[] is allocated. The system waits until the next call to 
ialloc to determine if it needs to reload inode[]. If a node is released before
the inode table is reloaded, the freed inode will go into inode[0], replacing
the old value which would be used for searching the i-list. If the freed inode
was higher in the i-list than the one it replaced in the table, ialloc will no
longer know that it should check the lower portion of the i-list for free
inodes. It will think that everything below inode[0] is allocated already.
If a significant number of lower valued inodes are not freed before ialloc
has to reload the inode table, ialloc will fail to find any free inodes even
though they exist.  Furthermore, because of the coding of ialloc(), unless an
inode is freed at some point, every time it tries looking for more inodes, it
will start at the same place. So until the file system is dismounted and fsck'd,
unless some inodes are freed, the system will be stuck repeating the same search
and reporting the same failure.

	The original intent of the ialloc() - ifree() system is to minimize
the time to find more free nodes by remembering the best location to start
searching for more free inodes. Therefore, the best fix to ialloc would be
to first try to give the requesting process a free node. ialloc() should
then IMMEDIATELY check to see if that was the last free inode it had, and if
it was, try reloading the inode table right then. This will prevent the
possibility of the system from forgetting about the best place to search for
inodes. A side result of this is that the out of inodes message will appear
when the last free inode is allocated and not when ialloc failed to give
an inode. An argument could be made either way as to wether this side effect
is good or not. The other fix is to put a kludge into ialloc that, in the
event that NO free inodes were found, it would immediately recycle through
the i-list from the very beginning looking for inodes before deciding that there
are no free inodes left. If the i-list is large, this can be somewhat
inefficient.


	PROCEDURE TO TEST FOR THE 3B INODE ALLOCATION BUG


	This test is intended to be run on a floppy-based file system or an
expendable file system. It is assumed that NICINOD, the number of inodes that
are stored in the superblock inode table is 100. If not, the test will have
to be adjusted accordingly.

	1. create a file system with ~ 280 inodes using mkfs
		fsck the disk and mount it /mnt 

	2. verify with a df -t as to the number of free inodes and the total
		number of inodes in this file system.

	3. allocate all the inodes on this filesystem. You can use the program
		fillnode given at the bottom of this document to help you do
		the job. The final result is that there should be 0 inodes left.
		Each file that you made on this disk should be named after its
		respective inode.

	4. unmount the filesystem, do an fsck of the disk, remount and
		verify with a df -t that there are no free inodes.

	5. free up the files with inodes 3-202. This will give you 200 free
		inodes on the filesystem. Verify this using step 4.

	6. at this point, the file system will be mounted and the superblock
		inode table will contain inodes 3-102 for immediate allocation.

	7. use fillnode to reallocate inodes 3-102. at this point you will have
		100 free inodes when you do a df. This is the correct number of 
		free nodes. At this point the superblock inode table will be 
		empty.

	8. use fillnode to allocate 1 inode. the inode that will be allocated is
		inode # 103. At this point the superblock inode table will have
		been reloaded from the i-list. the 0 element in the table will
		be inode 202 and the 99th element will be inode 103, which you
		just allocated.

	9. Delete in order the files with inodes 30-39. At this point, the 0
		element in the inode table will be inode 31 while the 99th 
		element will be inode 30. When you released inode 30, the 
		table was not full, so it was put onto the top of the table.
		When inode 31 was released, the table was full so ifree checked
		to see if the just freed inode was less than the inode in the
		0th element of the table. Since the 0th element up to this time
		was 202, ifree replaced the 0th element with inode 31. Note,
		The inode table is now full, containing 100 free inodes, the
		lowest free inode in the entire i-list being in the 0th element
		of the table. As you release inodes 32-39, they will fail the
		test by ifree, the result being that these inodes ARE free but
		simply aren't in the inode table. This is alright since when
		ialloc must reload its inode table it will start looking with
		the inode referenced in the 0th element of the table.

	10. allocate another 100 inodes. fillnode will allocate in order 
		inodes 30,104-201 and inode 31. At this point the superblock
		inode table is empty again. However, as always, ialloc will
		leave the table empty until it must allocate an inode and finds
		no inodes in the table.

	11. free inode 240. At this point you have sealed your doom ! .
		ifree will put this inode into the lowest available entry in the		inode table, DESTROYING ANY MEMORY THAT THE LOWEST FREE INODE IS
		AROUND INODE 31. 

	12. Do a df -t to confirm that you still have ~ 10 free inodes.

	13. allocate an inode. This inode will be inode 240.

	14. Do a df -t to confirm that you now have ~ 9 free inodes.

	15. Call fillnode again and say goodbye to your free inodes!
		At this point you will get an out of inodes error on your
		console and the allocation attempt will return failure. A df -t
		say that there are NO free inodes. What happened was that after
		step 13 there were no free nodes in the superblock inode table.
		At this point, ialloc went searching through the i-list for
		more free inodes starting at the inode specified in the 0th
		element of the inode table. BUT this no longer references inode
		31, where we know there is more space, but inode 240. ialloc
		searches from inode 240 to the end of the i-list, but all those
		inodes are allocated, so ialloc decides that there are no more
		free inodes and reports the out of inodes error,EVEN though
		you still have free inodes!.

	16. unmount the filesystem. Do an fsck. This will report a bad inode
		count in the Superblock ( Sound familiar ) which you must
		fix. Remount and do an df -t to confirm that you really do
		still have a number of free inodes.

	IF THE SITUATIONS DESCRIBED IN THIS TEST HAPPEN TO YOU

		AND YOU ARE HAVING PROBLEMS BECAUSE OF THIS BUG

	CONTACT YOUR ATT CUSTOMER/TECH SUPPORT REP AND REPORT THE PROBLEM

below is the code for fillnode.c . This program will create a file in /mnt.
The file created will be named after the inode to which it was allocated.
The file will have 0 blocks allocated to it.

#include <fcntl.h>
#include <sys/types.h>
#include <sys/stat.h>
main()
{
	int link(),open(),close(),fstat();
	struct stat buf;
	int fd;
	char name[30];

	if( (fd = open("/mnt/XXX",O_CREAT | O_WRONLY,0666) ) < 0 )
	{
		printf("can't open file\n");
		exit(2);
	}
	if( fstat(fd,&buf) < 0 )
	{
		printf("error fstating file\n");
		exit(3);
	}
printf("inode is %d\n",buf.st_ino);
	sprintf(name,"/mnt/%d",buf.st_ino);
	close(fd);
	if( link("/mnt/XXX",name) < 0 )
	{
		printf("can't link to new name\n");
		exit(3);
	}
	if( unlink("/mnt/XXX") < 0 )
	{
		printf("can't unlink old file /mnt/XXX\n");
		exit(3);
	}
	exit(0);
}

peter@stca77.stc.oz (Peter Jeremy) (10/11/88)

In article <1384@cooper.cooper.EDU> mayer@cooper.cooper.EDU (Mayer Ilovitz ) writes:
>	Because of all the postings which have appeared over the last few weeks
>in various newsgroups concerning mysteriously disappearing inodes
>which reappear after performing an fsck on the afflicted file system,
>I am reposting this article which will (hopefully) answer many of the
>questions. ( this was originally posted the week of Thanksgiving Day 1987.)
>
>Since the original posting, I have heard a rumor that the inode allocation/
>free bug which is responsible for these problems will be fixed in System V 4.0.

Just out of interest (I hadn't seen the problem, but I don't run any
filesystems close to inode capacity so I mightn't), I tried the procedure
on XENIX/286 2.2.1, and to my surprise, it passed the test.  At step 15,
it allocated inode 32 (I kept going and it happily allocated 33..39 and 202).

It looks like this is one bug that Microsoft or SCO corrected.  One
interesting oddity was that when using the inodes initially (on a
filesystem that has just been mkfs'd), it allocated 3..202, 211..272,
203..210.  There was also a noticable delay between allocating 202 and
211, although this may just have been the system load (I was using a
floppy disk for the experiment).  This doesn't quite jell with my
understanding of the way inode allocation is supposed to work (on a
clean file system the free inode list should be sorted, and therefore
inodes should be allocated in inode order).
-- 
Peter Jeremy (VK2PJ)         peter@stca77.stc.oz
Alcatel-STC Australia        ...!munnari!stca77.stc.oz!peter
41 Mandible St               peter%stca77.stc.oz@uunet.UU.NET
ALEXANDRIA  NSW  2015

henry@utzoo.uucp (Henry Spencer) (10/14/88)

In article <308@stca77.stc.oz> peter@stca77.stc.oz (Peter Jeremy) writes:
>...on a
>clean file system the free inode list should be sorted, and therefore
>inodes should be allocated in inode order).

Well, do remember that there's no such thing as a "free inode list" with
all the free inodes of the filesystem on it.  What there is, is an in-core
list of up to circa 100 free inodes.  That list is only a hint to improve
performance; it's the inode itself that records whether it is allocated
or not.  If the in-core list fills up, Unix just throws away the extra
hints.  If the in-core list becomes empty, Unix searches the filesystem
for more free inodes to refill it.  "Searches" means just what you think
it does:  sequential search through the inodes looking for empties.  In
modern Unixes an attempt is made to start the search from a different
point each time, to avoid long delays.  There is plenty of room for bugs
and odd behavior in this scheme.

(NB:  SysV and BSD have each added their own wrinkles which change some of
the details of the above.)
-- 
The meek can have the Earth;    |    Henry Spencer at U of Toronto Zoology
the rest of us have other plans.|uunet!attcan!utzoo!henry henry@zoo.toronto.edu