[comp.unix.microport] Another fix for the SYSV inode problem

bill@twwells.uucp (T. William Wells) (01/29/89)

Two things happened today (really yesterday, but this fix took all
night): first, my system crashed twice due to the inode bug, and
second, I received Jim Valerio's stuff on fixing the inode problem for
System V/386 Release 3. I'm running Microport's 3.0e; and here is
what is going on.

S5ialloc (aka ialloc) has a bug. It seems that the code depends on
the inode cache always containing the lowest free inode, a condition
that just can't be met.

Jim Valerio's fix is to always scan the inode list when the cache
runs out.  I didn't like that; my system is already disk-bound and I
don't want to add more load on the disk, so I disassembled the code
and found a fix.  My fix is to ignore a failure to read inodes and
try again.  This has the advantage of not requiring a rescan except
when the inode pointer gets screwed up.

The following is relevant code from the disassembly:

define(`NICINOD',       100)

define(`s_ninode',      212(%edi))      # short  number of i-nodes in s_inode
define(`s_inode',       214(%edi))      # ushort free i-node list
define(`s_tinode',      436(%edi))      # ushort total free inodes

.readinodes:    .0xFC
	movw    s_tinode,%ax    / check to see that there are some free inodes
	testw   %ax,%ax
	je      .noinodes       / no, branch to the error handler
	movw    $NICINOD,s_ninode / this is the number of inodes we can read
	movzwl  s_inode,%eax    / the first inode to read from the disk
...

.0x209:
	movw    s_ninode,%ax    / did we get enough inodes for the cache?
	testw   %ax,%ax
	jle     .0x236          / yes, proceed
	leal    s_inode,%eax    / this is the address of the inode table
	movswl  s_ninode,%edx   / this is how many inodes we couldn't get
	decl    %edx            / stick a zero before the inodes to force
	movw    $0,(%eax,%edx,2) / a reread when they are all used up
	movw    $0,s_inode      / zero the first inode in the cache
.0x236:
	movswl  s_ninode,%eax   / if no inodes were read into the cache
	cmpl    $NICINOD,%eax
	je      .noinodes       / fail due to lack of inodes
	movw    $NICINOD,s_ninode / otherwise set the cache pointer to its end
	jmp     .0x2C5          / and then go back to allocating
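
For anyone matching the three m4 offsets at the top of the listing
against a header file: they appear to line up with the traditional
System V superblock layout. The sketch below is a reconstruction under
assumptions, not Microport's header. Only s_ninode (212), s_inode (214)
and s_tinode (436) come from the disassembly; NICFREE = 50, the other
field names, and the fixed-width types standing in for daddr_t and
time_t are recalled from the classic struct filsys and may differ on
your system. The names filsys_head and check_layout are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NICFREE 50   /* assumed: free-block cache size in classic System V */
#define NICINOD 100  /* from the post: free-inode cache size */

/* Leading part of the superblock, up to s_tinode (assumed layout). */
struct filsys_head {
    uint16_t s_isize;            /* size in blocks of the i-list */
    int32_t  s_fsize;            /* size in blocks of the volume */
    int16_t  s_nfree;            /* number of addresses in s_free */
    int32_t  s_free[NICFREE];    /* free block list */
    int16_t  s_ninode;           /* number of i-nodes in s_inode */
    uint16_t s_inode[NICINOD];   /* free i-node list */
    char     s_flock;            /* lock during free list manipulation */
    char     s_ilock;            /* lock during i-list manipulation */
    char     s_fmod;             /* superblock modified flag */
    char     s_ronly;            /* mounted read-only flag */
    int32_t  s_time;             /* last superblock update */
    int16_t  s_dinfo[4];         /* device information */
    int32_t  s_tfree;            /* total free blocks */
    uint16_t s_tinode;           /* total free inodes */
};

/* Aborts unless the struct reproduces the offsets used in the listing. */
void check_layout(void)
{
    assert(offsetof(struct filsys_head, s_ninode) == 212);
    assert(offsetof(struct filsys_head, s_inode)  == 214);
    assert(offsetof(struct filsys_head, s_tinode) == 436);
}
```

If check_layout() passes on your compiler, the struct can serve as a
map when reading the 212(%edi), 214(%edi) and 436(%edi) operands in
the disassembly.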

---

Note these two key facts:

    1) .readinodes checks s_tinode before reading from the disk.

    2) If nothing gets put into the cache, the first entry of s_inode
       is zeroed.

Therefore, if the last `je .noinodes' is changed to `je .readinodes',
the bug goes away!  Does anyone see any problem with this patch?
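
The effect of that one-branch change can be tried out in a toy C model
of the control flow. This is only a sketch under assumptions: the part
of the scan elided by "..." in the listing is replaced by a made-up
scan_from() over an in-memory array, a zero in s_inode[0] is taken to
mean "start from the beginning of the inode list", the cache is shrunk
to 4 entries, and toy_fs, refill and try_refill are hypothetical names.
It is not the kernel's code.

```c
#include <assert.h>
#include <string.h>

#define NICINOD 4    /* toy cache size; the post's kernel uses 100 */
#define NINODES 16   /* toy inode count (hypothetical) */

struct toy_fs {
    short          s_ninode;              /* cache fill pointer */
    unsigned short s_inode[NICINOD];      /* cached free inode numbers */
    unsigned short s_tinode;              /* total free inodes */
    unsigned char  is_free[NINODES + 1];  /* stand-in for the on-disk inode list */
};

/* Stand-in for the elided scan: walk the "disk" from `start`, filling
 * the cache from its top slot downward. */
static void scan_from(struct toy_fs *fs, unsigned start)
{
    unsigned ino;

    fs->s_ninode = NICINOD;
    for (ino = start ? start : 1; ino <= NINODES && fs->s_ninode > 0; ino++)
        if (fs->is_free[ino])
            fs->s_inode[--fs->s_ninode] = (unsigned short)ino;
}

/* Returns 1 if the cache was refilled, 0 for the .noinodes path. */
static int refill(struct toy_fs *fs, int patched)
{
    for (;;) {
        if (fs->s_tinode == 0)                  /* .readinodes: s_tinode check */
            return 0;
        scan_from(fs, fs->s_inode[0]);
        assert(fs->s_ninode >= 0 && fs->s_ninode <= NICINOD);
        if (fs->s_ninode > 0) {                 /* .0x209: cache not filled */
            fs->s_inode[fs->s_ninode - 1] = 0;  /* zero below the entries read */
            fs->s_inode[0] = 0;                 /* next scan starts from the beginning */
        }
        if (fs->s_ninode != NICINOD) {          /* .0x236: something was read */
            fs->s_ninode = NICINOD;
            return 1;
        }
        if (!patched)
            return 0;                           /* original: je .noinodes */
        /* patched: je .readinodes, so loop again with s_inode[0] == 0 */
    }
}

/* Build the failing case: inodes 2 and 3 are free, but the scan start
 * left in s_inode[0] points past them. */
static int try_refill(int patched)
{
    struct toy_fs fs;

    memset(&fs, 0, sizeof fs);
    fs.is_free[2] = fs.is_free[3] = 1;
    fs.s_tinode = 2;
    fs.s_inode[0] = 10;
    return refill(&fs, patched);
}
```

In this model try_refill(0) takes the .noinodes exit even though two
inodes are free, while try_refill(1) goes around once more and finds
them. The patched loop in the toy terminates only because s_tinode is
accurate; whether that always holds in the real kernel is not something
the sketch can show.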

---

On my system, I patched s5.o (/etc/atconf/modules/s5/s5.o) with the
following:

adb -w s5.o <<+
s5ialloc+244?W0fffffeb4
+
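
The value written by that adb command can be cross-checked with a
little arithmetic. The sketch below assumes three things not stated in
the post: that adb reads 244 as hexadecimal, that the branch is the
long form of je with a 4-byte displacement stored at s5ialloc+0x244,
and that the ".0xFC" next to .readinodes in the listing is its offset
from s5ialloc. The helper names rel32 and check_post_value are
hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Displacement for a near jump whose 4-byte displacement field sits at
 * `field_off`. The CPU adds the displacement to the address of the
 * next instruction, i.e. field_off + 4. Both offsets are relative to
 * the start of s5ialloc. */
uint32_t rel32(uint32_t field_off, uint32_t target_off)
{
    return (uint32_t)(target_off - (field_off + 4u));
}

/* Self-check against the numbers in the post. */
void check_post_value(void)
{
    assert(rel32(0x244u, 0xFCu) == 0xFFFFFEB4u);
}
```

Under those assumptions 0xFC - 0x248 wraps to 0xfffffeb4, which is the
word in the adb command. On a release where the offsets differ, the
same calculation with the offsets from your own disassembly would give
the word to write, but check the bytes around the patch point first.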

REMEMBER: MAKE A BACKUP COPY OF S5.O AND VERIFY THAT YOUR CODE LOOKS
SOMETHING LIKE MINE!!!!

Then rebuild your kernel. All kernels made after this change will
have the fix.

---
Bill
{ uunet!proxftl | novavax } !twwells!bill

eric@egsner.UUCP (Eric Schnoebelen) (01/30/89)

In article <347@twwells.uucp> bill@twwells.UUCP (T. William Wells) writes:
-Two things happened today (really yesterday, but this fix took all
-night): first, my system crashed twice due to the inode bug, and
-second, I received Jim Valerio's stuff on fixing the inode problem for
-System V/386 Release 3. I'm running Microport's 3.0e; and here is
-what is going on.

[ long dissertation on chasing and fixing System V inode bug..  ]

-Then rebuild your kernel. All kernels made after this change will
-have the fix.
-
----
-Bill
-{ uunet!proxftl | novavax } !twwells!bill

My questions are:
	
	1.  Will this fix work on 2.2... ( I realize the addresses are
probably different, any hints what the new(old?) addresses are? )

	2. ( and more importantly.. )  Has a similar fix been found for
Microport System V/AT?  I am currently running fsck nightly from a
crontab to keep news from killing my system, and would like to stop.

	Thanks in advance
		Eric

-- 
Eric Schnoebelen
egsner!eric@texbell.uucp			...!texbell!egsner!eric
egs@u-word.dallas.tx.us				...!killer!u-word!egs
"All this science, I can't understand; It's just my job 5 days a week"

bill@twwells.uucp (T. William Wells) (02/02/89)

In article <153@egsner.UUCP> eric@egsner.UUCP (Eric Schnoebelen) writes:
: In article <347@twwells.uucp> bill@twwells.UUCP (T. William Wells) writes:
: -System V/386 Release 3. I'm running Microport's 3.0e; and here is
: -what is going on.
: -...
: -Then rebuild your kernel. All kernels made after this change will
: -have the fix.
:
: My questions are:
:
:       1.  Will this fix work on 2.2... ( I realize the addresses are
: probably different, any hints what the new(old?) addresses are? )

It might, but someone will have to go look.

:       2. ( and more importantly.. )  Has a similar fix been found for
: Microport System V/AT?  I am currently running fsck nightly from a
: crontab to keep news from killing my system, and would like to stop.

I dunno. Someone with a disassembler should have no problem creating
such a fix, if it can be done at all.  If you want to do it, go get
_The Design of the UNIX Operating System_ (Maurice J. Bach,
Prentice-Hall, ISBN 0-13-201799-7). It gives rather detailed (though
not C code!) descriptions of what goes on in the UNIX kernel. It is
based on Release 2, though it is still a useful guide for Release 3.
For example, there is pseudocode for ialloc (aka s5ialloc) on page
78, something really useful when trying to understand the assembly.

BTW, running fsck doesn't prevent the bug. Twice now I've had the
inode bug hit me within minutes of fsck'ing my system.

---
Bill
{ uunet!proxftl | novavax } !twwells!bill