bill@twwells.uucp (T. William Wells) (01/29/89)
Two things happened today (really yesterday, but this fix took all night): first, my system crashed twice due to the inode bug, and second, I received Jim Valerio's stuff on fixing the inode problem for System V/386 Release 3. I'm running Microport's 3.0e; and here is what is going on.

The bug is in s5ialloc (aka ialloc). The code seems to depend on the inode cache always containing the lowest free inode, and that is a condition that just can't be met. Jim Valerio's fix is to always scan the inode list when the cache runs out. I didn't like that: my system is already disk-bound and I don't want to add more load to the disk, so I disassembled the code and found a fix. My fix is to ignore a failure to read inodes and simply try the read again. This has the advantage of not requiring a rescan except when the inode pointer gets screwed up.

The following is the relevant code from the disassembly:

define(`NICINOD', 100)
define(`s_ninode', 212(%edi))   # short number of i-nodes in s_inode
define(`s_inode', 214(%edi))    # ushort free i-node list
define(`s_tinode', 436(%edi))   # ushort total free inodes

.readinodes:
.0xFC
        movw    s_tinode,%ax            / check to see that there are some free inodes
        testw   %ax,%ax
        je      .noinodes               / no, branch to the error handler
        movw    $NICINOD,s_ninode       / this is the number of inodes we can read
        movzwl  s_inode,%eax            / the first inode to read from the disk
        ...
.0x209:
        movw    s_ninode,%ax            / did we get enough inodes for the cache?
        testw   %ax,%ax
        jle     .0x236                  / yes, proceed
        leal    s_inode,%eax            / this is the address of the inode table
        movswl  s_ninode,%edx           / this is how many inodes we couldn't get
        decl    %edx                    / stick a zero before the inodes to force
        movw    $0,(%eax,%edx,2)        / a reread when they are all used up
        movw    $0,s_inode              / zero the first inode in the cache
.0x236:
        movswl  s_ninode,%eax           / if no inodes were read into the cache
        cmpl    $NICINOD,%eax
        je      .noinodes               / fail due to lack of inodes
        movw    $NICINOD,s_ninode       / otherwise set the cache pointer to its end
        jmp     .0x2C5                  / and then go back to allocating

---

Note these two key facts:

1) .readinodes checks s_tinode before reading from the disk.
2) If nothing gets put into the cache, the first entry of s_inode is zeroed.

Therefore, if the last `je .noinodes' is changed to `je .readinodes', the retry starts reading from the beginning of the inode list (s_inode[0], the starting point, is now zero), and it still fails properly when s_tinode says there are no free inodes. The bug goes away! Does anyone see any problem with this patch?

---

On my system, I patched s5.o (/etc/atconf/modules/s5/s5.o) with the following:

adb -w s5.o <<+
s5ialloc+244?W0fffffeb4
+

REMEMBER. MAKE A BACKUP COPY OF S5.O AND VERIFY THAT THE CODE LOOKS SOMETHING LIKE MINE!!!!

Then rebuild your kernel. All kernels made after this change will have the fix.

---
Bill
{ uunet!proxftl | novavax } !twwells!bill
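To make the two key facts concrete, here is a minimal Python sketch of the refill logic as the disassembly above describes it. It is a simplified model, not the kernel source. NICINOD is shrunk to 4, and the on-disk inode table is a hypothetical list of free/used flags (`disk_free`). The class `FilSys`, the function `ialloc`, and the handling of the zero sentinel (taking a 0 from the cache empties it and forces a reread) are all assumptions for illustration. Only s_ninode, s_inode, s_tinode, and NICINOD come from the post. The `patched` flag switches the last `je .noinodes' to `je .readinodes'.

```python
NICINOD = 4   # 100 in the real superblock; shrunk so the demo stays small
NINODE = 16   # hypothetical size of the on-disk inode table (inode 0 unused)


class FilSys:
    """Just the three superblock fields quoted in the post."""

    def __init__(self, s_ninode, s_inode, s_tinode):
        self.s_ninode = s_ninode      # number of free inodes left in the cache
        self.s_inode = list(s_inode)  # the cache; s_inode[0] is also where a read starts
        self.s_tinode = s_tinode      # total free inodes on the file system


def ialloc(fp, disk_free, patched):
    """Simplified model of s5ialloc; returns an inode number, or 0 for "out of inodes".

    disk_free[i] is True if on-disk inode i is free (hypothetical stand-in for the disk).
    patched=True turns the last `je .noinodes` into `je .readinodes`.
    """
    need_read = fp.s_ninode <= 0
    while True:
        if not need_read:
            fp.s_ninode -= 1
            ino = fp.s_inode[fp.s_ninode]
            if ino == 0:                   # zero sentinel: cache used up, force a reread
                fp.s_ninode = 0
            elif disk_free[ino]:           # hand out a cached free inode
                disk_free[ino] = False
                fp.s_tinode -= 1
                return ino
            need_read = fp.s_ninode <= 0   # stale entry or sentinel: keep going
            continue
        # .readinodes: fact 1, give up only if s_tinode says nothing is free
        if fp.s_tinode == 0:
            return 0
        fp.s_ninode = NICINOD
        for ino in range(max(fp.s_inode[0], 1), NINODE):  # read starts at s_inode[0]
            if disk_free[ino]:
                fp.s_ninode -= 1
                fp.s_inode[fp.s_ninode] = ino
                if fp.s_ninode == 0:
                    break
        # .0x209: cache not full, so plant zeros (fact 2: s_inode[0] becomes 0)
        if fp.s_ninode > 0:
            fp.s_inode[fp.s_ninode - 1] = 0
            fp.s_inode[0] = 0
        # .0x236: nothing at all was read from the disk
        if fp.s_ninode == NICINOD:
            if not patched:
                return 0                   # original: je .noinodes
            continue                       # patched: je .readinodes, now reading from 0
        fp.s_ninode = NICINOD
        need_read = False


def demo(patched):
    disk_free = [False] * NINODE
    disk_free[2] = disk_free[3] = True     # free inodes below the remembered start
    fp = FilSys(0, [10, 0, 0, 0], 2)       # empty cache, next read starts at inode 10
    return [ialloc(fp, disk_free, patched) for _ in range(3)]


print(demo(patched=False))  # [0, 2, 3]
print(demo(patched=True))   # [2, 3, 0]
```

In the unpatched model, the first call reports "out of inodes" even though s_tinode is 2. Because that failed read zeroed s_inode[0], the next call succeeds. The patch removes that one spurious failure by retrying inside the same call. One caveat applies to this model: the retry only terminates if s_tinode is accurate. A count that claims free inodes which a read from the start cannot find would make the patched loop spin.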
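As a sanity check on the adb patch, the word it writes can be reproduced as an ordinary x86 near-jump displacement: the target minus the address of the instruction that follows the jump. This is a sketch that rests on assumptions the post does not state. It assumes the labels and the adb offset 244 are hex offsets from s5ialloc. It also assumes the last `je' is a 6-byte `0F 84 rel32' starting at 0x242, so that the word at s5ialloc+244 is its displacement and the next instruction begins at 0x248. `rel32` is a hypothetical helper name.

```python
def rel32(insn_addr: int, insn_len: int, target: int) -> int:
    """Displacement for an x86 near jump, as an unsigned 32-bit word.

    The CPU adds the displacement to the address of the *next* instruction,
    so it is target - (insn_addr + insn_len), wrapped to two's complement.
    """
    return (target - (insn_addr + insn_len)) & 0xFFFFFFFF


# Assumed layout (see above): `je` at s5ialloc+0x242, 6 bytes long,
# retargeted from .noinodes to .readinodes at s5ialloc+0xFC.
word = rel32(0x242, 6, 0xFC)
print(hex(word))                          # 0xfffffeb4
print(word.to_bytes(4, "little").hex())   # b4feffff
```

The result, 0xfffffeb4 (that is, -0x14C), matches the value in the adb command, which supports the layout assumed here. On the 386 it is stored little-endian as the bytes b4 fe ff ff.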
eric@egsner.UUCP (Eric Schnoebelen) (01/30/89)
In article <347@twwells.uucp> bill@twwells.UUCP (T. William Wells) writes:
-Two things happened today (really yesterday, but this fix took all
-night): first, my system crashed twice due to the inode bug, and
-second, I received Jim Valerio's stuff on fixing the inode problem for
-System V/386 Release 3. I'm running Microport's 3.0e; and here is
-what is going on.
[ long dissertation on chasing and fixing System V inode bug.. ]
-Then rebuild your kernel. All kernels made after this change will
-have the fix.
-
----
-Bill
-{ uunet!proxftl | novavax } !twwells!bill
My questions are:
1. Will this fix work on 2.2... ( I realize the addresses are
probably different, any hints what the new(old?) addresses are? )
2. ( and more importantly.. ) Has a similar fix been found for
Microport System V/AT? I am currently running fsck nightly from a
crontab to keep news from killing my system, and would like to stop.
Thanks in advance
Eric
--
Eric Schnoebelen
egsner!eric@texbell.uucp ...!texbell!egsner!eric
egs@u-word.dallas.tx.us ...!killer!u-word!egs
"All this science, I can't understand; It's just my job 5 days a week"
bill@twwells.uucp (T. William Wells) (02/02/89)
In article <153@egsner.UUCP> eric@egsner.UUCP (Eric Schnoebelen) writes:
: In article <347@twwells.uucp> bill@twwells.UUCP (T. William Wells) writes:
: -System V/386 Release 3. I'm running Microport's 3.0e; and here is
: -what is going on.
: -...
: -Then rebuild your kernel. All kernels made after this change will
: -have the fix.
:
: My questions are:
:
: 1. Will this fix work on 2.2... ( I realize the addresses are
: probably different, any hints what the new(old?) addresses are? )

It might, but someone will have to go look.

: 2. ( and more importantly.. ) Has a similar fix been found for
: Microport System V/AT? I am currently running fsck nightly from a
: crontab to keep news from killing my system, and would like to stop.

I dunno. Someone with a disassembler should have no problem creating such a fix, if it can be done at all. If you want to do it, go get _The Design of the UNIX Operating System_ (Maurice J. Bach, Prentice-Hall, ISBN 0-13-201799-7); it gives rather detailed (though not C code!) descriptions of what goes on in the UNIX kernel. It is based on Release 2, but it is still a useful guide for Release 3. For example, there is pseudocode for ialloc (aka s5ialloc) on page 78, which is really useful when trying to understand the assembly.

BTW, running fsck doesn't prevent the bug. Twice now I've had the inode bug hit me within minutes of fsck'ing my system.

---
Bill
{ uunet!proxftl | novavax } !twwells!bill