[hacknews] kernel-testing fiasco

henry@utzoo.UUCP (Henry Spencer) (10/17/84)

Well, we tested out a new kernel, with a couple of significant performance
improvements (hashing in the inode table, and a freelist for the file
table), plus other minor trivia.  Looked fine single-user.  When we came
up multi-user, bizarre things started happening.  Shut down again, came
back up with the old /unix.  Same bizarre problems.  I noticed we were
getting disk errors.  Shut down again, set out to figure out just what
was going on -- were our precious Eagles failing all of a sudden? -- and
suddenly realized that I'd write-protected drive B during the standalone
testing, and never un-protected it!!  A flip of a switch, and we came back
up perfectly.

The real puzzle is, why was the system reacting so poorly to this?  The
proper response to something like this is a spew of console messages;
there weren't any until I tinkered with the srm parameters.  Even worse,
there were indications that user programs weren't seeing errors either.
I've seen some signs of error-handling problems in the rm driver before;
it's time for a thorough investigation.
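
For what it's worth, here is a minimal sketch of what "user programs
seeing errors" looks like from the program's side, assuming the driver
really does pass a failed write back up to the caller.  This is modern
POSIX C, not anything from our kernel or the rm driver; save_file and
checked_write are made-up names for illustration, and fsync in
particular is an assumption about the target system.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Write all len bytes, retrying short writes and EINTR.
 * Returns 0 on success, -1 on failure with errno set. */
int checked_write(int fd, const char *buf, size_t len)
{
    assert(buf != NULL);
    while (len > 0) {
        ssize_t n = write(fd, buf, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted before any data: retry */
            return -1;          /* e.g. EIO or EROFS from the driver */
        }
        if (n == 0) {
            errno = EIO;        /* assumption: treat "no progress" as an I/O error */
            return -1;
        }
        buf += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Create or truncate path and store len bytes of data in it, checking
 * every step.  Returns 0 on success, -1 after printing a diagnostic. */
int save_file(const char *path, const char *data, size_t len)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0) {
        fprintf(stderr, "%s: open: %s\n", path, strerror(errno));
        return -1;
    }
    /* fsync forces the data out to the device, so a drive that refuses
     * writes has a chance to be reported here rather than silently later. */
    if (checked_write(fd, data, len) < 0 || fsync(fd) < 0) {
        fprintf(stderr, "%s: write: %s\n", path, strerror(errno));
        close(fd);
        return -1;
    }
    if (close(fd) < 0) {
        fprintf(stderr, "%s: close: %s\n", path, strerror(errno));
        return -1;
    }
    return 0;
}
```

A program that checks every return value this way can only report what
the kernel tells it; if the driver swallows the error, save_file will
happily return 0, which is consistent with the symptoms above.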
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry