[comp.bugs.4bsd] Bring up Reno

mwm@hslrswi.hasler.ascom.ch (Mike McGann) (04/10/91)

I have brought 4.3 Reno up on a uVax with an
RD53 and a CDC XMD (RA81 emulation). There are some
small problems:

uda.c - now compares not only the drive code but also the
        geometry against its tables, and they are wrong for an RD53.

passwd.c - thinks the username is argv[0] instead of argv[1], so it works only if you
           rename it to the user name.

But I am having a bigger problem. It seems to trash a filesystem in some circumstances
to the extent that fsck can't fix it. I only have to run as root and try to make
GNU Emacs 18.55. Suddenly one of the directories disappears; it just looks like somebody
unlinked it. OK, I run fsck on the filesystem and it acts just like that: it finds
some disconnected directories and so forth. But I never get the space back, I can't
delete the files from lost+found, and it's downhill from there until I remake the fs.

Has anybody seen this problem, or have a fix?


mike
mwm@hslrswi.hasler.ascom.ch

jhma@tharr.UUCP (James Aldridge) (04/11/91)

In article <1926@hslrswi.hasler.ascom.ch> mwm@hslrswi.hasler.ascom.ch (Mike McGann) writes:
>I have brought 4.3 Reno up on a uVax with an
>RD53 and a CDC XMD (RA81 emulation). There are some
>small problems:

I have brought 4.3 Reno up on a uVAX and attempted it on a VAX 11/750.

>uda.c - now compares not only the drive code but also the
>        geometry against its tables, and they are wrong for an RD53.

Our uVAX II is entirely DEC kit so I can't comment on this one.

>passwd.c - thinks the username is argv[0] instead of argv[1], so it works only if you
>           rename it to the user name.

The problem only exists if you don't run the Kerberos stuff.  It can be made
to work as normal (passwd username) if you add the following lines as a #else
clause to the #ifdef KERBEROS in main():

	#else
		argc--; argv++;

which mirrors the equivalent code in the Kerberos case (decoding
the -l flag using getopt).
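
To show why the shift matters, here is a minimal sketch rather than the real
passwd.c. The helper name target_user(), its shift flag and its fallback
argument are all made up for illustration; the only assumption carried over
from the source is that the code after the #ifdef expects argv[0] to be the
first operand, not the program name.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not taken from passwd.c: return the user name
 * operand, or `fallback` when none was given.  `shift` stands in for
 * the "argc--; argv++;" that the #else clause adds.
 */
const char *
target_user(int argc, char **argv, int shift, const char *fallback)
{
	if (shift) {
		argc--;		/* skip the program name ... */
		argv++;		/* ... so argv[0] is the first operand */
	}
	/* The later code is assumed to treat argv[0] as the user name. */
	return (argc > 0 ? argv[0] : fallback);
}
```

Called with { "passwd", "mike" } and no shift, the sketch picks up the program
name as the user, which matches the symptom of passwd only working when it is
renamed to the user name.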

>But I am having a bigger problem. It seems to trash a filesystem in some circumstances
>to the extent that fsck can't fix it.

I believe there was a bug in very early releases of Reno's fsck program - I
seem to recall a posting some time back but can't remember when or exactly
what the problem was.

The major problem I have had installing 4.3BSD-Reno on our 11/750 is that it
doesn't recognise our tape drive when using the supplied kernels.  We have an
Emulex TC12 (ts11 compatible) tape controller which has worked quite happily
under all previous 4.x BSD releases.  4.3 Reno recognises that there is some
sort of controller present but complains that "zs0: didn't interrupt".  I
have managed to get around the problem by using a modified version of the
driver from 4.3BSD and now have to wait for a suitable weekend to do the rest
of the installation!

>mike
>mwm@hslrswi.hasler.ascom.ch

James

-- 
James Aldridge / Solid State Logic Ltd. / Begbroke / Oxford / UK
      Telephone: +44 865 842300 x229 / Fax: +44 865 842118
----------------------------------------------------------------
<- tharr *free* public access to Usenet in the UK 0234 720202 ->

dennis@gpu.utcs.utoronto.ca (Dennis Ferguson) (04/11/91)

In article <1926@hslrswi.hasler.ascom.ch> mwm@hslrswi.hasler.ascom.ch (Mike McGann) writes:
>But I am having a bigger problem. It seems to trash a filesystem in some circumstances
>to the extent that fsck can't fix it. I only have to run as root and try to make
>GNU Emacs 18.55. Suddenly one of the directories disappears; it just looks like somebody
>unlinked it. OK, I run fsck on the filesystem and it acts just like that.

We had vaguely similar symptoms.  Our machines would crash frequently
when the file systems got busy and would come back up with some really
ugly file system inconsistencies.  This started to happen when we
turned on accounting.

We found a bug in the accounting code: the routine which checks whether
the file system is full enough that accounting should be turned off was
being called on every clock tick, rather than once every 15 seconds as
intended.  A patch to kern/kern_acct.c follows.

I don't think this change actually repaired the more serious problem
(which I suspect is some sort of race condition in the file system code)
but it did make the symptoms go away, which made us happy.  Note too
that the Reno version we were having trouble with is a homebrew port
to the IBM RT, which may have its own unique set of problems.

Dennis Ferguson

*** /tmp/,RCSt1000583	Thu Apr 11 21:18:28 1991
--- kern_acct.c	Thu Mar  7 18:38:11 1991
***************
*** 113,119 ****
  		acctp = NULL;
  		log(LOG_NOTICE, "Accounting suspended\n");
  	}
! 	timeout(acctwatch, (caddr_t)resettime, hzto(resettime));
  }
  
  /*
--- 113,120 ----
  		acctp = NULL;
  		log(LOG_NOTICE, "Accounting suspended\n");
  	}
! 	timeout(acctwatch, (caddr_t)resettime,
! 	    (int)(resettime->tv_sec * hz + resettime->tv_usec / tick));
  }
  
  /*
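
The patch can be read as replacing an absolute-time conversion with a
relative one. The sketch below is a simplified user-space model, not the
kernel source: ticks_until() and ticks_in() are hypothetical names, the
clamp to one tick for a time already in the past is an assumption about how
an overdue timeout gets scheduled, and resettime is assumed to hold a
15-second interval rather than a wall-clock time.

```c
#include <sys/time.h>

/*
 * Simplified model, not the kernel source: ticks from `now` until the
 * absolute time `when`.  ASSUMPTION: anything already due is scheduled
 * for the very next tick.
 */
long long
ticks_until(const struct timeval *when, const struct timeval *now, int hz)
{
	long long usec_per_tick = 1000000LL / hz;
	long long ticks;

	ticks = ((long long)when->tv_sec - (long long)now->tv_sec) * hz +
	    ((long long)when->tv_usec - (long long)now->tv_usec) /
	    usec_per_tick;
	return (ticks < 1 ? 1 : ticks);
}

/*
 * Ticks in a relative interval: the same arithmetic as the patched
 * line, with `tick` (microseconds per tick) derived from hz.
 */
long long
ticks_in(const struct timeval *interval, int hz)
{
	long long usec_per_tick = 1000000LL / hz;

	return ((long long)interval->tv_sec * hz +
	    (long long)interval->tv_usec / usec_per_tick);
}
```

With hz = 100, a { 15, 0 } interval comes to 1500 ticks through ticks_in(),
while ticks_until() treats the same value as a moment long in the past and
reschedules for the next tick, which would fit the every-clock-tick symptom.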

torek@elf.ee.lbl.gov (Chris Torek) (04/13/91)

In article <2032@tharr.UUCP> jhma@tharr.UUCP (James Aldridge) writes:
>The major problem I have had installing 4.3BSD-Reno on our 11/750 is that it
>doesn't recognise our tape drive when using the supplied kernels.  We have an
>Emulex TC12 (ts11 compatible) tape controller which has worked quite happily
>under all previous 4.x BSD releases.  4.3 Reno recognises that there is some
>sort of controller present but complains that "zs0: didn't interrupt".

I wrote this code.  It worked fine on my VAXen at Maryland (which had
Dilog `DEC' controllers and Emulex TC13s) and apparently works on a
real DEC TS11 controller at Berkeley, but you are not the first one to
report this.  I cannot explain it---the code obeys all the rules in the
Emulex manual.  If I had some time with a system with this problem I
could no doubt fix it (and maybe this explains why the original driver
author thought it was `too hard' to make it interrupt).

There is a `quick fix': in tsprobe(), just before

	if (cvec == 0 || cvec == 0x200)	/* no interrupt */
		ubarelse(numuba, &a);

add

	if (cvec == 0x200 && ctlr == 0) {
		/*
		 * No interrupt, assume standard vector		XXX
		 * (need to find out why this happens)
		 */
		cvec = 0224;
		br = 0x15;
	}

This will only support ts0; if you prefer you can make it:

	if (cvec == 0x200) {
		/*
		 * No interrupt, assume standard vector		XXX
		 * (need to find out why this happens)
		 */
		cvec = (unsigned)reg & 7 ? 0260 : 0224;
		br = 0x15;
	}

(which then makes the following test for 0x200 fail, so it could be
removed).  This will make the code act as before except when the
probe succeeds (then the code will use the real interrupt vector).
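
The vector choice in the second variant can be tried in isolation. This is
only a sketch: ts_fallback_vector() is a hypothetical helper, not part of
the driver, and the CSR addresses 0172520 and 0172524 mentioned in the note
after it are assumed examples, not values taken from this thread.

```c
/*
 * Hypothetical helper, not part of the driver: the fallback vector
 * chosen by the second variant for a given CSR address.
 */
int
ts_fallback_vector(unsigned long csr)
{
	/*
	 * Low three address bits clear: assumed to be the first
	 * controller, so use 0224; otherwise use 0260.
	 */
	return ((csr & 7) ? 0260 : 0224);
}
```

Under those assumed addresses, 0172520 has its low three bits clear and
selects 0224, while 0172524 does not and selects 0260.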

The 4.3reno code is older than the stuff I have now, which just uses
TS_IE|TS_SETCHR (no TS_SENSE) and allows up to 2 minutes for the
interrupt (in case the controller is busy with a rewind), but this did
not fix the problem on the other system on which it was reported.  If
you are in the SF Bay area and have a machine with this problem, and
will let me experiment, send me mail....
-- 
In-Real-Life: Chris Torek, Lawrence Berkeley Lab CSE/EE (+1 415 486 5427)
Berkeley, CA		Domain:	torek@ee.lbl.gov

muller@sdcc10.ucsd.edu (Keith Muller) (04/13/91)

In article <12027@dog.ee.lbl.gov>, torek@elf.ee.lbl.gov (Chris Torek) writes:
> If I had some time with a system with this problem I
> could no doubt fix it (and maybe this explains why the original driver
> author thought it was `too hard' to make it interrupt).

There is a firmware bug in several versions of the PROMs in Emulex TC12 and
TC13 controllers that causes them not to interrupt in the probe routine.
I know that there are PROMs for the TC13 that make it interrupt under both
Tahoe and Reno. You probably want to give Emulex customer service a call.

Keith Muller
University of California, San Diego