[comp.unix.wizards] Autoconfig

nessus@athena.mit.edu (Doug Alan) (06/30/88)

I'm led to believe from reading "Building Berkeley UNIX* Kernels with
Config", that if your system works, you should be able to power down
your system, pull out a controller from the bus (replacing it with a
grant card), and reboot the system, and your system will still boot,
as long as the controller that you removed wasn't critical for
booting.  Unfortunately, this usually does not seem to work for us.

Depending on which controller I pull out, some different sorts of
things happen.  For some boards it works.  An example of this is a
DHV11.  If I remove this, the system still boots fine.  For some
controllers, the system hangs when it gets to the point in the
autoconfig sequence where the missing controller would normally be
found.  A more peculiar mode of failure happens when I remove a
second disk controller.  The autoconfig sequence finds the first
controller twice!  And both times it finds it at the same CSR address.
It assigns each disk drive to two different device names.  The
autoconfig sequence then merrily continues on, and seems to be working
fine, until the system finally gets to the point where it tries to
give you a /bin/sh.  At this point it hangs.

Does someone have any idea what is going on, and how I can get things
to work, so that I can remove controllers without building a new
kernel?

We use VAXstation II's, running 4.3BSD+NFS (from U of Wisc).  The disk
controllers are Sigma RQD11-EC's (ESDI MSCP Qbus controllers).

I also have another, perhaps related, problem, which maybe someone has
an idea about.  We have a uVax-II with two of the aforementioned disk
controllers and the aforementioned kernel.  It also has a Wespecorp
tape controller.  I want to put in a DHV11, but whenever I do, it
doesn't work right.  With the DHV11 in, autoconfig seems to find it
fine, but if I try to run 'stty' on one of the DHV11's terminal lines
(let's say "stty all > /dev/ttyS0"), it hangs.  If I do this from the
Bourne Shell, I can ^C out of it, but I get some sort of error (I
don't remember the exact message...  perhaps something like "no such
device").  If I do this from the C Shell, ^C and ^Z don't do anything.
Another problem that seems to occur with the DHV11 in is that some C
programs, when trying to dump core, occasionally cause the whole
system to become wedged.

I'm pretty sure I have the right device numbers on /dev/ttyS0, because
we have other systems with a DHV11 and the same kernel, and the DHV11
works on them.  The other systems don't, however, have a tape
controller and two disk controllers.  Another piece of the puzzle is
that the tape controller seemed to be causing us some problems in the
past.  Whenever a filesystem on a disk controller that was farther out
on the bus than the tape controller was dumped to tape, any process,
including the process accessing that disk drive, would hang.  The fix
for this was to move the tape controller farther out on the bus than
all the disk controllers.

I thought for a while that perhaps the problem was that we weren't
using the official DEC CSR addresses and interrupt vectors for the disk
controllers and DHV11.  I didn't think this should make any difference
with Unix, as long as everything was spaced out enough.  (The official
DEC CSR addresses and interrupt vectors are a real pain, because if you
add another disk controller, you have to go and perform hairy
calculations and then use those to guide yourself in flipping DIP
switches on the DHV11.)  In any case, I went through all the work of
making all the CSR addresses and interrupt vectors be up to DEC
standard, and this changed nothing.

Anyone have any ideas?

|>oug /\lan
   (or nessus@athena.mit.edu
       nessus@mit-eddie.uucp)

aida@porthos.csl.sri.com (Hitoshi Aida) (06/30/88)

In article <9600@eddie.MIT.EDU> nessus@athena.mit.edu (Doug Alan) writes:
> ...
>        A more peculiar mode of failure happens when I remove a
>second disk controller.  The autoconfig sequence finds the first
>controller twice!  And both times it finds it at the same CSR address.
>It assigns each disk drive to two different device names.  The
>autoconfig sequence then merrily continues on, and seems to be working
>fine, until the system finally gets to the point where it tries to
>give you a /bin/sh.  At this point it hangs.

The problem is "the standard addresses" embedded in the driver.
If autoconf fails to find the second controller at the address
specified in the configuration file, it then tries to find it at the
standard addresses, and ends up finding the first controller, which is
located at one of those addresses!

I think the best solution is to never use standard addresses.
You can patch either the source or the object code of the driver so
that the first element of udastd[] etc. has a value of 0 (a short).
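
To see why zeroing the first element turns the fallback off, here is a
small user-space model of the search.  It is only a sketch: find_csr(),
present_fn and only_first_present() are hypothetical names, the
addresses are example values, and the real code in autoconf.c is
organized differently.  The only thing it takes from the driver is the
idea of a zero-terminated list of standard addresses.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Simplified model of the address search: try the configured CSR
 * first, then each entry of a zero-terminated standard-address list.
 * present() stands in for the kernel's bus probe; all names here
 * are hypothetical.
 */
typedef int (*present_fn)(unsigned short addr);

unsigned short
find_csr(unsigned short configured, const unsigned short *std,
    present_fn present)
{
	const unsigned short *ap;

	assert(std != NULL && present != NULL);
	if (configured != 0 && present(configured))
		return configured;
	/* Fallback: stops at once if the first element is 0. */
	for (ap = std; *ap != 0; ap++)
		if (present(*ap))
			return *ap;
	return 0;	/* not found */
}

/* Example bus: only the first controller, at 0172150, is installed. */
int
only_first_present(unsigned short addr)
{
	return addr == 0172150;
}
```

With a list of { 0172150, 0 }, a search for a missing second controller
configured at some other address lands on the first controller.  With a
list of just { 0 }, the same search reports "not found", while a
controller that is present at its configured address is still found.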

--------
Hitoshi AIDA (aida%inosai.u-tokyo.junet%utokyo-relay@relay.cs.net)
Dept. of Electrical Engineering, The University of Tokyo
Current Address: aida@csl.sri.com
Computer Science Lab, SRI International

chris@mimsy.UUCP (Chris Torek) (07/01/88)

In article <9600@eddie.MIT.EDU> nessus@athena.mit.edu (Doug Alan) writes:
>I'm led to believe from reading "Building Berkeley UNIX* Kernels with
>Config", that if your system works, you should be able to power down
>your system, pull out a controller from the bus (replacing it with a
>grant card), and reboot the system, and your system will still boot,
>as long as the controller that you removed wasn't critical for booting.

More or less.

>... For some boards it works. ... For some controllers, the system hangs
>when it gets to the point in the autoconfig sequence where the missing
>controller would normally be found.

This is probably a driver bug: one that does something like

	((struct foodevice *)reg)->csr = FOO_INIT;
	while ((((struct foodevice *)reg)->csr & FOO_DONE) == 0)
		/* spin */;

or something equally wrong.
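
The loop above never terminates if the done bit never comes on, which
is what happens when nothing answers at that address.  A minimal
sketch of a bounded version follows.  The register layout, the bit
values, FOO_TIMEOUT and foo_probe_init() are all made up for
illustration; only the names struct foodevice, FOO_INIT and FOO_DONE
come from the fragment above.

```c
#include <assert.h>

/* Hypothetical register layout and bit values, for illustration only. */
struct foodevice {
	volatile unsigned short csr;
};
#define FOO_INIT	0x0001
#define FOO_DONE	0x0080
#define FOO_TIMEOUT	100000L	/* arbitrary spin limit (assumption) */

/*
 * Start initialization and wait a bounded number of iterations
 * for the done bit.  Returns 1 if the device answered, 0 if not.
 */
int
foo_probe_init(struct foodevice *dev)
{
	long n;

	assert(dev != 0);
	dev->csr = FOO_INIT;
	for (n = 0; n < FOO_TIMEOUT; n++)
		if (dev->csr & FOO_DONE)
			return 1;
	return 0;	/* give up instead of hanging autoconfig */
}
```

A caller would treat a 0 return as "controller not present" and move
on to the next device.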

>A more peculiar mode of failure happens when I remove a second disk
>controller.  The autoconfig sequence finds the first controller twice!
>And both times it finds it at the same CSR address.

A small bug, fixed in 4.3BSD-tahoe.  (The test for ualloc[addr] is missing.)
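
The kind of test meant here can be modelled in user space as follows.
This is a sketch under assumptions, not the kernel code: csr_off() is
an assumed stand-in for the kernel's ubaoff(), the 8K map size is
taken from the wmemfree(ualloc, 8*1024) call in autoconf.c, and
ualloc_model, csr_is_free() and csr_claim() are hypothetical names.

```c
#include <assert.h>
#include <string.h>

#define CSR_SPACE	(8 * 1024)		/* map size, as in wmemfree() */
#define csr_off(a)	((a) & (CSR_SPACE - 1))	/* assumed stand-in for ubaoff() */

static char ualloc_model[CSR_SPACE];

/* Returns 1 if the address may be probed, 0 if already claimed. */
int
csr_is_free(unsigned short addr)
{
	return ualloc_model[csr_off(addr)] == 0;
}

/* Record that a device claimed "size" bytes of registers at addr. */
void
csr_claim(unsigned short addr, int size)
{
	assert(size > 0 && csr_off(addr) + size <= CSR_SPACE);
	memset(&ualloc_model[csr_off(addr)], 1, (size_t)size);
}
```

If the search loop skips any candidate address for which
csr_is_free() returns 0, a controller that has already been
configured cannot be found a second time at the same CSR.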

>... We [also] have a uVax-II with two [Sigma] controllers and the
>aforementioned kernel.  It also has a Wespecorp tape controller.
>I want to put in a DHV11, but whenever I do, it doesn't work right.
>With the DHV11 in, autoconfig seems to find it fine, but if I try
>to [use it] it hangs. ... Another problem that seems to occur with
>the DHV11 in, is that some C programs, occasionally, when trying to
>dump core, cause the whole system to become wedged.

Sounds like either a hardware problem or a bug in one of the drivers
that hogs Unibus (Qbus) map resources.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

chris@mimsy.UUCP (Chris Torek) (07/01/88)

In article <5699@csl.CSL.SRI.COM> aida@porthos.csl.sri.com (Hitoshi Aida)
writes:
>The problem [that causes finding a controller twice] is "the standard
>addresses" embedded in the driver. ...
>I think the best solution is to never use standard addresses.

This works, but is not the best solution (else why have standard
addresses at all?).  Instead, fix the bug in autoconf.c.  Here is
a diff with just that fix, created only moments ago.  This is untested,
but it did come from a version that is tested.  All of these changes
(and many more) appear in the 4.3-tahoe vax/autoconf.c.

*** autoconf.c.old	Thu Jun 30 16:57:09 1988
--- autoconf.c	Thu Jun 30 17:09:06 1988
***************
*** 4,8 ****
   * specifies the terms and conditions for redistribution.
   *
!  *	@(#)autoconf.c	7.1 (Berkeley) 6/6/86
   */
  
--- 4,8 ----
   * specifies the terms and conditions for redistribution.
   *
!  *	@(#)autoconf.c	7.1+ (Berkeley) 6/6/86
   */
  
***************
*** 687,701 ****
  		}
  		printf("vec %o, ipl %x\n", cvec, br);
  		um->um_alive = 1;
  		um->um_ubanum = numuba;
! 		um->um_hd = &uba_hd[numuba];
  		um->um_addr = (caddr_t)reg;
  		udp->ud_minfo[um->um_ctlr] = um;
! 		for (ivec = um->um_intr; *ivec; ivec++) {
! 			um->um_hd->uh_vec[cvec/4] =
! 			    scbentry(*ivec, SCB_ISTACK);
! 			cvec += 4;
! 		}
  		for (ui = ubdinit; ui->ui_driver; ui++) {
  			if (ui->ui_driver != udp || ui->ui_alive ||
  			    ui->ui_ctlr != um->um_ctlr && ui->ui_ctlr != '?' ||
--- 687,701 ----
  		}
  		printf("vec %o, ipl %x\n", cvec, br);
+ 		csralloc(ualloc, addr, i);
  		um->um_alive = 1;
  		um->um_ubanum = numuba;
! 		um->um_hd = uhp;
  		um->um_addr = (caddr_t)reg;
  		udp->ud_minfo[um->um_ctlr] = um;
! 		for (cvec /= 4, ivec = um->um_intr; *ivec; cvec++, ivec++)
! 			uhp->uh_vec[cvec] = scbentry(*ivec, SCB_ISTACK);
  		for (ui = ubdinit; ui->ui_driver; ui++) {
+ 			int t;
+ 
  			if (ui->ui_driver != udp || ui->ui_alive ||
  			    ui->ui_ctlr != um->um_ctlr && ui->ui_ctlr != '?' ||
***************
*** 702,710 ****
  			    ui->ui_ubanum != numuba && ui->ui_ubanum != '?')
  				continue;
! 			if ((*udp->ud_slave)(ui, reg)) {
  				ui->ui_alive = 1;
- 				ui->ui_ctlr = um->um_ctlr;
  				ui->ui_ubanum = numuba;
! 				ui->ui_hd = &uba_hd[numuba];
  				ui->ui_addr = (caddr_t)reg;
  				ui->ui_physaddr = pumem + ubdevreg(addr);
--- 702,713 ----
  			    ui->ui_ubanum != numuba && ui->ui_ubanum != '?')
  				continue;
! 			t = ui->ui_ctlr;
! 			ui->ui_ctlr = um->um_ctlr;
! 			if ((*udp->ud_slave)(ui, reg) == 0)
! 				ui->ui_ctlr = t;
! 			else {
  				ui->ui_alive = 1;
  				ui->ui_ubanum = numuba;
! 				ui->ui_hd = uhp;
  				ui->ui_addr = (caddr_t)reg;
  				ui->ui_physaddr = pumem + ubdevreg(addr);
***************
*** 716,723 ****
  				/* ui_type comes from driver */
  				udp->ud_dinfo[ui->ui_unit] = ui;
! 				printf("%s%d at %s%d slave %d\n",
  				    udp->ud_dname, ui->ui_unit,
  				    udp->ud_mname, um->um_ctlr, ui->ui_slave);
  				(*udp->ud_attach)(ui);
  			}
  		}
--- 719,727 ----
  				/* ui_type comes from driver */
  				udp->ud_dinfo[ui->ui_unit] = ui;
! 				printf("%s%d at %s%d slave %d",
  				    udp->ud_dname, ui->ui_unit,
  				    udp->ud_mname, um->um_ctlr, ui->ui_slave);
  				(*udp->ud_attach)(ui);
+ 				printf("\n");
  			}
  		}
***************
*** 768,779 ****
  		}
  		printf("vec %o, ipl %x\n", cvec, br);
! 		while (--i >= 0)
! 			ualloc[ubaoff(addr+i)] = 1;
! 		ui->ui_hd = &uba_hd[numuba];
! 		for (ivec = ui->ui_intr; *ivec; ivec++) {
! 			ui->ui_hd->uh_vec[cvec/4] =
! 			    scbentry(*ivec, SCB_ISTACK);
! 			cvec += 4;
! 		}
  		ui->ui_alive = 1;
  		ui->ui_ubanum = numuba;
--- 772,779 ----
  		}
  		printf("vec %o, ipl %x\n", cvec, br);
! 		csralloc(ualloc, addr, i);
! 		ui->ui_hd = uhp;
! 		for (cvec /= 4, ivec = ui->ui_intr; *ivec; cvec++, ivec++)
! 			uhp->uh_vec[cvec] = scbentry(*ivec, SCB_ISTACK);
  		ui->ui_alive = 1;
  		ui->ui_ubanum = numuba;
***************
*** 815,818 ****
--- 815,843 ----
  
  	wmemfree(ualloc, 8*1024);
+ }
+ 
+ /*
+  * Mark addresses starting at "addr" and continuing
+  * "size" bytes as allocated in the map "ualloc".
+  * Warn if the new allocation overlaps a previous allocation.
+  */
+ static
+ csralloc(ualloc, addr, size)
+ 	caddr_t ualloc;
+ 	u_short addr;
+ 	register int size;
+ {
+ 	register caddr_t p;
+ 	int warned = 0;
+ 
+ 	p = &ualloc[ubaoff(addr+size)];
+ 	while (--size >= 0) {
+ 		if (*--p && !warned) {
+ 			printf(
+ 	"WARNING: device registers overlap those for a previous device!\n");
+ 			warned = 1;
+ 		}
+ 		*p = 1;
+ 	}
  }
  
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris