mouse@thunder.mcrcim.mcgill.edu (der Mouse) (05/24/91)
Gotta call for help on this one.

We have a MicroVAX here (try to contain your winces :-) which has two DEQNAs and a disk controller using the UDA-50 driver (I would call it a UDA-50, but I don't think that's the proper term for the Q-bus board; it's an RQ??-50 or third-party emulator or some such; I can get details if it matters). This machine had a very old Ultrix (1.something) running on it, and everything worked fine, so there is nothing drastically wrong with the hardware setup.

I am trying to put mtXinu 4.3+NFS on this machine. (This is not the machine I really want to get this working on; this is a machine we have on loan from another department for use as a scratch machine for debugging the problem, because the real target machine cannot be taken down randomly to debug it.)

Everything works fine, provided I somehow disable recognition of at least one of the DEQNAs, by almost any method: by taking the board out of the system, by removing it from the configuration, by adding code to if_qe.c to ignore it... and it doesn't matter whether I disable qe0 or qe1; in each case, the other one works fine. But if I build a kernel that tries to use both qe0 and qe1, the system hangs.

When I crashed it, a post-mortem stack trace seemed to implicate the uda driver. So the next thing I did was to install Chris Torek's uda driver; with it in there, I get "uda0: lost interrupt" followed by a bus reset. The system remains hung, and after a short time (probably somewhere between 10 and 30 seconds, from memory) this repeats. It has kept on repeating for as long as I've had the patience to let it.

So I started in debugging it. My first step was to add code to if_qe to disable initialization of the second qe at various points. After a bit of this (the edit-compile-test cycle is not the zippiest), I convinced myself that the call to if_ubaminit() in qeinit() was at fault. I then moved the test in there and localized it to the loop that calls if_ubaalloc for the receive mbuf clusters.
I must admit this has me baffled. No problem is caused by running through this code once, for the "working" qe. It's only when it's called twice, for qe0 and qe1 both, that there's any problem. And even then, it's the uda driver, not the qe driver, that loses interrupts! And I'm convinced it's a software problem, because it happens on both this machine and the real target machine, and this one ran Ultrix just fine with the same hardware. And there's another uVAX in another lab with two DEQNAs, ra disks, and Ultrix, running fine.

I'll be nosing around looking for any further hints, but this is getting weird enough that I don't really have any confidence left that I can find it in a reasonable amount of time, so I thought I'd ask the net and see if anyone could help.... Anyone?

der Mouse

old: mcgill-vision!mouse
new: mouse@larry.mcrcim.mcgill.edu