bruce@THINK.COM (Bruce Nemnich) (09/13/85)
We took shipment of a new 785 about two weeks ago, and I have been having big problems getting it up. At the moment the configuration has 8 MB of memory and two UBAs, with only a UDA-50 (with 4 RA81s attached) on the first and an Interlan Ethernet card on the second. I am trying to run 4.2bsd. The system runs diagnostics without error.

There are two distinct symptoms. The less frequent one is a system hang: after 30 seconds or so, the port lights on the RA81s go out, and nothing is echoed on the terminal (not even interrupt characters). This happens during the boot sequence, either right before or right after the first single-user shell prompt. As I recall, the PC is within Swtch().

The second symptom is a Segmentation Fault on a virtual address that is always 3ffffffc or 5ffffffc. The trouble is that this address should never be referenced by the instruction that trapped, which is usually a push on the kernel stack. When this happens while Unix is running, it is usually in the syscall() routine, before dispatching to the appropriate system call routine. When it happens during the boot process, it happens a few instructions into process 0 (after the ldpctx and rei for the first process and before the call of main()). In that case it traps recursively (it traps in the trap handler) until the kernel stack is exhausted, and then keeps trapping recursively until the interrupt stack is exhausted. The result is an ?INT-STK INVALID message on the console just after the "xxxxx+yyyyy+zzzzz start at 0xnnnn" message printed by BOOT.

This error is persistent. Sometimes it will not happen for a couple of days, but when it does crash (the first case above), I often can't reboot for hours afterward (the second case above). Power-cycling the machine (including memory and Unibuses) doesn't help.
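For anyone trying to make sense of the two fault addresses: on the VAX, bits 31:30 of a virtual address select the region (00 = P0, 01 = P1, 10 = S0, 11 = reserved), bits 29:9 are the virtual page number, and bits 8:0 are the byte offset within a 512-byte page. The small sketch below (the function name decode_vax_va is hypothetical, written only for illustration) splits an address that way. By that decoding, 3ffffffc is the very last longword of P0 space and 5ffffffc falls in P1 space; neither is an S0 (system) address.

```python
# Region names indexed by bits 31:30 of a VAX virtual address:
# 00 = P0, 01 = P1, 10 = S0, 11 = reserved.
REGIONS = ("P0 (process program)", "P1 (process control)", "S0 (system)", "reserved")


def decode_vax_va(va: int) -> tuple[str, int, int]:
    """Split a 32-bit VAX virtual address into (region, virtual page number, byte offset)."""
    va &= 0xFFFFFFFF
    region = REGIONS[va >> 30]      # bits 31:30 pick the region
    vpn = (va >> 9) & 0x1FFFFF      # bits 29:9, a 21-bit page number
    offset = va & 0x1FF             # bits 8:0, offset within a 512-byte page
    return region, vpn, offset


for addr in (0x3FFFFFFC, 0x5FFFFFFC):
    region, vpn, offset = decode_vax_va(addr)
    print(f"{addr:08x}: {region}, vpn {vpn:#x}, offset {offset:#x}")
```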
The one thing that DOES often get it out of this mode (discovered accidentally) is physically breaking the connection to the first Unibus by reseating the UBA or the paddle card in the back of the machine. Both paddle cards and the UBA have been swapped, without any effect. Even when it is in "failure mode" it passes diagnostics. I have observed these problems under three versions of 4.2bsd: the current version I am running on a 750, a two-year-old 4.2bsd distribution tape, and a recent Ultrix distribution tape. I have been running my current version most of the time.

There is one more disk-related problem. After the machine had been working for two days, I decided to run some filesystem tuning benchmarks (nothing sophisticated, just the ones in the "disk subsystem choices" paper from Berkeley). I found I was getting a maximum throughput to one drive of under 200kb/s, which is terrible. I ran the same tests on a similarly configured 750 and got 400kb/s, which is what I expected. Putting the 750's UDA-50 in the 785 gave it full performance (a little better than the 750; most things were device-speed limited). I had DEC give me a new UDA-50 for the 785, but it also gets the low throughput! I haven't yet tried the 785's UDA on the 750. I am running a uda driver from daves@riacs. I plan to investigate this one further.

If anyone has seen problems like these, please let me know. Both DEC Field Service and I are baffled.
--bruce