bruce@think.ARPA (Bruce Nemnich) (09/13/85)
We took shipment of a new 785 about two weeks ago, and I have been having big problems getting it up. At the moment the configuration has 8Mb and two UBAs, with only a UDA-50 (with 4 RA81s attached) on the first and an Interlan ethernet card on the second. I am trying to run 4.2bsd. The system runs diagnostics without error.

There are two distinct symptoms. The less frequent one is a system hang; after 30 seconds or so the port lights on the RA81s will go out. Nothing is echoed on the terminal (even interrupt chars, etc). This happens during the boot sequence, either right before or right after the first single-user shell prompt. The PC is within Swtch() as I recall.

The 2nd symptom is that it takes a Segmentation Fault on a virtual address which is always 3ffffffc or 5ffffffc; trouble is, that address should never be referenced by the instruction it trapped on, which is usually a push on the kernel stack. When this happens while Unix is running, it is usually in the syscall() routine before dispatching to the appropriate system call routine. When it happens during the boot process, it happens a few instructions into process 0 (after the ldpctx and rei for the first process and before the call of main()). The latter case results in recursive traps (it traps in the trap handler) until the kernel stack is exhausted, and then continues to recursively trap until the interrupt stack is exhausted. The result is an ?INT-STK INVALID message on the console just after the "xxxxx+yyyyy+zzzzz start at 0xnnnn" message printed by BOOT.

This error is persistent. It will sometimes not happen for a couple of days, but when it does crash (first case above), I often can't reboot for hours thereafter (second case above). Power-cycling the machine (including memory and unibuses) doesn't help.
The one thing which DOES often work to get it out of this mode (discovered accidentally) is to physically remove the connection to the first unibus by reseating the UBA or the paddle card in the back of the machine. Both paddle cards and the UBA have been swapped without helping the problem. Even when it is in "failure mode" it passes diagnostics.

I have observed these under three versions of 4.2bsd: the current version I am running on a 750, a two-year-old 4.2bsd distribution tape, and a recent Ultrix distribution tape. I have been running my current version most of the time.

There is one more disk-related problem. When the machine had been working for two days, I decided to run some filesystem tuning benchmarks (nothing sophisticated, just the ones in the "disk subsystem choices" paper from Berkeley). I found I was getting a maximum throughput to one drive of under 200kb/s, which is terrible. I ran the same tests on a similarly configured 750 and got 400kb/s, which is what I expected. Putting the 750's UDA-50 in the 785 gave it full performance (a little better than the 750; most things were device-speed limited). I had DEC give me a new UDA-50 for the 785, but it gets the low throughput! I haven't tried the 785's UDA on the 750 yet. I am running a uda driver from daves@riacs. I plan to do further investigation on this one.

If anyone has seen problems like these, please let me know. Both DEC Field Service and I are baffled.

--bruce
mangler@CIT-VAX.ARPA (System Mangler) (09/13/85)
The UDA-50 has a set of pluggable jumpers that set the "Unibus delay" - the amount of time the UDA waits between DMA requests to give other devices a chance at the bus. This can be set for 0us, 6.7us, or 10us. The burst transfer rate at each setting is:

    0us      800 kilobytes/second
    6.7us    350 kilobytes/second
    10us     250 kilobytes/second

Some devices with very little buffering (RK07's and RL02's) will get data lates if competing with a UDA set to any but the slowest setting, so I guess DEC has been tending to ship them set for 10us. I found that an RL02 sharing a 750 Unibus with a UDA and an Interlan would get data lates on ANY of the three settings. (So we sold both the RL02 *and* the UDA, and bought Eagles).

You might also try fiddling with "tunefs -d", which sets the rotational distance between consecutive blocks of files. When set optimally, the cpu asks for the next block just as it comes under the heads. Unfortunately, if you use the default value, the cpu asks for the block just AFTER it passes under the heads. When we still had UDA-50's on our 750's I found that the optimal value was a whole revolution - which you specify as 0, to avoid forcing a track switch and the consequent quarter revolution of head-switching delay that is built into the sector numbering. (E.g., track 1 sector 0 is 1/4 revolution away from track 0 sector 0).

Of course if you really need speed you don't use RA81's...

Don Speck   speck@cit-vax.arpa
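
As a rough cross-check of the figures in this thread, the sketch below (Python, added for illustration only) compares an observed throughput against the burst rates quoted above for each jumper setting, and estimates the rate if one block moved per disk revolution. The function names are hypothetical, and the 8192-byte block size, the 3600 rpm spindle speed, and the use of 1 kilobyte = 1000 bytes are assumptions, not values taken from either post. The per-revolution figure is a crude simplification that ignores the time the block itself takes to pass under the head.

```python
# Burst rates quoted in the post for each UDA-50 "Unibus delay" jumper setting.
# Key: delay in microseconds. Value: kilobytes/second.
BURST_KB_PER_S = {0.0: 800, 6.7: 350, 10.0: 250}


def settings_allowing(observed_kb_per_s):
    """Return the delay settings whose quoted burst rate is at least the observed rate."""
    return sorted(
        delay for delay, rate in BURST_KB_PER_S.items() if rate >= observed_kb_per_s
    )


def one_block_per_rev_kb_per_s(block_bytes=8192, rpm=3600):
    """Rate if exactly one block is transferred per revolution.

    block_bytes and rpm are illustrative assumptions, not figures from the thread.
    """
    rev_seconds = 60.0 / rpm
    return block_bytes / rev_seconds / 1000.0


if __name__ == "__main__":
    for observed in (200, 400):
        print(observed, "kb/s is possible at delays (us):", settings_allowing(observed))
    print("one block per revolution: %.1f kb/s" % one_block_per_rev_kb_per_s())
```

Under these assumptions, the 400kb/s measured on the 750 is only possible with the jumpers at 0us, while the sub-200kb/s figure on the 785 does not rule out any setting by itself, so checking the jumpers on the 785's UDA-50 directly would still be needed.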