chris@umcp-cs.UUCP (Chris Torek) (04/10/86)
We bought an Emulex SC41/MS controller and two CDC 9771 drives. The SC41/MS is an `MSCP compatible' device that emulates a DEC UDA50. This article is an anecdotal description of our experiences thus far with the controller and drives.

When we first obtained the hardware several months ago, we ran into a few snags. The University bureaucracy had managed to mangle the order into listing the machine for which the controller was purchased as a Vax 11/780; in fact, it was a 750, and we needed the Emulex cassettes to format the drives, but of course because we said `780' they sent us console floppies. As it turns out, you can fix this with `arff'. The following procedure copies a floppy to tape:

        # log in as root on 780,
        # and insert floppy #1
        cd /tmp; mkdir floppy1
        (cd floppy1; arff x)            # extract
        rcp -r floppy1 750:/tmp         # copy to 750
        # repeat for each floppy

        # log in as root on 750
        cd /tmp/floppy1                 # go to floppy directory
        arff crmf /dev/tu0 *            # put everything on the tape

Of course, you still need a boot block on a bootable tape; but I kludged around that by putting the bootable image from the first floppy on our root file system, and making a companion to /boot that loaded it. You could also just copy the boot block from any other DECtape that has it:

        dd if=/dev/tu0 of=bootblock     # with good tape
        dd if=bootblock of=/dev/tu0     # with new tape

In any case, the formatter worked fine, and after about eight hours, both drives were formatted and verified. (I should mention here that Emulex did indeed send us the proper tapes; I was simply impatient.)

Now we had nearly 1.35 gigabytes more space on our machine. Wonderful! Now to put it to use . . . so I created file systems and mounted them, and then the fun began.

After about five minutes, the machine hung. It was clearly a bug in---what else?---the UDA50 driver, as interrupts were still working and CPU bound tasks kept going. But the moment anything tried to touch a drive, CDC or DEC RA81, it was blocked.
This was quite repeatable: with the CDC drives mounted and in use, the machine would hang within thirty minutes.

`Well,' thought I, `time to fix the driver.' Now at the time, we were running a modified version of the RIACS driver. For those of you who have not heard of it, this is the one with dynamic bad block revectoring, so that when your RA81 begins to bobble bits, you need not reformat the entire drive, with the attendant and painful dump-and-restore sequence. The key words describing this driver are `useful', `large', and `thoroughly unreadable'.

After a few days I gave up the task of fixing the existing driver. It was long overdue for a rewrite anyway; and I decided that I should, instead of just fixing it, try my hand at writing a generic MSCP driver, so that if and when we got a TMSCP tape, it would then be a simple task to talk to it.

So of course the next step was the first required when writing any driver: obtain the hardware documentation. `No problem!' thought I. `I shall call DECDirect and give them the order number straight from the Emulex manual.' That I did, and this I discovered: DEC does not sell the MSCP documentation. Yes indeed, it does exist; no, you cannot get it.

Well, that stumped me for a while. How can you write a driver without knowing what it needs to do? Ah, but wait! We already have a driver---nay, in fact, *three* drivers---that probably do mostly the right things.

To make a long story short, I cannibalised parts of the RIACS driver, the original 4.2 driver, and the 4.3 beta driver, to put together a completely redone version of my own. Along the way I found out what all the CPU-dependent code was for, and I changed the Unibus support code to do BDP allocation `right'.

It took several weeks, but at last I had a driver that booted and ran. (It took several more days before it crashed properly---a bug in the dump code---and it was later still before it handled Unibus resets, but it ran!)
I brought the CDC drives on line, and waited for the driver to hang. 5 minutes . . . 15 . . . an hour, more . . . *hooray! It runs!*

Well, at last all our troubles were over. Right? Wrong.

A few nights later I went to dump the new file systems from the CDC drives to tape. We use a special kernel hack to make dump run fast, so there I was loading tapes onto the TU80 and watching them stream at 100 ips. Well, make that about 75 ips average. Performance was not terrific; but that must be expected with Unibus disk drives, for the fastest transfer rate achievable on a `real' Unibus is 550K/sec, and of course we had seek delays to deal with as well. (Incidentally, for those to whom seek time is important, the CDC drives list an average seek time of 18 ms., and no head switch delay; compare this with, I think, 31 ms. and a 6 ms. head switch delay on the RA81.)

Running iostat showed that the top performance of the CDC drives was actually lower than that of the RA81s: doing large raw disk reads, peak performance on the CDC drives was about 350K/sec, while on RA81s it reaches the 550K/sec maximum. Presumably Emulex has not properly laid out the sectors rotationally; and there is no way to change the sectoring: it is in firmware on the controller. Perhaps Emulex will read this and put in a format parameter in the next version.

---But so what if the performance was worse; we needed the disk space. At least it worked. Or so I thought.

`DUMP: NEEDS ATTENTION: ...' Time to change tapes again. Ok, tape number 5, go. Watch the reels: ZOOOOOM forward, blip back, ZOOOOOM, blip, ZOOOOM, blip, blap. Blap? Hey, what gives? The tape drive has stopped. Uh, oh. Wait, no console response; must be hung at interrupt level. Time to get another crash dump. Type control P. I said control P. *Control P*. Oboy. Look at the console lights. POWER on, ok. RUN on, ok. ERROR off, ... off?
I quote from the DEC hardware handbook:

-------------------------------------------------------------
*Error* indicator   Lighted red brightly to indicate that the
                    CPU is stopped because of an unrecoverable,
                    control-store parity error. Because console
                    commands are ignored, the *reset* switch
                    must be used to clear the error. Lighted red
                    dimly to indicate that the CPU is
                    functioning normally.
-------------------------------------------------------------

Do *you* see any mention of `off'?

Well, to make another long story short, the machine would hang quite thoroughly as long as the Emulex controller and the TU80 controller were on the same Unibus. We moved the TU80 to another Unibus adapter, so that now the SC41/MS was all by itself on UBA zero, and the hangs stopped. (No software changes, of course.)

Also interesting was the fact that with the CPU cabinet open, the performance of the Emulex card changed. It ran faster. With the cabinet closed it would sometimes slow down so much that the TU80 dropped to 25 ips streaming. (This makes an enormous difference in dump times for one CDC drive, from about two hours for a 330-megabytes-used file system up to about six hours.) With the TU80 on the other Unibus, that problem went away too.

Since then (it has been about a week) we have had exactly one crash, this time due to a response packet from the Emulex controller containing the wrong command reference number. It should have said `8009fec8'; but it said `80090000', so all is still not well. Yet it only happened once; it could be a kernel bug; we have installed the kernel RFS from Todd Brunhoff, and we know of at least one bug, so there may well be others.

Summary: The controller seems to work, as long as it is on a Unibus by itself, or at least as long as it does not have to compete greatly with another controller for Unibus resources. But you may want to avoid this particular controller, at least until it has been exercised a bit longer.
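
An aside on the command reference number mismatch reported above: comparing the expected and received values bit by bit shows which part went astray. What follows is only a minimal sketch, assuming a POSIX shell whose arithmetic accepts hexadecimal constants; `cmdref_diff' is a name made up for illustration and appears in no driver.

```shell
#!/bin/sh
# Hypothetical helper (not taken from any driver): print, as eight hex
# digits, the bits that differ between the command reference number that
# was expected and the one that actually came back.
cmdref_diff() {
    # $1 = expected value, $2 = received value, both as bare hex digits
    printf '%08x\n' $(( 0x$1 ^ 0x$2 ))
}

# The two values from the crash described in the article.
cmdref_diff 8009fec8 80090000
```

This prints 0000fec8: the upper sixteen bits came back intact and the lower sixteen came back as zero. It says nothing about whether the controller, the Unibus, or the kernel lost them.
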
The drives, on the other hand, are very nice. It is wonderful to run `df' and see /usr only 64% full, with another 188 megabytes there alone, and more than 300 megabytes free on the other drive. It is too early to guess at reliability, but there were no bad sectors at all on one of the two drives!
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 1415)
UUCP:   seismo!umcp-cs!chris
CSNet:  chris@umcp-cs           ARPA:   chris@mimsy.umd.edu