smh@mit-eddie.UUCP (Steven M. Haflich) (12/28/84)
I am considering dual porting RA81 drives between UDA50 controllers on separate machines to support data-intensive applications. Someone with detailed knowledge of these devices and their driver could save me days of code-and-documentation reading: How does the UDA50 hardware handle a request on a drive which is busy executing a command from its other port, and would this condition blow away the standard 4.2 driver software? Also, does the hardware support dynamic locking of a drive by one or the other port under *program* (not pushbutton) control? Respond by mail and I'll summarize. But *please*, my question concerns details of hardware capability and driver support. I *don't* need to be told why Unix file systems cannot be mounted shared (unless both machines mount read only), etc. Thanks in advance. Steve Haflich {decvax!genrad,ihnp4}!mit-eddie!smh smh@mit-ems@mit-mc
dave@RIACS.ARPA (01/02/85)
If you are serious about attempting the dual porting of RA81 drives on the UDA50, it is essential that you obtain a copy of the UDA50 Programmer's Kit, which is mentioned in the user documentation you should have received with the controller/drive. I have heard that it is no longer offered by DEC, and we had a deuce of a time getting one, but I don't see any realistic alternative for accomplishing the work.
What follows is the result of a fairly quick reading of the above documentation, and some legend. A more detailed examination and some experimentation may blow all of this out of the water. I think the bottom line on such an attempt will be that it just isn't worth the effort on most systems. It looks to me like dual porting via UDA50s is OK for backup operation, but was not designed for high-speed dual-system access to RA81s.
When I looked into the possibility of dual porting some of these drives, I was told that the minimum time between successive hosts' accesses going through separate UDA50s to a single drive was on the order of 600 milliseconds. I can't determine the actual minimum from the documentation I have, because it seems to require the expiration of a host timeout value that 4.2 BSD does not use; there does not appear to be any way for a host to explicitly release its controller's access to a dual-ported drive. There is an indication that controllers must support a minimum host timeout value of ~10 sec, but the actual minimum value implemented for a UDA50 is unknown to me. I did not verify the minimum host switch time experimentally when I had a configuration on which I could have, because even if it were as low as 600 milliseconds, that was too long. Now I am in a one-machine environment.
As I read the documentation, a drive can be "online" to only one controller at a time, although there are provisions in the MSCP protocol for multiple hosts on a shared controller (i.e., an HSC50-type lash-up).
A host which attempts to access a drive which is online to another host via another controller is said to receive an end message whose status code says the drive is "offline", with a subcode indicating that the drive is online to another controller. This amounts to a kind of dynamic lock on the drive resource per controller. All current implementations of the 4.2 BSD UDA50 driver with which I am familiar would look at the completion flags, see !M_EF_SUCC, and assume a hard error; the distributed 4.2 BSD driver would report some fairly cryptic information, such as the logical block of the attempted transfer in hexadecimal, and return the error to the process attempting the transfer. The documentation I have does not even contain the section on multi-porting that it refers to, but it is a preliminary version. Handling the online to another controller end message, all by itself, seems easy. Although I did not examine the changes required to support the multi-porting in the 4.2 BSD driver, I don't believe it would be too difficult to adopt a simple strategy of sleeping, and retrying the operation later. This might not be acceptable to user or system processes, however. Another strategy might be to report the problem "upstairs" and let the file system code decide what to do about such a circumstance. There may be someone out there who has successfully attacked the dual porting problem, and if so, I would be interested in hearing about it. dave ----------
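[Editor's sketch] The distinction dave describes, between a genuine hard error and "offline because the drive is online to another controller", might be expressed roughly as below. This is only an illustration: every constant, the bit layout of the status word, and the names classify_end and enum disposition are assumptions made up for the sketch, not values taken from the MSCP documentation or the 4.2 BSD driver.

```c
#include <assert.h>

/* Hypothetical values for illustration only; take the real ones from the
 * MSCP documentation or the driver's header, not from this sketch. */
#define ST_MASK        0x1f   /* assumed: low bits hold the major status */
#define ST_SUCCESS     0x00   /* assumed */
#define ST_OFFLINE     0x03   /* assumed */
#define SUB_SHIFT      5      /* assumed: subcode sits above the major status */
#define SUB_OTHER_CTLR 0x04   /* assumed subcode: online to another controller */

enum disposition { IO_DONE, IO_RETRY_LATER, IO_HARD_ERROR };

/* Decide what to do with a completed command, given its status word. */
enum disposition classify_end(unsigned status)
{
    unsigned major = status & ST_MASK;
    unsigned sub = status >> SUB_SHIFT;

    if (major == ST_SUCCESS)
        return IO_DONE;
    if (major == ST_OFFLINE && sub == SUB_OTHER_CTLR)
        return IO_RETRY_LATER;  /* drive is held through its other port */
    return IO_HARD_ERROR;       /* every other failure stays a hard error */
}
```

The point of the separate IO_RETRY_LATER result is that the caller can choose a policy (requeue, or report "upstairs") instead of treating every unsuccessful end message the same way.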
chris@umcp-cs.UUCP (Chris Torek) (01/04/85)
> Handling the online to another controller end message, all by itself,
> seems easy. Although I did not examine the changes required to support
> the multi-porting in the 4.2 BSD driver, I don't believe it would be too
> difficult to adopt a simple strategy of sleeping, and retrying the
> operation later.
If by ``sleeping'' you mean calling sleep(chan, pri), then no, this should not be done. The off line error is detected at interrupt level and with some random process (if any) running. Ideally, the controller would give another interrupt when the drive came on-line; you could then set a flag that says ``this drive is at the Bahamas but I'm getting it now'' and simply queue any new operations until it comes back. If you can't get the controller to announce the presence of the desired drive, then you'd have to resort to timeout calls to check it every so often, or something like that.
--
(This line accidently left nonblank.)
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (301) 454-7690
UUCP: {seismo,allegra,brl-bmd}!umcp-cs!chris
CSNet: chris@umcp-cs ARPA: chris@maryland
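[Editor's sketch] The flag-and-queue approach Chris outlines, as opposed to sleeping at interrupt level, could look something like the following user-space model. It is not driver code: struct drive, submit, drive_went_away, drive_came_back, and start_io are hypothetical names, and start_io merely counts requests where a real driver would hand them to the controller.

```c
#include <assert.h>
#include <stddef.h>

#define QMAX 8                  /* arbitrary queue depth for the sketch */

struct drive {
    int away;                   /* set while the drive is online elsewhere */
    int queue[QMAX];            /* pending request ids, oldest first */
    size_t nqueued;
    int started;                /* requests handed to the controller so far */
};

/* Stand-in for handing a request to the controller. */
static void start_io(struct drive *d, int req)
{
    (void)req;
    d->started++;
}

/* Called for each new request; never blocks.
 * Returns 0 on success, or -1 if the holding queue is full. */
int submit(struct drive *d, int req)
{
    assert(d != NULL);
    if (!d->away) {
        start_io(d, req);
        return 0;
    }
    if (d->nqueued == QMAX)
        return -1;
    d->queue[d->nqueued++] = req;
    return 0;
}

/* Interrupt-level path: the end message said the drive is held elsewhere. */
void drive_went_away(struct drive *d, int failed_req)
{
    d->away = 1;
    (void)submit(d, failed_req);    /* requeue the request instead of failing it */
}

/* Called from an attention message, or from a periodic timeout check,
 * once the drive is available again. */
void drive_came_back(struct drive *d)
{
    size_t i;

    d->away = 0;
    for (i = 0; i < d->nqueued; i++)
        start_io(d, d->queue[i]);
    d->nqueued = 0;
}
```

Nothing here ever waits: the interrupt path only flips a flag and moves a request onto a list, which is why it is safe regardless of which process happens to be running.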
David L. Gehrt <dave@RIACS.ARPA> (01/04/85)
When a UDA50 device that was "unit online" to host A is no longer in that condition, it is supposed to notify the controllers, which in turn notify the hosts via some Available Attention Message. The real fly in the ointment, it seems to me, is that as I read the stuff there is *no* way to guarantee that the other host will ever release its grip on the device, and the MSCP protocol provides no way to force the release, or even request it. This would really screw things up if you tried swapping on a shared device (assuming you could overcome the systemic problems associated with shared swapping, or even if the swap partition is not shared). As far as sleeping goes, I guess I haven't quite earned my wizard's wand (spear, spurs, flame thrower?), but drivers do sleep from time to time; I am sure I have seen it. The context switch results in a new cpu status word, and things hum along until the wakeup condition, when the process is awakened at interrupt priority and away it goes. There are of course other mechanisms for delaying the device access under these circumstances, so the bottom line is that it is a doable thing, but still of dubious utility. dave ----------
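[Editor's sketch] Because, as dave reads it, nothing guarantees the other host will ever let go of the drive, any retry scheme presumably needs a bound after which the request is failed. A minimal model of that policy follows; poll_for_drive, probe_fn, and countdown_probe are hypothetical names, and the loop stands in for what a driver would do with successive timeout callbacks rather than by spinning.

```c
#include <assert.h>

/* Hypothetical probe: returns nonzero once the drive can be brought online.
 * A real driver would issue a command and examine the end message. */
typedef int (*probe_fn)(void *ctx);

/* Try up to max_tries probes. Returns the number of probes used on
 * success, or -1 if the drive never became available within the budget. */
int poll_for_drive(probe_fn probe, void *ctx, int max_tries)
{
    int i;

    assert(probe != 0);
    for (i = 1; i <= max_tries; i++) {
        if (probe(ctx))
            return i;
        /* A driver would schedule the next probe from a timeout here,
         * not spin or sleep at interrupt level. */
    }
    return -1;
}

/* Example probe for the sketch: fails *ctx times, then succeeds. */
int countdown_probe(void *ctx)
{
    int *remaining = ctx;

    if (*remaining == 0)
        return 1;
    (*remaining)--;
    return 0;
}
```

With a bound in place, the -1 case is where the error would finally be returned to the process, or reported "upstairs" as suggested earlier in the thread.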
joe@fluke.UUCP (Joe Kelsey) (01/07/85)
There is a point which everyone seems to be missing here. DSA (Digital Storage Architecture) drives CANNOT be "dual-ported". They may be "dual-pathed", but it is physically impossible to "dual-port" them. What you need to consider is that a UDA50 really is not intelligent enough to be able to handle the full MSCP protocol. If you consider what happens in an HSC50 environment, then you can see the problems better. When you have a VAXcluster centered around the HSC50, the HSC is responsible for coordinating ALL read/write access to all drives, and the VAXen communicate other information about locking files, records, etc., among themselves via the CI. In this environment, it is possible to dual-path the drives attached to a given HSC for physical redundancy, i.e., if one HSC goes down, the other one already has the cables connected to the drives and can pick up the traffic by using the dual-path connection. Thus, even in the HSC environment, only one controller can be the master of a drive once the drive goes on-line. The same thing holds true in the UDA50 case, but in this case dual-pathing a drive gains you NOTHING, unless you have two UDA50s on your computer just in case one of them fails. In other words, if you want to share disks, you have several choices:
1) Wait for the Ultrix people to figure out what they are going to do about clusters, CIs, and HSCs. Then, maybe, you can buy an HSC, a star coupler, and a bunch of CI780s or CI750s, and go to town.
2) Talk to SI about SIMACS - they claim that it runs on UNIX now.
3) Buy Massbus disks and come up with a way of mounting the disk R/W on one system with caching disabled and R/O on the other system.
/Joe