"Ian! D. Allen [CGL]" <idallen@watcgl.waterloo.edu> (08/23/90)
Is what I'm doing here safe?

I have a pair of DS5400 machines, each with a pair of RA90's and a pair of controllers. They connect as expected:

    I mount cabinet 0 disk 0 port A on controller 0 of cpu0
    I mount cabinet 0 disk 1 port A on controller 1 of cpu0
    I mount cabinet 1 disk 0 port A on controller 0 of cpu1
    I mount cabinet 1 disk 1 port A on controller 1 of cpu1

That's the main stuff; each machine is connected to its two local disks. Now for the cross-mount in case of failure of one of the cpu's:

    I mount cabinet 0 disk 0 port B as the second disk on controller 1 of cpu1
    I mount cabinet 0 disk 1 port B as the second disk on controller 0 of cpu1
    I mount cabinet 1 disk 0 port B as the second disk on controller 1 of cpu0
    I mount cabinet 1 disk 1 port B as the second disk on controller 0 of cpu0

Thus, each disk is connected to each cpu; each controller has one disk named "0" and one disk named "1".

If I leave the "A" and "B" port switches on all the RA90's enabled, booting either 5400 alone lets it find all four disks, which are configured as ra0, ra1, ra2, ra3. Once one 5400 has found all four disks, it mounts the local two in its cabinet. If I then boot the second 5400, it only finds the remaining two unmounted disks (which are in its cabinet), but not the two disks mounted by the first 5400.

If I then shut down the first 5400, the second 5400 suddenly (without even a reboot) finds the two external disks and prints a message to that effect on its console. Booting the first 5400 again, it only finds its two local disks (because the second 5400 has its local two mounted).

My question is: is giving each 5400 access simultaneously to a disk going to cause problems, even though I don't actually mount any disk on more than one CPU? That is, should I be running with both AB switches enabled, or should I really only enable access to one cpu at a time?

Having both AB enabled is a great convenience, since I don't have to be physically at the machine to push buttons and switch disks over if one cpu fails; I just mount the other disks. But is letting both kernels know about the disks at the same time safe?

--
-IAN! (Ian! D. Allen) idallen@watcgl.uwaterloo.ca idallen@watcgl.waterloo.edu
[129.97.128.64] Computer Graphics Lab/University of Waterloo/Ontario/Canada
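The cabling described in the post above can be written down as a small table and checked mechanically. What follows is only a sketch in Python: the tuple layout and the names CABLING and check_cabling are made up for illustration and have nothing to do with ULTRIX or the RA90 hardware. It checks the two properties the post states: every controller ends up with one unit "0" and one unit "1", and every disk is reachable from both cpus.

```python
from collections import defaultdict

# (cabinet, disk unit, port) -> (cpu, controller), transcribed from the post.
# The data layout is an assumption made for this sketch.
CABLING = {
    (0, 0, "A"): (0, 0),
    (0, 1, "A"): (0, 1),
    (1, 0, "A"): (1, 0),
    (1, 1, "A"): (1, 1),
    (0, 0, "B"): (1, 1),
    (0, 1, "B"): (1, 0),
    (1, 0, "B"): (0, 1),
    (1, 1, "B"): (0, 0),
}


def check_cabling(cabling):
    """Return (controllers_ok, disks_ok) for a cabling table.

    controllers_ok: every controller carries exactly one unit 0 and one unit 1.
    disks_ok: every physical disk is cabled to both cpu0 and cpu1.
    """
    units_per_controller = defaultdict(list)
    cpus_per_disk = defaultdict(set)
    for (cabinet, unit, _port), (cpu, controller) in cabling.items():
        units_per_controller[(cpu, controller)].append(unit)
        cpus_per_disk[(cabinet, unit)].add(cpu)
    controllers_ok = all(
        sorted(units) == [0, 1] for units in units_per_controller.values()
    )
    disks_ok = all(cpus == {0, 1} for cpus in cpus_per_disk.values())
    return controllers_ok, disks_ok


if __name__ == "__main__":
    print(check_cabling(CABLING))
```

Swapping any two B-port entries that share a unit number breaks neither property, so the check confirms only that the wiring is self-consistent, not that it is the only workable layout.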
alan@shodha.dec.com ( Alan's Home for Wayward Notes File.) (08/23/90)
In article <1990Aug23.021922.13346@watcgl.waterloo.edu>, idallen@watcgl.waterloo.edu (Ian! D. Allen [CGL]) writes:
} Is what I'm doing here safe?
}
} [ A long description of how two RA90s are dual ported
}   between two different ULTRIX systems. ]
}
} My question is: is giving each 5400 access simultaneously to a disk going
} to cause problems, even though I don't actually mount any disk on more
} than one CPU? That is, should I be running with both AB switches
} enabled, or should I really only enable access to one cpu at a time?

First, a standard semi-official comment. This is probably an untested and therefore unsupported configuration. If it breaks or doesn't work as expected, don't be surprised when a DEC support person says, "Sorry, not our problem".

Now for a more useful answer. One of the nice things about the RA series disks is that the A and B ports appear to be mutually exclusive. If you have a disk mounted through one, you CAN'T get at it from the other. I say "appear" because I can't quote chapter and verse from some specification that this is the way it will ALWAYS work. It's probably a feature of the hardware electronics, but it may be possible for it to break.

If you pay a great deal of attention to which system has a disk mounted and only try touching it when the other doesn't see it, then you'll probably be safe. There will come a point when both systems will be able to see the drives, but neither has them mounted. For example:

System A crashes at some obnoxious hour of the morning, and upon its releasing the port (probably via a controller timeout) system B sees the drives. Nobody is around to do the manual failover procedure, though, and in a few minutes A reboots and sees the drives...

Now you have both systems able to access the drives. System A should, as part of its reboot, fsck the file systems (assuming you have file systems on them) and will remount them when it finishes coming up. There is, though, a period of time when both systems have equal access to the drives without either having them mounted.

If the procedure is manual and the operators know what to expect when part of it fails, then you might not have too much of a problem. I haven't had the opportunity to spend a lot of time with such a configuration, so I don't know all the problems that might occur. My personal inclination is that without a lot of testing (at each new release of ULTRIX) I'd ensure access from only one port at a time, by using the port buttons.

}
} Having both AB enabled is a great convenience, since I don't have to
} be physically at the machine to push buttons and switch disks over if one
} cpu fails; I just mount the other disks. But is letting both kernels
} know about the disks at the same time safe?

"How safe?" is the real question. Having two systems that can get to a disk doubles the chance that something can go wrong. What if the hardware that makes A and B exclusive breaks at the same time one of the controllers also breaks and starts writing random bits? Not very likely, but it could happen. Actually, you don't even have to have the disk break. If both controllers break at about the same time, and one allows the disk to go offline, letting the other have access to it, you get the same result.

Have I been negative enough?

I suspect it's probably safe enough, compared to other supported configurations. I have two systems with access to a common HSC. There is very little to prevent me from accidentally mounting a file system on a disk while the other system already has it mounted. V4.0 has hooks in radisk(8) for making this safer.

} -IAN! (Ian! D. Allen) idallen@watcgl.uwaterloo.ca idallen@watcgl.waterloo.edu
} [129.97.128.64] Computer Graphics Lab/University of Waterloo/Ontario/Canada

--
Alan Rollow alan@nabeth.enet.dec.com
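The crash-and-reboot sequence in Alan's reply can be replayed with a toy model. This is a sketch under loud assumptions: the class DualPortedDisk and its methods are invented for illustration, and the rule "a host sees the drive only if no other host holds it" is just the apparent behaviour described in the thread, not a statement of how the RA90 or its controllers actually work.

```python
class DualPortedDisk:
    """Toy model of a drive with both ports enabled (hypothetical, not ULTRIX)."""

    def __init__(self):
        self.hosts_up = set()   # hosts currently running
        self.holder = None      # host that has the drive mounted, if any

    def boot(self, host):
        self.hosts_up.add(host)

    def crash(self, host):
        self.hosts_up.discard(host)
        if self.holder == host:
            # Assumed: the port is released, e.g. by a controller timeout.
            self.holder = None

    def visible_to(self, host):
        # Assumed exclusivity rule: a drive held by one host is hidden
        # from the other.
        return host in self.hosts_up and self.holder in (None, host)

    def mount(self, host):
        if not self.visible_to(host):
            raise RuntimeError(f"{host} cannot see the drive")
        self.holder = host

    def exposed(self):
        # True while more than one host can see a drive nobody has mounted.
        seers = [h for h in self.hosts_up if self.visible_to(h)]
        return self.holder is None and len(seers) > 1


def replay_crash_scenario():
    """Replay: A mounts, A crashes, B sees the drive, A reboots, A remounts."""
    disk = DualPortedDisk()
    log = []
    disk.boot("A")
    disk.mount("A")
    disk.boot("B")
    log.append(("A running", disk.visible_to("B"), disk.exposed()))
    disk.crash("A")
    log.append(("A crashed", disk.visible_to("B"), disk.exposed()))
    disk.boot("A")
    log.append(("A rebooted", disk.visible_to("B"), disk.exposed()))
    disk.mount("A")
    log.append(("A remounted", disk.visible_to("B"), disk.exposed()))
    return log
```

In this model the only step at which exposed() is true is between A's reboot and A's remount, which is the window the reply warns about. Anything a real kernel does to a visible but unmounted drive is outside what the model captures.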
idallen@watcgl.waterloo.edu (Ian! D. Allen [CGL]) (08/26/90)
> finishes coming up. There is though a period of time when both
> systems have equal access to the drives without either having
> them mounted.
[...]
> "How safe" is the real question? Having two systems that can get
> to a disk doubles the chance that something can go wrong. What
> if the hardware that makes A and B exclusive breaks at the same
> time one of the controllers also breaks and starts writing random
> bits? Not very likely, but it could happen. Actually you don't
> even have to have the disk break. If both controllers break at
> about the same time, one allows the disk to go offline letting the
> other have access to it you get the same result.

Yes, one 5400 usually ends up knowing about its own two disks and the two on the second machine (which it sees when the second machine goes down or reboots). The second 5400 only knows about its own two disks (because the first has its own disks mounted, preventing the second machine from even finding them). Since a kernel only mounts its local two disks even when it knows about all four, I think I'm pretty safe from having the same disk mounted on two machines simultaneously by mistake.

I've had some mysterious "hang" situations with the machines. I'm running them now with only the A ports selected, to see if the hangs recur. I have the nasty feeling that having two kernels recognize the same disk is causing problems, even if only one system actually mounts the disk.

Perhaps there are things that kernels do even to unmounted disks that would interfere with those disks while they are being mounted and used by another kernel. I can imagine that when a kernel goes to find out if a disk is there, it might do something that would interfere with the concurrent use of that disk by another kernel. Or, a disk might generate some message or interrupt to the kernel that would end up being fielded by *both* kernels, and funny things might happen.

Oh well. It would have been so convenient. I'll try A+B mode again in a few days.

--
-IAN! (Ian! D. Allen) idallen@watcgl.uwaterloo.ca idallen@watcgl.waterloo.edu
[129.97.128.64] Computer Graphics Lab/University of Waterloo/Ontario/Canada