ben@cunixf.cc.columbia.edu (Ben Fried) (01/13/90)
I have a small TCF cluster composed of a 9370 and several PS/2s, all running 1.2 F50. The 9370 is the primary site, and the PS/2s are all secondary sites. There is no backbone site.

I need to make one of the PS/2s, now a secondary site, become the primary site for the cluster. It would be nice to make the 9370 into a backbone site while I'm at it, but that's not essential. It is absolutely essential that I make the PS/2 into the primary site.

I've gotten minimal help from IBM - apparently, this is not a documented process, and the one person who has done something like this before says it would take two solid days on the phone, and possibly weeks to correct a mistake made somewhere along the way.

Has anyone else done this? Am I really unique in wanting to rearrange my cluster topology in this fashion?

Ben
--
Benjamin Fried
ben@cunixf.cc.columbia.edu
rutgers!columbia!ben
jackv@turnkey.gryphon.COM (Jack F. Vogel) (01/15/90)
In article <1990Jan12.184310.25876@cunixf.cc.columbia.edu> ben@cunixf.cc.columbia.edu (Ben Fried) writes:
>I have a small TCF cluster composed of a 9370 and several PS/2s, all
>running 1.2 F50. The 9370 is the primary site, and the PS/2s are all
>secondary sites. There is no backbone site.

>I need to make one of the PS/2s, now a secondary site, become the
>primary site for the cluster. It would be nice to make the 9370 into a
>backbone site while I'm at it, but that's not essential. It is
>absolutely essential that I make the PS/2 into the primary site.

My my, the times they be achangin', actually seeing this kind of TCF stuff on the net :-} :-}!

OK Ben, here is what you need to do, and up front remember that no warranty is implied, do this at your own risk, and may the force be with you :-}!!

First off, BACK UP YOUR SYSTEM COMPLETELY!! Maybe a DDR of all the system minidisks would be a good idea, but do whatever it takes, because if you screw up in this procedure you could thoroughly zonk the system.

Second, you cannot convert a secondary into a primary, because a secondary does not store all files of the replicated root and a primary needs to. So, what you need to do first is to make the PS/2's root into a backbone copy instead of a secondary copy. I hope you have a couple of large disks on it :-}! You will need to see how large the root filesystem of the primary is and make one of equal size on the PS/2. Depending on how you installed this PS/2 in the first place, this may require that you completely reinstall it, making a root minidisk with enough space. If you actually have enough space in the minidisk there is a simpler way: come up dependent (generic) on the cluster, then remake the root filesystem with the correct number of blocks and inodes to match the primary. Use the command:

    mkfs -r /dev/root blocks:inodes

Be sure the blocks are sufficient, and that the inodes are exactly equal to the primary.
Also make sure you include the -r flag, which says to make this a replicated filesystem. It will ask you a number of questions; be sure to specify that this is a backbone filesystem. After the filesystem is made, mount it on / with the command:

    mount /dev/root /

Once it is mounted you will want to repropagate the contents of the replicated root. Assuming the primary is site 1 and you are up as site 31, give the command:

    /etc/primrec 1 31 1

This will take a while, but when it finishes the PS/2 will have a backbone copy of the root. Reboot it and come up as an independent member of the cluster once just to sync things up.

Third, now comes the tricky part :-}! The only difference between a primary and backbone copy of the root is a bit setting in the flags of the superblock. The tricky part is keeping the integrity of your cluster through the transition between the primary sites. Here is the sequence I would follow:

First, make sure all copies of the replicated root across the cluster are in sync. You can tell this by issuing the command 'rdf /' and observing that all the low and high water marks are identical. Then, starting with the primary, take all sites down to single user mode.

Next go to the PS/2 that you want to make the primary. Logged in as root on the console in single user mode, do the following:

    fsdb /dev/root

It will display a number of lines of information. Then type in 'S'; this will display the superblock parameters. The one you care about is the flags, which should say:

    flag: 0xa0

The bits in this unsigned short are defined in /usr/include/sys/filsys.h, but this value indicates a replicated backbone filesystem. We want to change it to a primary replicated filesystem. Do this with the command:

    P=0x30 <enter>

(notice the lack of spaces, this is important)

Then type in S again to redisplay the parameters and verify that the change has been made. You should also have an fstore value of all. Now type q to quit.
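To see exactly which bits the 0xa0 to 0x30 edit touches, here is a minimal Python sketch. It only does the arithmetic on the two flag values quoted above. It does not name the bits, since their definitions live in /usr/include/sys/filsys.h and are not reproduced in this post. The function name changed_bits and the two constant names are made up for illustration.

```python
def changed_bits(old, new):
    """Return (cleared, turned_on): masks of bits switched off and on going old -> new."""
    cleared = old & ~new      # bits present in old but absent in new
    turned_on = new & ~old    # bits present in new but absent in old
    return cleared, turned_on


# Values taken from the post; the names are hypothetical labels, not filsys.h symbols.
BACKBONE_FLAG = 0xA0  # what fsdb shows for a replicated backbone root
PRIMARY_FLAG = 0x30   # what the post says to write for a replicated primary root

cleared, turned_on = changed_bits(BACKBONE_FLAG, PRIMARY_FLAG)
unchanged = BACKBONE_FLAG & PRIMARY_FLAG  # bits set in both values

print(f"backbone  {BACKBONE_FLAG:#04x}  {BACKBONE_FLAG:08b}")
print(f"primary   {PRIMARY_FLAG:#04x}  {PRIMARY_FLAG:08b}")
print(f"cleared   {cleared:#04x}  {cleared:08b}")
print(f"turned on {turned_on:#04x}  {turned_on:08b}")
print(f"unchanged {unchanged:#04x}  {unchanged:08b}")
```

Calling changed_bits with the arguments swapped gives the masks for the reverse change, the one made later on the old primary. Check the actual meaning of each bit against filsys.h on your own system before trusting any of this.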
Next you will need to edit the stanza for the root in /etc/filesystems. You need to change the line 'type = repl' to read 'type = repl,primary'. Finally, sync a couple of times and reboot.

Now hold your breath, keep your fingers crossed and hope like hell it comes up as a primary site :-}! You will be able to tell it worked if, when you come up, your root is read/write rather than read-only. Come up multiuser, and then bring all other sites back up EXCEPT THE OLD PRIMARY. They should come back up and join the cluster with your new primary.

The last step is to modify your 9370 to be a backbone instead of a primary. To do this you basically do the opposite of what you did for the PS/2. Run fsdb on /dev/root and alter the flag from 0x30 to 0xa0, edit /etc/filesystems to remove the primary flag from the root stanza, then reboot and bring it back up on the cluster. It should now be a backbone site.

Notice one side effect of this is that the primary is no longer site 1. This shouldn't matter, but it does alter the parameters of some commands.

Keep in mind, this is all described from theory. I personally have not done this, although I believe it has been done here at LCC on occasion for some special development.

So, there you have it in all its gory details :-}. As you can see, this procedure is not for the faint of heart nor for those of fumbling fingers :-} :-}! Proceed at your own risk and with great caution for, as I said, this is not a supported practice. But if you want we might set up a conference call between you, IBM Level 2 support, and myself (they know how to contact me at LCC).

Disclaimer: This TCF trivia is my responsibility, not IBM's or LCC's

--
Jack F. Vogel                 jackv@seas.ucla.edu
AIX Technical Support              - or -
Locus Computing Corp.         jackv@ifs.umich.edu