Geoffm@AFSC-HQ.ARPA.UUCP (01/29/87)
Does anyone know how or if a VAX cluster does load balancing between nodes? We are not using terminal servers! Also, does a cluster actually share the memory on different nodes, and if so, how? What about load balancing of jobs put in a batch queue or print queue?

Thanks,
geoff
-------
SYSTEM@CRNLNS.BITNET.UUCP (02/02/87)
Geoff,

There is no automatic load balancing between cluster members. Once a job starts on a particular CPU, it stays on that CPU. All main memory is private. The cluster members are only connected to one another by high-speed (70 megabits per second per cable pair) serial communications hardware. If you want real load balancing with shared memory, then you have to buy a "tightly-coupled" multiprocessor. The dual-processor systems that DEC currently sells are the VAX 8300 and 8800. (The VAX-11/782 is no longer actively marketed.)

Terminal servers only provide a very coarse level of balancing, in that they can be used to provide a default login to the system that is the least "busy" at that time. DEC's measure of "busyness" is not necessarily one that a user would agree with.

Batch and print "load balancing" is done by the system manager starting a generic batch queue that everyone submits jobs to, and it will feed jobs to any corresponding CPU-specific queue. For example, the following commands define system-specific batch queues on systems LNS61 and LNS62, then start a generic batch queue for people to submit jobs to. Nothing keeps anyone from submitting jobs to the system-specific queues. If both CPU-specific queues are idle at the time a job is submitted to the generic queue, then the job will always start on the queue whose name comes first "alphabetically": on 5MIN_A62 in this example.

$ write sys$output "Starting 5 minute Batch queues"
$!
$ INITIALIZE/QUEUE/BATCH/ENABLE_GENERIC/ON=LNS62::/START-
    /PROTECTION=(S:E,G:R,O:D,W:RW)/JOB_LIM=1/BASE=4-
    /WSDEFAULT=100/WSQUOTA=500/WSEXTENT=600-
    /CPUDEF=0:05:00/CPUMAX=0:05:00 -
    5MIN_A62
$!
$ INITIALIZE/QUEUE/BATCH/ENABLE_GENERIC/ON=LNS61::/START-
    /PROTECTION=(S:E,G:R,O:D,W:RW)/JOB_LIM=1/BASE=4-
    /WSDEFAULT=100/WSQUOTA=500/WSEXTENT=600-
    /CPUDEF=0:05:00/CPUMAX=0:05:00 -
    5MIN_B61
$!
$ INITIALIZE/QUE/BATCH/GENERIC=(5MIN_a62,5MIN_b61)/START-
    /PROTECTION=(S:E,G:R,O:D,W:RW) -
    5MIN

I hope this helps.

Selden E. Ball, Jr.
Cornell University             NYNEX:  1-607-255-0688
Laboratory of Nuclear Studies  BITNET: SYSTEM@CRNLNS
Wilson Synchrotron Lab         ARPA:   SYSTEM%CRNLNS.BITNET@WISCVM.WISC.EDU
Judd Falls & Dryden Road       PHYSnet/HEPnet/SPAN:
Ithaca, NY, USA 14853            LNS61::SYSTEM = 44283::SYSTEM (node 43.251)
DHASKIN@CLARKU.BITNET.UUCP (02/03/87)
Geoff Mulligan (USAFA) <GEOFFM@AFSC-HQ.ARPA> asks:

> Does anyone know how or if a VAX cluster does load balancing between
> nodes? We are not using terminal servers! Also does a cluster actually
> share the memory on different nodes and if so how? What about load
> balancing of jobs put in a batch queue or print queue?

No, a cluster does not inherently load balance, but it does allow for load balancing to be done. First of all, terminal-server load balancing is independent of the fact that you have a cluster, although I suppose the fact that a cluster can have a single network alias might allow something like that.

No, clustered machines do *not* share memory. This is an important distinction, for several reasons. Depending on your primary application(s), this may or may not be a drawback. It is interesting to note that DEC describes the new 8974 and 8978 as 'robust' (they are rather pricey prepackaged 4- and 8-node 8700 clusters, respectively), but they cannot be thought of as 'fault-tolerant' (see Digital Review for more). That is, if you're on one node and it goes down, you can then log in to another node, but you have lost your previous process.

You can load-balance to some extent using generic batch queues and the cluster-wide queue manager, but it is not 'true' load balancing... the queue manager will select the execution queue that minimizes the ratio of executing jobs to job limit across all associated execution queues. By setting job limits creatively, one could indicate to the queue manager which queues should be preferred.

We run a cluster of an 8500 (20 Mb) and two 750s (8 Mb each), with terminal servers all over the (^&&^* place, and I find that the 8500 has to get *pretty* loaded (50-60+ users) before the servers finally start putting folks on the 750s.
Interactive jobs are not inherently load balanced, but you could certainly organize some procedure to do your own (that is, reject certain users on certain nodes at certain times, allow them otherwise, etc., or do a SET HOST for them -- although I've heard arguments that SET HOST is to be avoided because of the overhead). I'm sure other folks out there have done it; it really hasn't been a problem for us.

Denis W. Haskin
------------------------------------------------------------------------
DHASKIN@CLARKU.BITNET   Office of Information Systems   (617) 793-7193
Clark University   950 Main Street   Worcester MA 01610

"Anyone who _moves_ before Most Holy comes back out will spend the rest
of eternity sipping lava through an iron straw." - Cerebus
MHJohnson@HI-MULTICS.ARPA.UUCP (02/03/87)
About cluster queues... it appears to me that batch and print jobs will go to the first queue associated with a generic queue unless it is busy; then the second queue (and so on) will be used. This at least holds true for the print queues. On my system, the one used first has a name starting with G and the second starts with D.

Batch queues are a little more complicated, though. If I remember correctly, VMS will choose the 'most lightly loaded' queue in preference to the rule above. This means you can make a generic batch queue and one queue for each CPU, sized according to the performance of that CPU. For example, if you have a 5-slot queue on an 8700 and a 2-slot queue on an 8200, the jobs will be allocated in the following order: 8700, 8200, 8700, 8700, 8200, 8700, 8700. The second job always goes on the smaller queue (zero load is less than any load...) if the larger queue has any jobs running.

It would be nice to have load balancing in a more general way, but from what I understand about VMS, it would take a LOT to implement. A number of things such as mailboxes, shared memory, etc. just don't work between cluster members at this time.
--Mark <MHJohnson @ HI-MULTICS.ARPA>
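Mark's allocation order falls out of the "minimize the executing-jobs-to-job-limit ratio" rule mentioned earlier in this thread. A minimal sketch in Python (the tie-break of preferring the first-listed queue is my assumption, and I'm assuming every job keeps running; the real VMS queue manager may differ):

```python
# Sketch of generic-queue selection: each submitted job goes to the
# execution queue with the smallest (executing jobs / job limit) ratio.
# Ties go to the first queue in the list (an assumption).

def pick_queue(queues):
    """queues: list of [name, job_limit, running]; returns the chosen entry."""
    return min(queues, key=lambda q: q[2] / q[1])

def allocate(queues, njobs):
    """Submit njobs one at a time; assume none finish. Return queue names chosen."""
    order = []
    for _ in range(njobs):
        q = pick_queue(queues)
        q[2] += 1           # the job starts and keeps running
        order.append(q[0])
    return order

# 5-slot queue on the 8700, 2-slot queue on the 8200
print(allocate([["8700", 5, 0], ["8200", 2, 0]], 7))
# → ['8700', '8200', '8700', '8700', '8200', '8700', '8700']
```

This reproduces the order given above exactly: the second job lands on the 8200 because 0/2 is less than 1/5, i.e. zero load is less than any load.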
rcb@mcnc.org@rti-sel.UUCP (02/04/87)
I know a cluster does not currently load balance, but consider the following (especially any DEC systems types out there). I could have an interactive process running on cluster node 1. A monitor process detects that node 1 is much more heavily loaded than node 2. It decides to swap my process out on node 1. It then reprograms the LAT to make my terminal talk to node 2 and swaps me in on node 2. Voila! Quick and dirty load balancing.

Random (Randy Buckland)
Research Triangle Institute
...!mcnc!rti-sel!rcb
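The decision step of such a monitor can be sketched abstractly. Everything below is hypothetical: the node names, the load metric, and the 1.5x imbalance threshold are illustrative assumptions, and the LAT reprogramming and process migration it presupposes are not things VMS currently provides:

```python
# Hypothetical monitor decision: if one node's load exceeds the other's
# by some factor, recommend migrating a process to the lighter node.
# Names, metric, and the 1.5x threshold are all illustrative assumptions.

def rebalance(loads, factor=1.5):
    """loads: dict of node name -> load average.
    Returns (from_node, to_node) if a migration looks warranted, else None."""
    busy = max(loads, key=loads.get)
    idle = min(loads, key=loads.get)
    if loads[busy] > factor * loads[idle]:
        # here the monitor would swap a process out on `busy`,
        # repoint the terminal, and swap it in on `idle`
        return (busy, idle)
    return None

print(rebalance({"NODE1": 6.0, "NODE2": 2.0}))  # → ('NODE1', 'NODE2')
print(rebalance({"NODE1": 3.0, "NODE2": 2.5}))  # → None
```

The threshold matters: without some hysteresis like this, a process could ping-pong between nodes as each migration shifts the load it was reacting to.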