[mod.computers.vax] new cluster suggestions...

BRENT@uwovax.UWO.CDN.UUCP (01/14/87)

   It seems likely that our site will be acquiring a second VAX in the
next few months.  We currently have an 8600 with an HSC, a tri-pack
of RA-81s (1 system, 2 in a public structure), and one tape drive.
The new system will probably add a quad pack of RA-81s, one more tape
drive, and the processor will likely be an 8550.
   If this goes ahead, there will be strong political forces at work
trying to segregate Computer Science students from researchers.  (Other
categories of users will fall onto one machine or the other.)  Such an
arrangement of course defeats a big advantage of a cluster.  Yet there
are brief periods (e.g. end of term) when CS students can severely impact
research, unless some segregation takes place.
   I would like to use VMS in some way to "softly" segregate these
user populations, so that segregation can be turned on and off as
necessary.  45 weeks of the year the cluster would be available to
all in an integrated manner, but when the crunch or complaints come,
all future logins get segregated.
   My question to all you system managers out there is: how do I accomplish
this in a flexible manner?  Now is the time for planning, not when the
machine arrives.  Have any of you done this sort of thing before?  How
successful were you?
   The other question pertains to disk (structures).  2 RA-81 drives
bound together work very well, but more than 2 make me nervous.  I'm
seeking suggestions on managing the additional 4 disks.  One thought
which comes to mind is to put researchers on another 2 pack structure,
the rest of the academic centre on a single disk, and to keep one (in
the short term at least) for me personally (I am the system manager after
all :-).  Actually I could migrate that disk where needed and as needed,
with me moving wherever space is available.
   Note that at this site, shadowing is not a serious issue.  It's nice,
but we just don't require that degree of availability (or performance).
   Thanks for any thoughts you may have on the subject.  Brent.
--
Brent Sterner
Lord Protector,  d i g i t a l  Systems
Computing & Communications Services
Natural Sciences Building
The University of Western Ontario
London, Ontario, Canada  N6A 5B7
Telephone (519)661-2151 x6036
Network     <BRENT@uwovax.UWO.CDN>  ! VAX 8600
            <A105@UWOCC1.BITNET>    ! IBM 4341

LEICHTER-JERRY@YALE.ARPA.UUCP (01/16/87)

       It seems likely that our site will be acquiring a second VAX in the
    next few months....
       I would like to use VMS in some way to "softly" segregate these
    user populations, so that segregation can be turned on and off as
    necessary.  45 weeks of the year the cluster would be available to
    all in an integrated manner, but when the crunch or complaints come,
    all future logins get segregated....

The easiest thing to do is to define a system-wide login file, pointed to via
SYS$SYLOGIN, that when activated - perhaps by the setting of some system-wide
logical like SYS_SEGREGATE - checks to see if the account it is running in is
allowed access to the machine it is on.  If not, it complains and logs out.
From DCL, you can easily access an indexed file, indexed by username, that
indicates which machines each user may access.
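
The check described above can be sketched as follows.  This is only an
illustration of the logic, written in Python rather than DCL; the table
contents, node labels, and the function name login_permitted are
hypothetical, and the dictionary merely stands in for the indexed file
keyed by username.

```python
# Hypothetical stand-in for the indexed file keyed by username.
# Usernames and node labels are illustrative, not taken from the thread.
ALLOWED_NODES = {
    "SMITH": {"RESEARCH", "STUDENT"},
    "CS101_42": {"STUDENT"},
}


def login_permitted(username, node, segregate, allowed=ALLOWED_NODES):
    """Return True if the login may proceed on this node.

    `segregate` plays the role of the system-wide SYS_SEGREGATE logical:
    when it is off, everyone is admitted everywhere.
    """
    if not segregate:
        return True
    # Assumption: a user with no record is refused while segregation is on.
    return node in allowed.get(username.upper(), set())
```

Refusing users who have no record is a design choice made for this sketch;
a real procedure would presumably want to make sure that privileged
accounts can never be locked out this way.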

You'll have to be careful about batch and perhaps network jobs.  If you have
a queue that feeds both systems, users will have no control over which one
their job gets started up on.  During the "peak" periods, you could set the
common queue to feed only the student system; or you could have the system-
wide login file define a logical name for that queue that feeds to the
appropriate place, depending on the username; or you might decide that batch
jobs should be allowed on either system.
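
As a rough illustration of the second option (the login file choosing a
queue by username), here is a small Python model of the decision.  The
queue names and the function name batch_queue_for are invented for the
example; on a real system the result would be used to define a logical
name that users submit to.

```python
def batch_queue_for(username, segregate, student_users,
                    common_queue="CLUSTER$BATCH",
                    student_queue="STUDENT$BATCH",
                    research_queue="RESEARCH$BATCH"):
    """Pick the queue a user's batch logical should point at.

    All queue names are hypothetical placeholders.
    """
    if not segregate:
        # Normal operation: one queue feeding both machines.
        return common_queue
    if username.upper() in student_users:
        return student_queue
    return research_queue
```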

Network jobs would only be a problem if you set up a cluster alias and then
had remote jobs coming in to user accounts via the cluster alias.  Again,
there are many ways to deal with this issue.

Finally, let me note that the use of the SYS$SYLOGIN logical is essential.
Users can over-ride the login command file specified in their UAF record
with the /COMMAND or even /DISK qualifiers at login.  They cannot, however,
over-ride the SYS$SYLOGIN logical.  (Of course, the first thing the file
should do is disable CONTROL/Y.)

Since EVERYONE will ALWAYS have to go through the command file SYS$SYLOGIN
points to, keep it short, simple, and fast - if necessary, put any extensive
(but not absolutely essential) stuff in a system-wide file pointed to by user
SYSUAF entries.  In particular, do NOT have the SYS$SYLOGIN file execute the
user's login file.
							-- Jerry
-------

yerazuws@CSV.RPI.EDU (Crah) (01/16/87)

	It seems that what you think you want to do is to keep the
CS students on one CPU, while maintaining the clusterness of the
assembly (8600 + 8550) in terms of disk sharing via the HSC.
	
	Well, one thing *is* important - and that's that you're
using LATs or DECservers.  Otherwise, you have to move terminal
connections on the backpanel and that's a pain and a half.
	
	Method 1:  As of VMS 4.4, you can have an entire
cluster have a name that a LAT or DECserver knows about; so you
(under normal operations) use the name for the cluster.  You 
have the password file in sys$common, so both machines read it
from there.
	
	When you need to split the system for logins, you disable
the clustername (actually it's unnecessary to do this but ...).
Further, you put *different* SYSUAF.DAT files in the sys$specific
directories of each machine.  Voila - you can allocate on a per-user
basis who can go where.
	
	Method 2 (preferable in terms of load balancing):
Use disk quotas to keep the undergrads from overflowing everything 
(in operation continuously).  Put a mod in SYLOGIN.COM to check the
node (via F$GETSYI("NODENAME")) and, if it's the research machine, to
check the current load (which involves doing a SHOW SYSTEM/OUTPUT=filename
and then reading the file with DCL - tricky but doable).  If the load is
too high, then suggest to the poor undergraduate that he should
LAT to the other machine, give him a few seconds to read the
message, and log him out. 

	Faculty, of course, would be "known to" the SYLOGIN command
procedure (either because their group number was different or
by having an attribute in SYSUAF.DAT), and wouldn't be 
subject to this cutoff.
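
A minimal sketch of the Method 2 decision, in Python rather than DCL and
only to show the order of the checks.  How "load" is measured is left
open above, so here it is just a number supplied by the caller; the node
label, the group name FACULTY, the threshold of 40, and the function
name admit_login are all assumptions made for the example.

```python
def admit_login(node, group, load,
                research_node="RESEARCH",
                exempt_groups=frozenset({"FACULTY"}),
                max_load=40):
    """Return True if the login is allowed, False if it should be bounced.

    `load` is whatever figure the login procedure extracts from the
    system display; its unit and the threshold are assumptions.
    """
    if node != research_node:
        # Only the research machine is protected.
        return True
    if group in exempt_groups:
        # Faculty are "known to" the procedure and never cut off.
        return True
    return load <= max_load
```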
	
	The advantage of this method is that it automatically
switches in or out depending on what the load is.  It frees up BOTH
CPUs automatically in the evening and on weekends, when nobody on
the faculty is doing anything.  
	
	You might want to put a "hysteresis hack" in so that 
short-term decreases in CPU load don't start letting undergrads in
(i.e. during lunch).  This is easy to do - have SYLOGIN write a
file (with trash in it) whenever it decides load is too heavy
for undergrads.  Put this file in SYS$COMMON.  After the load
check, look at the date/time on the file.  If it's been less than 
one hour, update the file-last-altered time and refuse the login.
If it's longer, don't update and allow the login.   Hence, if
anyone has been bounced on load average within an hour, you keep
the undergrads off.  You can avoid having billions of versions
of the junk file by specifying name.type;version when the command
procedure opens the file.  
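
The "hysteresis hack" can be modelled as below.  This is a Python sketch
of the rule as stated, not of the DCL itself: the marker file's
last-altered time is represented by a plain number of seconds, and the
names check_login and HOLD_SECONDS are made up for the illustration.

```python
HOLD_SECONDS = 3600  # the one-hour window described above


def check_login(load_too_high, now, marker_time):
    """Return (allowed, new_marker_time).

    `marker_time` is the marker file's last-altered time in seconds,
    or None if the file does not exist yet.
    """
    if load_too_high:
        # Load is too heavy: refuse and (re)write the marker file.
        return False, now
    if marker_time is not None and now - marker_time < HOLD_SECONDS:
        # Someone was bounced recently: refresh the time and refuse.
        return False, now
    # Marker is old or absent: leave it alone and allow the login.
    return True, marker_time
```

Note that, because each refused login refreshes the time, the lockout
under this rule lasts until a full hour passes with no refused attempt.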
	
	You also can have the command procedure omit the whole 
check whenever it's after 5 PM or on a weekend.
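
The time-of-day exemption might look like this in outline (Python,
illustrative only; the function name check_applies is hypothetical).
Only the two conditions mentioned above are modelled: weekends and
"after 5 PM".  No morning start hour is given, so none is assumed.

```python
from datetime import datetime


def check_applies(when):
    """Return True if the load check should be run at time `when`."""
    # weekday(): Monday is 0, Saturday is 5, Sunday is 6.
    if when.weekday() >= 5:
        return False
    # Skip the check from 17:00 onward.
    return when.hour < 17
```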
	
	Now, a brief flame about the politics of this whole system.
I think it's a bad idea.  By making the available resources SMALLER
when demand increases, the undergrad CPU will die in agony.  Meanwhile,
every undergrad will know that there's this other CPU virtually unloaded
(even if it isn't; they can't go there and see for themselves).  
This will frustrate them, and so they will go out to try and crack 
the restriction.  And they will certainly be angry that it exists.
	
	This is something you do *not* want to happen!  In the US,
a third of a college's income comes from donations of alumni - and you
desperately do NOT want to alienate your final-semester seniors, who 
shortly will be wage-earners in a most lucrative field.  'Twould
be far better to tell the professors to go take a week's vacation
than to get these newly-minted wage-earners mad at you.  
	
	Trust me - I know a college that did such things for about 
a decade to its undergraduates - and now has severe financial problems
because none of the last ten years of undergrads want to donate a cent
to the college.  (I exaggerate - it's not "none".  It is less than a 
third the national average, however.)
	
	Sit and endure and if you need an explanation, tell the 
*management* that although it's "possible" to partition the system,
it would result in "severe transient load balancing problems and
system instability".  The transient load balancing problems are due
to switching a much greater load onto the undergrad CPU (making your
previous tuning there worthless).  Jobs on the undergrad CPU will
most likely thrash - and where will the thrashes impact?  On your
CI and HSC and paging device - which may or may not saturate.  But 
the load on CI/HSC/disk will certainly INCREASE beyond what it would have
been if you didn't partition in the first place.
	
	The system instability is due to hackers trying to break
the partitioning.  If they succeed, you lose.  If they fail, they might
break something important (either crash the system, or maybe just get
frustrated and pour a coca-cola into a keyboard).  That's bad, too.

It's far more kind to never give someone access to something than
to give them something and then take it away - just when they need
it the most.
	
	So, now that you know how to do it, and why you shouldn't
do it, what do you want to do?
	
	-Bill Yerazunis

	"You've got it all wrong.  I'm not locked up in here with
	 you; you're locked up in here with ME ! "