[comp.unix.wizards] load sharing

pjw@usna.navy.mil, , jw@math30, (Peter J. Welcher (math FACULTY)) (02/07/91)

I have a question, especially for the academic readers of the group. 
(And it may just be I'm missing the obvious, or re-inventing a wheel.)

The Naval Academy Math Dept has 28 Suns, mostly in faculty offices. We'd like
our students to be able to run Mathematica and Matlab on them by logging in
via PC's running Procomm, connected via Ethernet. We do that already for a
few students, no sweat. I'm worried about handling lots of students, all trying
to do homework the night before it is due.

The question is, is there any easy way to perform load-sharing, other than by
randomly assigning sections or students to hosts ?  
What I think I'd like to do is perhaps tell
students to log into a certain host (say math3) and then have them randomly be
rlogin-ed to another machine before the program (Mathematica, Matlab) is run.
Is there any reason this is a bad idea ? My thought is the rlogin load will be
relatively low, so going thru a common machine won't overload it too badly. (And
it's a SPARC server, 32M memory, with unlimited user license.)

Writing a script that does something with rwho is a possibility, but there's all
the net overhead of rwhod across 28 machines. I do want something that completes
within, say, 5 to 10 seconds, so rusers, rup and the like are no good.

I've written a C program that forks (to get around timeout delays) and then
does rstat calls. It is called "loaddist".
It kills processes that don't finish within a short time, and
then prints the name of the least loaded host (with some other fudge factors
thrown into the calculation, like Sun 3 vs. SPARC). 
My idea was to have "rlogin `loaddist`" done to the students when they
log into the specified host, math3. Is this a good/bad idea ?
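
For readers who want to experiment with the idea, here is a minimal sketch of
the "query everyone in parallel, drop the slow ones, pick the least loaded"
logic. It is not the original loaddist (which is a C program using rstat). The
probe function, the SPEED weights and the host names are all assumptions made
for illustration, and threads stand in for the forked children.

```python
import concurrent.futures as cf

# Hypothetical fudge factors: a SPARC is assumed to absorb twice the
# load of a Sun 3 before it feels equally busy.
SPEED = {"sparc": 2.0, "sun3": 1.0}


def pick_least_loaded(hosts, probe, timeout=5.0):
    """Return the host with the lowest weighted load, or None.

    hosts   -- dict mapping host name to machine type (a key of SPEED)
    probe   -- callable taking a host name and returning its load average;
               a stand-in for an rstat query
    timeout -- seconds to wait before ignoring hosts that have not answered
    """
    pool = cf.ThreadPoolExecutor(max_workers=max(1, len(hosts)))
    futures = {pool.submit(probe, name): name for name in hosts}
    done, _late = cf.wait(futures, timeout=timeout)
    pool.shutdown(wait=False)  # do not block on hosts that never answered

    best = None
    for fut in done:
        if fut.exception() is not None:
            continue  # a failed probe counts the same as no answer
        name = futures[fut]
        score = fut.result() / SPEED[hosts[name]]
        if best is None or score < best[0]:
            best = (score, name)
    return best[1] if best else None
```

A login script on the front-end host could then hand the returned name to
rlogin. Hosts that answer late are simply left out of the comparison.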

An alternative would be to set "loaddist" up as a daemon, to reduce the 
possible amount of forks and net traffic. The daemon would, say, fork a
query to one host per second, so that all information would be refreshed
every 30 seconds or so. The student script would use signals to get "loaddist"
to emit a hostname.
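
The daemon variant might look roughly like the sketch below: one host is
refreshed per tick, and a query is answered from the cached table without
touching the network. The class name, the probe callable and the max_age
cutoff are assumptions. How the student script talks to the daemon (signals
or otherwise) is deliberately left out.

```python
import itertools


class LoadCache:
    """Round-robin cache of host loads, refreshed one host per tick."""

    def __init__(self, hosts, probe, max_age=60.0):
        self.hosts = list(hosts)
        self.probe = probe        # stand-in for an rstat query
        self.max_age = max_age    # entries older than this are ignored
        self.loads = {}           # host -> (load, time of measurement)
        self._cycle = itertools.cycle(self.hosts)

    def tick(self, now):
        """Refresh the next host in the rotation; call about once a second."""
        host = next(self._cycle)
        try:
            self.loads[host] = (self.probe(host), now)
        except OSError:
            self.loads.pop(host, None)  # unreachable: forget the old value

    def least_loaded(self, now):
        """Return the least loaded host with a fresh entry, or None."""
        fresh = {h: load for h, (load, t) in self.loads.items()
                 if now - t <= self.max_age}
        return min(fresh, key=fresh.get) if fresh else None
```

With 28 hosts and one tick per second, every entry is at most about 28
seconds old, which matches the "every 30 seconds or so" estimate.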

Any comments or suggestions would be appreciated.

pww@bnr.ca (Peter Whittaker) (02/07/91)

In article <25860@adm.brl.mil> pjw@usna.navy.mil, , jw@math30, (Peter J. Welcher (math FACULTY)) writes:

(after many deletions...)

>
>I have a question, especially for the academic readers of the group. 
>(And it may just be I'm missing the obvious, or re-inventing a wheel.)
>
>Writing a script that does something with rwho is a possibility, but there's all
>
>I've written a C program that forks (to get around timeout delays) and then

If you are going to force people to login to a single front-end host, then
why not write a program that keeps track of who has been assigned to each
machine, then assigns each new user to the least-busy machine? (e.g. using 
rlogin, or what have you).  When a user logs out of the assigned machine, 
strike them from the "assigned machine" table.

As long as your users are doing roughly the same amount of work (which should
be true if they are all working on the same assignment) your machines will
be more or less equally loaded.

It's not terribly elegant, but it would work.  If you wanted to double check
the load on a machine before assigning a user to it, query it via UDP (if you
are on a LAN, UDP should be fairly reliable).  If it says it is too busy, 
query the next machine in your "machine assignment" table.

There are more elegant solutions, but this one should be quick to write and
should work passably well.
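
A rough sketch of the table, to make the idea concrete. The class and method
names are made up for illustration, and the too_busy callable stands in for
the optional UDP load query.

```python
class AssignmentTable:
    """Track which users were sent to which machine."""

    def __init__(self, hosts):
        self.users = {host: set() for host in hosts}

    def assign(self, user, too_busy=lambda host: False):
        """Send user to the machine with the fewest assigned users.

        too_busy is an optional double check; machines for which it
        returns True are skipped and the next candidate is tried.
        """
        for host in sorted(self.users, key=lambda h: len(self.users[h])):
            if not too_busy(host):
                self.users[host].add(user)
                return host
        return None  # every machine declined

    def release(self, user):
        """Strike the user from the table when they log out."""
        for members in self.users.values():
            members.discard(user)
```

Ties go to whichever machine was listed first, since the sort is stable.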

shelley@infonode.ingr.com (Shelley Wilmoth) (02/07/91)

In article <1991Feb6.173736.11922@bwdls61.bnr.ca> pww@bnr.ca (Peter Whittaker) writes:
>In article <25860@adm.brl.mil> pjw@usna.navy.mil, , jw@math30, (Peter J. Welcher (math FACULTY)) writes:
>
>(after many deletions...)
>
>>
>>I have a question, especially for the academic readers of the group. 
>>(And it may just be I'm missing the obvious, or re-inventing a wheel.)
>>
>>Writing a script that does something with rwho is a possibility, but there's all
>>
>>I've written a C program that forks (to get around timeout delays) and then
>
>If you are going to force people to login to a single front-end host, then
>why not write a program that keeps track of who has been assigned to each
>machine, then assigns each new user to the least-busy machine? (e.g. using 
>rlogin, or what have you).  When a user logs out of the assigned machine, 
>strike them from the "assigned machine" table.
>
>As long as your users are doing roughly the same amount of work (which should
>be true if they are all working on the same assignment) your machines will
>be more or less equally loaded.
>
>It's not terribly elegant, but it would work.  If you wanted to double check
>the load on a machine before assigning a user to it, query it via UDP (if you
>are on a LAN, UDP should be fairly reliable).  If it says it is too busy, 
>query the next machine in your "machine assignment" table.
>
>There are more elegant solutions, but this one should be quick to write and
>should work passably well.

If users will not always be working on the same system each time they
log in, and if they will be saving their work in files, you will want
to be sure they have access to those files no matter what machine they
log on to.  For example, user Jones logs on and is assigned to System A,
does some work on the assignment, saves it to a file system local to
System A, then logs off and goes to class.  Later, he logs in again to
complete his work, but is assigned to System B because it has the lesser
load at the time.  He will wish to somehow access the files he stored on
System A.

fwp1@CC.MsState.Edu (Frank Peters) (02/07/91)

: On 6 Feb 91 16:22:07 GMT, pjw@usna.navy.mil, , jw@math30, (Peter J. Welcher (math FACULTY)) said:

pjw> The question is, is there any easy way to perform load-sharing, other than by
pjw> randomly assigning sections or students to hosts ?  

I once toyed with an idea to do something like this using DNS but
never implemented it.

Basically the idea was to define a new record type in my local DNS
tables called PROG that would run the given program and return the
result in an A record to the calling program.

For instance, suppose I had a bunch of suns that were effectively
identical as far as mathematica is concerned.  I might define the
following in my DNS:

$ORIGIN wherever
MathSuns           PROG              /usr/local/adm/leastload
                   MX        10      My.Mail.Hub.Here.Edu.

Any A record request for MathSuns would then run the program, take the
IP address that results and return it.  By passing the hostname to be
resolved as an argument I could use the same program to manage several
pools.

I think this idea has the following advantages:

1.  I'd be willing to bet that the necessary modifications to bind
    would be relatively trivial.

2.  Since all that ever gets returned is an A record no modifications
    are required to the world wide DNS system or to individual
    resolver clients.  And no front end host beyond the nameserver
    would need to be involved...none of this 'telnet to machine A and
    let it decide where you should go' stuff.

3.  The actual load program can be upgraded/replaced/modified with no
    changes to the bind code.  I can make leastload return a random
    host as a first pass, then the least number of users later, then
    the least loaded cpu and so on for finer levels of balance.  The
    two tasks (picking a destination and returning it to the user) are
    isolated.  I always did like modularity.
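
To illustrate point 3, here is a sketch of what a swappable leastload might
look like. Nothing here exists in bind; the pool table, the addresses (taken
from the 192.0.2.0/24 documentation range) and the policy functions are all
assumptions for the sake of the example.

```python
import random

# Hypothetical pool table keyed by the name being resolved.
POOLS = {"MathSuns": ["192.0.2.11", "192.0.2.12", "192.0.2.13"]}


def pick_random(addresses, stats):
    """First pass: any member of the pool will do."""
    return random.choice(addresses)


def pick_fewest_users(addresses, stats):
    """Later refinement: stats maps address -> number of logged-in users."""
    return min(addresses, key=lambda addr: stats.get(addr, 0))


def leastload(pool, policy=pick_random, stats=None):
    """Return the address to hand back as the A record for this pool."""
    return policy(POOLS[pool], stats or {})
```

Swapping the policy changes how the host is chosen without touching whatever
calls leastload, which is the separation described above.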

Any comments on this idea?  Any reason why it would be especially
difficult/impractical? 

Anyone who has actually done this?? :-)

FWP
--
Frank Peters   Internet:  fwp1@CC.MsState.Edu         Bitnet:  FWP1@MsState
               Phone:     (601)325-2942               FAX:     (601)325-8921

rbj@uunet.UU.NET (Root Boy Jim) (02/20/91)

In article <25860@adm.brl.mil> pjw@usna.navy.mil, , jw@math30, (Peter J. Welcher (math FACULTY)) writes:
>
>The question is, is there any easy way to perform load-sharing, other than by
>randomly assigning sections or students to hosts ?  

Someone (I believe it is Apollo) has introduced the concept of a "broker",
to complement the concepts of "clients" and "servers". Brokers locate
the latter for the former when location is immaterial.

>I've written a C program that forks (to get around timeout delays) and then
>does rstat calls. It is called "loaddist".

So far, so good.

>It kills processes that don't finish within a short time, and

Probably a bad idea, unless you have lots of runaway processes.

>then prints the name of the least loaded host (with some other fudge factors
>thrown into the calculation, like Sun 3 vs. SPARC). 
>My idea was to have "rlogin `loaddist`" done to the students when they
>log into the specified host, math3. Is this a good/bad idea ?
>
>An alternative would be to set "loaddist" up as a daemon

Sounds like rwho, now doesn't it?

>Any comments or suggestions would be appreciated.

Do you have NFS? Devote a directory on a common filesystem to
load status monitoring. Some people have fixed rwho so that it
merely writes info to a file in the rwho directory. Thus, the
broadcast rwho traffic turns into NFS traffic all destined for
wherever the real directory resides.
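
A sketch of the shared-directory scheme, assuming each host writes its own
load into a file named after itself and the front end reads them all. The
function names and the max_age cutoff are invented for illustration; this is
not how any modified rwho actually stores its data.

```python
import os
import time


def publish(status_dir, host, load):
    """Run on each host: record its load in the shared directory."""
    tmp = os.path.join(status_dir, "." + host + ".tmp")
    with open(tmp, "w") as f:
        f.write("%f\n" % load)
    # Rename so a reader never sees a half-written file.
    os.replace(tmp, os.path.join(status_dir, host))


def least_loaded(status_dir, max_age=120.0, now=None):
    """Run on the front end: pick the host with the lowest recent load."""
    now = time.time() if now is None else now
    best = None
    for name in os.listdir(status_dir):
        if name.startswith("."):
            continue  # temporary file from publish()
        path = os.path.join(status_dir, name)
        if now - os.path.getmtime(path) > max_age:
            continue  # stale entry: host is probably down
        with open(path) as f:
            load = float(f.read())
        if best is None or load < best[0]:
            best = (load, name)
    return best[1] if best else None
```

A host that stops updating its file drops out of consideration once the file
is older than max_age, so no explicit "down" marker is needed.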
-- 
		[rbj@uunet 1] stty sane
		unknown mode: sane