[comp.unix.internals] load sharing

mfg@castle.ed.ac.uk (M Gordon) (02/07/91)

fwp1@CC.MsState.Edu (Frank Peters) writes:

>: On 6 Feb 91 16:22:07 GMT, pjw@usna.navy.mil, jw@math30, (Peter J. Welcher (math FACULTY)) said:

>pjw> The question is, is there any easy way to perform load-sharing, other than by
>pjw> randomly assigning sections or students to hosts ?  

>I once toyed with an idea to do something like this using DNS but
>never implemented it.

>Basically the idea was to define a new record type in my local DNS
>tables called PROG that would run the given program and return the
>result in an A record to the calling program.

...

>I think this idea has the following advantages:

>1.  I'd be willing to bet that the necessary modifications to bind
>    would be relatively trivial.

>2.  Since all that ever gets returned is an A record no modifications
>    are required to the world wide DNS system or to individual
>    resolver clients.  And no front end host beyond the nameserver
>    would need to be involved...none of this 'telnet to machine A and
>    let it decide where you should go' stuff.

>3.  The actual load program can be upgraded/replaced/modified with no
>    changes to the bind code.  I can make leastload return a random
>    host as a first pass, then the least number of users later, then
>    the least loaded cpu and so on for finer levels of balance.  The
>    two tasks (picking a destination and returning it to the user) are
>    isolated.  I always did like modularity.

>Any comments on this idea?  Any reason why it would be especially
>difficult/impractical? 

>Anyone who has actually done this?? :-)

I implemented a similar idea for our network of suns.  Named has been
altered to recognise "sun3" and "sun4" as special cases and use RPC to get
a hostname from a server.  There were several reasons for doing it this way,
rather than having named do the polling itself.

	If a machine is down, named would hang until the poll of the
	dead machine timed out, stopping it from responding to other calls.

	As well as the terminal servers using DNS for name lookup we have
	some Bridge terminal servers which use their own name server 
	machines. The primary server for these is set to a Bridge box,
	the secondary to the address of the server.  The primary server 
	will not recognise the name "sun3" so it will be passed to the
	server to reply with an address or "name unknown" if it is not
	a request for "sun3" or "sun4". The same server can respond to
	both RPC requests from named and Bridge boxes.

	We still have some people with serial lines into Vaxes. These 
	lines are running a modified getty. Instead of /bin/login the
	modified getty runs a small program which makes an RPC request
	to the server and execs an rlogin to the machine returned. This
	part of the system will gradually disappear as the Vaxes are 
	retired and we move people onto the terminal servers.

	The server is actually two programs, one which does the polling
	and puts the results into a shared memory segment and the other
	which responds to RPC requests. This means that the response to
	a request is immediate, even if the polling program is waiting
	on a dead machine.  It also makes it possible to use the
	information gathered for other purposes.  For example, a screen in
	our machine room shows the load average of all our suns, and a
	machine's name starts flashing if it dies, letting us monitor the
	state of machines all over the building.
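
A minimal sketch of the poller/responder split described above, not the
actual code from that site. It uses Python threads and a lock-protected
dict in place of two programs and a shared memory segment. The names
LoadTable, poll_once, run_poller and the probe callback are all made up
for illustration; probe(host) is assumed to return a load average or to
raise OSError when the host does not answer.

```python
import threading
import time


class LoadTable:
    """Shared state between poller and responder.

    Stands in for the shared memory segment (an assumption of this sketch).
    """

    def __init__(self):
        self._lock = threading.Lock()
        self._loads = {}  # host -> (load average, time of last good poll)

    def update(self, host, load):
        with self._lock:
            self._loads[host] = (load, time.monotonic())

    def mark_dead(self, host):
        with self._lock:
            self._loads.pop(host, None)

    def least_loaded(self, max_age=60.0):
        """Answer at once from the table; never touches the network."""
        now = time.monotonic()
        with self._lock:
            fresh = {h: load for h, (load, seen) in self._loads.items()
                     if now - seen <= max_age}
        if not fresh:
            return None
        return min(fresh, key=fresh.get)


def poll_once(table, hosts, probe):
    """Poll every host once; a host that fails to answer is dropped."""
    for host in hosts:
        try:
            table.update(host, probe(host))
        except OSError:  # includes timeouts
            table.mark_dead(host)


def run_poller(table, hosts, probe, interval, stop):
    """Poller loop; only this thread ever blocks on a dead machine."""
    while not stop.is_set():
        poll_once(table, hosts, probe)
        stop.wait(interval)
```

Whatever answers the requests (RPC or otherwise) would only call
table.least_loaded(), so a dead machine can delay the poller but not
the reply.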


Michael
-- 
							 _   _   _    _   _	
Michael Gordon - mfg@castle.ed.ac.uk OR ee.ed.ac.uk	| |_| |_| |__| |_| |   
							| . . . .      . . |    
I spilt spot remover on my dog and now he's gone! 	|_________|~~|_____|    

pcg@cs.aber.ac.uk (Piercarlo Grandi) (02/10/91)

On 6 Feb 91 16:22:07 GMT, pjw@usna.navy.mil, jw@math30, (Peter J.
Welcher (math FACULTY)) said:

pjw> The question is, is there any easy way to perform load-sharing,
pjw> other than by randomly assigning sections or students to hosts ?

fwp1@CC.MsState.Edu (Frank Peters) writes:

fwp1> I once toyed with an idea to do something like this using DNS but
fwp1> never implemented it.

On 7 Feb 91 09:20:19 GMT, mfg@castle.ed.ac.uk (M Gordon) said:

mfg> I implemented a similar idea for our network of suns.  Named has
mfg> been altered to recognise "sun3" and "sun4" as special cases and
mfg> use RPC to get a hostname from a server.  There were several reasons
mfg> for doing it this way, rather than having named do the polling
mfg> itself.

These are hacks that may well work. There are actually two load sharing
issues: one is directing logins to specific hosts, the other is
directing commands to specific hosts.

The second is more interesting. There are some posted implementations of
shells that will probe various compute servers to find the least loaded
using rstatd and then call the rexecd daemon to remote execute the
command on the least loaded host. One I think was done by George Goble
at Purdue; another, by somebody at BTL, was posted in
comp.sources.unix.

A similar tool is extremely easy to write, a couple dozen lines. It is
possible to just use these couple dozen lines as a wrapper around most
large or frequently executed commands, or actually take a shell source
and stick them into the section that invokes external commands. I think
that the heuristic of choosing the lowest load average host is not one
of the best, but at least is simple; I think that memory size and
whether the host has local or remote access to the files needed by the
command should be taken into account.
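
As a rough sketch of what such a heuristic might look like (this is not
taken from any of the posted shells): filter out hosts without enough
free memory, then pick the lowest load average, with a penalty for
hosts that would reach the files remotely. HostInfo, pick_host and the
remote_penalty weight are hypothetical names and values chosen for
illustration; gathering the numbers (e.g. from rstatd) is left out.

```python
from dataclasses import dataclass


@dataclass
class HostInfo:
    name: str
    load_avg: float        # e.g. the 1-minute load average
    free_mem_mb: int       # free memory, in megabytes
    has_local_files: bool  # True if the command's files are on local disk


def pick_host(hosts, needed_mem_mb=0, remote_penalty=1.0):
    """Return the name of the cheapest suitable host, or None.

    Cost is the load average, plus remote_penalty when the host would
    have to fetch the files over the network. The weight is arbitrary.
    """
    candidates = [h for h in hosts if h.free_mem_mb >= needed_mem_mb]
    if not candidates:
        return None

    def cost(h):
        return h.load_avg + (0.0 if h.has_local_files else remote_penalty)

    return min(candidates, key=cost).name
```

With remote_penalty=0.0 and needed_mem_mb=0 this degenerates into the
plain lowest-load-average rule.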

Building a system that will execute each command, perhaps described in
some profile file, on the most appropriate machine would make a nice
research project.

Directing logins is an easier exercise; usually, if one has remote
command execution as above, a simple rotor (FIFO) style policy will
suffice. Many rlogin boxes will do it automatically.
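
A rotor policy of this kind could be sketched as below. This is only an
illustration, assuming a fixed host list; the Rotor class and the is_up
callback are hypothetical names, and how a host is judged to be up is
left to the caller.

```python
import itertools


class Rotor:
    """Hand out hosts in strict rotation, skipping any that are down."""

    def __init__(self, hosts):
        self._hosts = list(hosts)
        self._cycle = itertools.cycle(self._hosts)

    def next_host(self, is_up=lambda host: True):
        # Try each host at most once per call so that we never spin
        # forever when every machine is down.
        for _ in range(len(self._hosts)):
            host = next(self._cycle)
            if is_up(host):
                return host
        return None
```

Each login simply takes rotor.next_host(); no load information is
needed at all.
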
--
Piercarlo Grandi                   | ARPA: pcg%uk.ac.aber.cs@nsfnet-relay.ac.uk
Dept of CS, UCW Aberystwyth        | UUCP: ...!mcsun!ukc!aber-cs!pcg
Penglais, Aberystwyth SY23 3BZ, UK | INET: pcg@cs.aber.ac.uk