[comp.unix.questions] Neophyte network programmer seeks help!

scott@dtscp1.UUCP (Scott Barman) (05/31/89)
Please excuse the cross-posting, but I am interested in getting
information from the widest audience possible.  Because of the
cross-posting, there is no way I can keep up with posted responses.
Therefore, I am asking that you email me a response and, if there are
enough requests, I will post a summary later (please tell me the
appropriate place).  Thanks.

What I am looking for is a pointer to information that may help me solve
the following problem:

There is a network of machines (currently Suns of various types, but
that can change--though they will more than likely be 4.[23]bsd-based
machines) that will be running an application.  This application is
based on a set of servers now just running on a single machine.  A client
will connect to a server after asking a name service where to make that
connection (currently, the network is an ethernet-based LAN, but this can
change as well).  This is what is in place now.

What we want to do is (essentially) make it a distributed environment
where a server can live anywhere on the network and, if a server
dies or a machine crashes, through the power of magic (and some astute
programming), the client doesn't crash but will connect to a "backup"
server that might be active on his/her or another machine.  The goal is
to make this reconnection transparent to the client (user).

The model that I have come up with so far (from doing a lot of reading
and testing of this knowledge) is using a name service on all machines
that will keep up with the configuration of this network (sort of like
the internet gateway system).  Each machine starts up each type of
server available; for each type, one instance (anywhere on the
network) is considered the primary server and the others are backups.
When a client wants to
communicate to a particular server, the client (through an underlying
communications package that allows the client to effectively say "open a
connection to this server") connects to the correct primary server with
information given by the name service.  If the primary dies or
communications fail, the name service is consulted for a new server to
connect to.  The name service, on the other hand, handles the
negotiations between the backup servers as to which one is the new primary.
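
To make the client side of this model concrete, here is a minimal
sketch of the reconnect loop, written in Python for brevity.  It is only
an illustration under assumptions: `lookup` is a hypothetical stand-in
for the name service query (nothing like it exists in the current
system), and the retry count and timeout are arbitrary.

```python
import socket
import time

def connect_with_failover(lookup, service, attempts=3, timeout=2.0, delay=0.0):
    """Ask the name service for the current primary and connect to it.

    `lookup(service)` is a hypothetical name-service call returning a
    (host, port) pair.  It is consulted again after every failed
    attempt, so a newly elected primary is picked up automatically.
    """
    last_error = None
    for _ in range(attempts):
        host, port = lookup(service)
        try:
            # Returns a connected TCP socket or raises OSError.
            return socket.create_connection((host, port), timeout=timeout)
        except OSError as err:
            last_error = err
            time.sleep(delay)  # give the backups time to negotiate
    raise ConnectionError(f"no server reachable for {service!r}") from last_error
```

The communications package would call something like this both on the
first open and whenever a read or write on the existing connection
fails, which is what would keep the switch invisible to the user.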

There are many issues that have to be considered.  However, some of them
are a little more confusing than others (many because of my lack of
knowledge and experience in this area--which I hope will change).  These
include:

	1) How do the servers sync themselves between the primary and
backups?
	2) How to tell if a server died or there is something wrong with
the "connection" between client and server?
	3) If a primary server dies, then comes back up, and it is
supposed to be the primary (by configuration), is it allowed to be
the primary again, or is it put in a backup role until the current
primary dies?
	4) For that matter, how does one do configuration?
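
To frame questions 2 through 4, here is a rough sketch of the
bookkeeping a name service might do, again in Python.  Everything in it
is an assumption made for illustration: the `Registry` class, the
heartbeat idea, the timeout value, and the `preempt` flag are not part
of the existing system.  Configuration is reduced to an ordered list of
hosts, and the flag selects between the two policies in question 3.

```python
class Registry:
    """Tracks heartbeats for one service type and picks its primary."""

    def __init__(self, preferred_order, timeout=5.0, preempt=False):
        self.order = list(preferred_order)  # configured priority, best first
        self.timeout = timeout              # seconds of silence before "dead"
        self.preempt = preempt              # may a returning server reclaim primary?
        self.last_seen = {}                 # host -> time of last heartbeat
        self.primary = None

    def heartbeat(self, host, now):
        """Record that `host` reported in at time `now`."""
        self.last_seen[host] = now

    def alive(self, now):
        """Hosts heard from within the timeout, in configured order."""
        return [h for h in self.order
                if h in self.last_seen
                and now - self.last_seen[h] <= self.timeout]

    def current_primary(self, now):
        """Keep the current primary while it lives, unless preempting."""
        live = self.alive(now)
        if not live:
            self.primary = None
        elif self.preempt or self.primary not in live:
            self.primary = live[0]
        return self.primary
```

With `preempt=False` a configured primary that comes back up stays a
backup until the acting primary dies; with `preempt=True` it takes over
again at once.  Note that a missed heartbeat by itself cannot tell a
dead server from a broken network path, so this does not really answer
question 2.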

I am sure someone "out there" has done something similar to this.  All
that I am looking for is a pointer to the information I can use to learn
what I have to know to get this done.  I would also like to hear
somebody else's experiences on which design decisions can make this
whole thing fall flat on its proverbial face, and whether the above is
a good approach.

Just to give you a background (as in I am not totally ignorant :-), I
have been a programmer under Unix since our V7 tapes were delivered at
school.  While I have done some kernel hacking (drivers, performance
mods, etc.) and a lot of applications programming, I have never had the
opportunity to do much with networking (besides setting up NFS and Yellow
Pages--and I am not even counting uucp :-).  I have Comer's TCP/IP book
and Tanenbaum's Networking book and I know how to get any RFC that I may
need, but I feel there is something else that may assist in enlightening
me on this subject.

Your help and comments are appreciated!

-- 
scott barman
{gatech, emory}!dtscp1!scott