scott@dtscp1.UUCP (Scott Barman) (05/31/89)
Please excuse the cross-posting, but I am interested in getting information from the widest audience possible. Because of the cross-posting, there is no way I can keep up with posted responses. Therefore, I am asking that you email me a response and, if there are enough requests, I will post a summary later (please tell me the appropriate place). Thanks.

What I am looking for is a pointer to information that may help me solve the following problem:

There is a network of machines (currently Suns of various types, but that can change--though they will more than likely be 4.[23]bsd-based machines) that will be running an application. This application is based on a set of servers that now all run on a single machine. A client connects to a server after asking a name service where to make that connection (currently, the network is an ethernet-based LAN, but this can change as well). This is what is in place now.

What we want to do is (essentially) make it a distributed environment where a server can live anywhere on the network and, if a server dies or a machine crashes, through the power of magic (and some astute programming), the client doesn't crash but will connect to a "backup" server that might be active on his/her own machine or on another machine. The goal is to make this reconnection transparent to the client (user).

The model that I have come up with so far (from doing a lot of reading and testing of this knowledge) uses a name service on all machines that keeps up with the configuration of this network (sort of like the internet gateway system). Each machine starts up each type of server available; for each type, one server (anywhere on the network) is considered the primary and the others are backups.
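A minimal sketch of the reconnect behaviour described above, assuming the name service keeps an ordered list of hosts per server type and treats the first entry as the primary. Everything here is hypothetical and for illustration only: the `NameService` and `Client` classes, the `fake_transport` function, the host names, and the "db" service. The network is simulated in memory; no real sockets are involved.

```python
class NameService:
    """Tracks, per server type, which hosts run a server and which is primary."""

    def __init__(self):
        # service name -> list of hosts; index 0 is the current primary
        self._hosts = {}

    def register(self, service, host):
        # The first host registered becomes the primary; later ones are backups.
        self._hosts.setdefault(service, []).append(host)

    def primary(self, service):
        hosts = self._hosts.get(service, [])
        if not hosts:
            raise LookupError("no live server for " + service)
        return hosts[0]

    def report_failure(self, service, host):
        # Drop the failed host; the next backup in line becomes the primary.
        hosts = self._hosts.get(service, [])
        if host in hosts:
            hosts.remove(host)


class Client:
    """Hides reconnection from the caller by retrying against the new primary."""

    def __init__(self, names, transport):
        self._names = names
        # transport(host, request) returns a reply or raises ConnectionError
        self._transport = transport

    def call(self, service, request):
        while True:
            # Raises LookupError once every known server has failed.
            host = self._names.primary(service)
            try:
                return self._transport(host, request)
            except ConnectionError:
                self._names.report_failure(service, host)


# Simulated network: hosts listed in `down` refuse all requests.
down = {"sun1"}


def fake_transport(host, request):
    if host in down:
        raise ConnectionError(host + " unreachable")
    return host + " handled " + request


names = NameService()
for h in ("sun1", "sun2", "sun3"):
    names.register("db", h)

client = Client(names, fake_transport)
print(client.call("db", "query"))  # prints "sun2 handled query"
```

The caller sees only the reply, not the failed attempt on sun1. Note that a request in flight when the primary dies is simply re-sent to the next server, which is only safe if the request can be repeated without harm.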
When a client wants to communicate with a particular server, the client (through an underlying communications package that allows the client to effectively say "open a connection to this server") connects to the correct primary server using information given by the name service. If the primary dies or communications fail, the name service is consulted for a new server to connect to. The name service, in turn, handles the negotiations among the backup servers as to which one becomes the new primary.

There are many issues that have to be considered. However, some of them are a little more confusing than others (many because of my lack of knowledge and experience in this area--which I hope will change). These include:

1) How do the servers sync themselves between the primary and backups?
2) How to tell whether a server died or there is something wrong with the "connection" between client and server?
3) If a primary server dies, then comes back up, and it is supposed to be the primary (by configuration), is it allowed to be the primary again, or is it put in a backup role until the current primary dies?
4) For that matter, how does one do configuration?

I am sure someone "out there" has done something similar to this. All that I am looking for is a pointer to the information I can use to learn what I have to know to get this done, to hear somebody else's experiences of what design decisions can make this whole thing fall flat on its proverbial face, and to find out whether the above is a good approach.

Just to give you some background (as in I am not totally ignorant :-), I have been a programmer under Unix since our V7 tapes were delivered at school. While I have done some kernel hacking (drivers, performance mods, etc.) and a lot of applications programming, I have never had the opportunity to do much with networking (besides setting up NFS and Yellow Pages--and I am not even counting uucp :-).
I have Comer's TCP/IP book and Tanenbaum's Networking book and I know how to get any RFC that I may need, but I feel there is something else that may assist in enlightening me on this subject.

Your help and comments are appreciated!
--
scott barman
{gatech, emory}!dtscp1!scott