[comp.sys.sun] YP fault tolerance.....

fmbutt@uunet.uu.net (Farooq Butt) (03/25/90)

Here's a question for you.   We run two servers here on the same YP
domain.  Now one server is a slave of the other.  We did this to prevent
downtime for all the YP clients that the master server serves.  Over the
past few weeks I have been getting quite a few complaints about YP service
when our primary (master) server dies.  At first I dismissed most of these
as glitches but after a terrifying experiment yesterday, I am seriously
concerned about YP's ability to switch to a slave server after the master
goes bye-bye.  

Questions:

1. Is a YP slave server supposed to take over the duties of a master YP
   server? This is what I had always thought, but recent experience seems to
   prove otherwise.

2. If a slave does take over, how long after the primary goes away does
   the slave realize that it had better step into the ring? THIS IS OUR
   BIGGEST CONCERN. If YP cannot handle a switchover in "real" time (i.e.,
   < 4 minutes), it is useless to us.

3. What should I be doing to ensure disaster tolerance as far as YP is
   concerned? My goal is to have a system in which users who don't access the
   primary YP server would never even know that it went away.

Farooq
fmbutt@stratus.com

barmar@think.com (Barry Margolin) (03/26/90)

The original poster asked about YP slave servers "taking over" when the
master server dies.  YP servers don't keep track of each other at all.  It
is the clients' responsibility to find a working server.  As far as clients
are concerned, there is no difference between master and slave servers.  When
a YP request times out the client sends out a broadcast looking for a new
YP server.  The first server to respond (which can be either a master or a
slave) is then used for future queries.
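
The rebinding behavior described above can be sketched as a small
simulation.  This is only a toy model, not the real ypbind code: the
Server and Client classes, the reply_ms field, and the host names are
all hypothetical, and the "first server to respond" is modeled simply as
the live server with the smallest reply delay.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Server:
    name: str
    subnet: str
    role: str        # "master" or "slave"; the client never looks at this
    reply_ms: int    # hypothetical reply delay, decides who answers "first"
    up: bool = True


class Client:
    def __init__(self, subnet: str, servers: List[Server]):
        self.subnet = subnet
        self.servers = servers
        self.bound: Optional[Server] = None

    def broadcast(self) -> Optional[Server]:
        # A broadcast reaches only live servers on the client's own subnet.
        replies = [s for s in self.servers if s.up and s.subnet == self.subnet]
        # Bind to whichever one answers first, master or slave.
        return min(replies, key=lambda s: s.reply_ms) if replies else None

    def query(self, key: str) -> str:
        # Unbound, or the bound server stopped answering: look for a new one.
        if self.bound is None or not self.bound.up:
            self.bound = self.broadcast()
        if self.bound is None:
            raise TimeoutError("no YP server answered the broadcast")
        return f"{key} answered by {self.bound.name}"
```

With a master "alpha" and a slave "beta" on the same subnet, a client
bound to alpha rebinds to beta on its next query after alpha goes down.
Nothing in the model has the slave "take over"; the client just finds it.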

Since broadcasts only work within a single subnet, the most important
consideration for YP fault tolerance at sites with multiple subnets is that
there must be at least two YP servers on each subnet.  If there's only one
server on your subnet and it goes down, clients there will never find an
alternate server automatically (though ypset can be used to bind manually
to a server on another subnet, provided ypbind was started as
"ypbind -ypset").
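
As a rough illustration of the two-servers-per-subnet rule, the check
below lists every subnet that has clients but fewer than two YP servers.
The function name, the host-to-subnet mapping, and the subnet labels are
made up for the example; a real site would have to build the inventory
from its own host tables.

```python
from collections import Counter
from typing import Dict, Iterable, List


def subnets_lacking_redundancy(server_subnets: Dict[str, str],
                               client_subnets: Iterable[str],
                               minimum: int = 2) -> List[str]:
    # Count how many YP servers (master or slave) sit on each subnet.
    counts = Counter(server_subnets.values())
    # Report every client subnet with fewer than `minimum` servers.
    return sorted(net for net in set(client_subnets) if counts[net] < minimum)


servers = {"alpha": "net1", "beta": "net1", "gamma": "net2"}
at_risk = subnets_lacking_redundancy(servers, ["net1", "net2", "net3"])
```

Here net2 has a single server and net3 has none, so clients on either
subnet would have no automatic fallback.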

Barry Margolin, Thinking Machines Corp.
barmar@think.com
{uunet,harvard}!think!barmar