[comp.protocols.tcp-ip] Need advice configuring ntpd on an isolated network

gnb@bby.oz (Gregory N. Bond) (09/08/89)

Hi.  I hope this is the appropriate place for this.

I run an isolated network of approx 20 Sun workstations, split into 2
nets joined by a somewhat overloaded 48kbps line.  We are having
great problems with clock sync, as our application area (finance) has
some large realtime element.  We would like all hosts synchronised to
within a few sec and accurate to within say 10 sec.

I have measured the drift of a few handy machines (mainly the various
servers, and the workstations in the systems area) by comparison with
the talking clock on the telephone, and have found 2 that have drifted
by about +2sec/wk over the last few weeks. One of these happened
(luck!) to be my workstation (which is also an nd server and on all
the time), the other a diskless node that is turned off over weekends.
All the others I checked had -ve drifts of seconds per day. None of
the clocks I checked had small -ve drifts. Unfortunately our main
server is drifting about -5sec/day!

I have ntpd version "89/05/18 Revision: 3.4.1.6" according to the
README file, patched at level 13 (according to patchlevel.h).

I am wondering how I should set up my ntp system, given that I have no
IP links to stratum-1 derived hosts to peer with.  My initial thought
is to have the two least-drifting clocks as active peers (stratum 3?
stratum 1?), perhaps with another if I can find one with -ve drift,
and all the others as clients.  The remote server will be a peer and
the remote clients will be clients of the remote peer to reduce net
load (ok, not that it is a lot...)

Given that I have 3 main hosts peered at stratum n, each with a
"natural" drift, how will they drift as a whole? If A = B = + 2sec/wk,
and C = -1sec/day free running, will the combination go at -6sec/wk?
-5? Some other number? (A+B+C)/3?  And how would this be adjusted -
could I use "date -a" on one of the three and have it propagate, or do
I need to do it to all three?
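
For reference, here is the arithmetic behind the "(A+B+C)/3" guess, with
everything converted to sec/wk first.  This is only a sketch that checks
the numbers; it assumes a plain unweighted average, and whether ntp
actually combines peers that way is exactly the open question.  The
function name mean_drift is made up for illustration.

```shell
#!/bin/sh
# mean_drift A B C_PER_DAY
#   A, B       free-running drifts in sec/wk
#   C_PER_DAY  free-running drift in sec/day (converted to sec/wk below)
# Prints the plain (unweighted) mean of the three drifts in sec/wk.
mean_drift() {
    awk -v a="$1" -v b="$2" -v c_per_day="$3" 'BEGIN {
        c = c_per_day * 7                  # sec/day -> sec/wk
        printf "%.2f\n", (a + b + c) / 3   # simple average, an assumption
    }'
}

# A = B = +2 sec/wk, C = -1 sec/day (= -7 sec/wk)
mean_drift 2 2 -1    # prints -1.00
```

So a simple average would give about -1 sec/wk, which is neither of the
-6 or -5 guesses.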

Another alternative is to have one host with known low drift as a
stratum-1 host and peer the other 2 servers at stratum-2.  I can then
use date -a to adjust the server.  Can I somehow inform ntp that this
stratum-1 server has a known drift?

In all this I'm not looking to get millisecond accuracy, indeed drift
of < 5sec/week would be quite acceptable.

Thank you for your assistance.

Greg.
--
Gregory Bond, Burdett Buckeridge & Young Ltd, Melbourne, Australia
Internet: gnb@melba.bby.oz.au    non-MX: gnb%melba.bby.oz@uunet.uu.net
Uucp: {uunet,pyramid,ubc-cs,ukc,mcvax,prlb2,nttlab...}!munnari!melba.bby.oz!gnb

jxxl@acrux (John Locke) (09/08/89)

In article <GNB.89Sep8140500@baby.bby.oz> gnb@bby.oz (Gregory N. Bond) writes:
>We are having great problems with clock sync...

>I am wondering how I should set up my ntp system...

We had the same problem. We didn't use ntp, though. We used rdate. You define
one machine as a server and run rdate on the rest. Sun advises running it
from /etc/rc.local but there's no reason you couldn't run it from the
crontab once a day. From the man page:

SYNOPSIS
     /usr/ucb/rdate hostname

AVAILABILITY
     This program is available with the Networking Tools and
     Programs software installation option.  Refer to Installing
     the SunOS for information on how to install optional software.

Sun Release 4.0   Last change: 17 December 1987

mar@ATHENA.MIT.EDU (09/11/89)

   From: cs!acrux!jxxl@ames.arc.nasa.gov  (John Locke)

   We had the same problem. We didn't use ntp, though. We used rdate. You define
   one machine as a server and run rdate on the rest. Sun advises running it
   from /etc/rc.local but there's no reason you couldn't run it from the
   crontab once a day.

There are good reasons for not running something like this out of
crontab if you have more than a handful of machines.  Since you're
running a clock synchronization protocol, your machines are probably
all within a couple of seconds of each other.  That means that each
machine's cron will attempt to check the time at the same time,
causing a large number of simultaneous requests first to page in the
binary of rdate from the file server, then all of the requests to the
master timeserver.  Things like this can cause massive collisions on
an ethernet, and tie up the network for a couple of minutes.

One solution is to make sure that each machine has a different crontab
to check the time at a different time, but that makes for a headache
to manage many machines and make sure that they all use different
times.  Another solution is to use a command like this in your
crontab:
    sleep `echo $ADDR | awk -F. '{ print $4 * 7 }' `; rdate timehost
(Here "timehost" stands for your master timeserver.  Anything started
by our /etc/rc has $ADDR set to the machine's IP address, but you could
just as easily use "host `hostname`" to get the IP address.)
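
To see what the awk expression in that crontab command contributes, here
is a minimal sketch that wraps it in a shell function and feeds it an
address directly.  The function name stagger_delay and the address
192.0.2.17 are made up for illustration; the last-octet-times-7
calculation is the one from the command above.

```shell
#!/bin/sh
# stagger_delay ADDR
#   ADDR is a dotted-quad IPv4 address.
# Prints a per-host delay in seconds: the last octet multiplied by 7,
# so hosts with different last octets wake up at different times.
stagger_delay() {
    echo "$1" | awk -F. '{ print $4 * 7 }'
}

# Hypothetical host whose address ends in .17
stagger_delay 192.0.2.17    # prints 119
```

With last octets from 1 to 254 the delays run from 7 to 1778 seconds, so
the requests are spread over roughly half an hour.  Hosts on different
subnets that share a last octet will still fire together.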

I don't want to go too much into Unix specifics on TCP-IP, but the
point is that you need to avoid certain kinds of synchronization in
networks, particularly when there are large numbers of similar
machines.
					-Mark Rosenstein