[comp.dcom.lans] Time synchronization in a Distributed Environment

aws@druhi.ATT.COM (SteereA) (02/24/88)

Hi,
  I am looking for articles, references, implementations,
etc. for solving the problem of keeping N machines within
a specified time of one another.  I appreciate any and all
pointers.

	Thanks,
	Andy Steere
	  (303) 538-4128
 	  ihnp4!druhi!aws

mohamed@hscfvax.harvard.edu (Mohamed_el_Lozy) (02/25/88)

In article <2710@druhi.ATT.COM> aws@druhi.ATT.COM (SteereA) writes:
>  I am looking for articles, references, implementations,
>etc. for solving the problem of keeping N machines within
>a specified time of one another.  I appreciate any and all pointers.

I am directing followup to comp.protocols.tcp-ip, where it would be
more appropriate.  I assume that the machines are networked.

A good start would be in three RFCs written by Dave Mills:
	956	Algorithms for synchronizing network clocks
	957	Experiments in network clock synchronization
	958	Network time protocol (NTP)
They all came out in 1985, and have reasonable references for that time.

A UNIX (BSD only, as I recall) implementation of NTP has been written
at trantor.umd.edu, which also maintains a mailing list.  To get on it
send mail to ntp-request@trantor.umd.edu.
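For a feel of what NTP actually computes, here is a minimal C sketch of the four-timestamp offset and delay calculation, in the form it takes in later NTP specifications (RFC 958's own notation may differ). The struct and function names are hypothetical, and the offset estimate assumes the outbound and return paths take equal time.

```c
#include <assert.h>

/* Hypothetical record of one request/reply exchange.  Each timestamp is
 * read from the clock of the machine where that event happened. */
typedef struct {
    double t1;  /* client clock: request sent      */
    double t2;  /* server clock: request received  */
    double t3;  /* server clock: reply sent        */
    double t4;  /* client clock: reply received    */
} exchange_t;

/* Estimated server-minus-client offset; exact only if the outbound and
 * return transit times are equal. */
static double ntp_offset(const exchange_t *x)
{
    assert(x != 0);
    return ((x->t2 - x->t1) + (x->t3 - x->t4)) / 2.0;
}

/* Round-trip network delay with the server's processing time removed. */
static double ntp_delay(const exchange_t *x)
{
    assert(x != 0);
    return (x->t4 - x->t1) - (x->t3 - x->t2);
}
```

For example, with timestamps (in ms) of 0, 110, 112 and 22, the server is 100 ms ahead, each leg takes 10 ms and the server spends 2 ms, so the sketch yields an offset of 100 and a delay of 20.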

There is also often quite a bit of network time discussion in
comp.protocols.tcp-ip, especially when interesting things like leap
seconds turn up.

BSD4.3 implementations have a timed program, discussed at some length
in the documentation (not available to me right now).

I would very much appreciate some post 1985 references.

pase@ogcvax.UUCP (Douglas M. Pase) (02/28/88)

In article <druhi.2710> aws@druhi.ATT.COM (SteereA) writes:
>Hi,
>  I am looking for articles, references, implementations,
>etc. for solving the problem of keeping N machines within
>a specified time of one another.  I appreciate any and all
>pointers.
>

This article may be of some use to you

%A Leslie Lamport
%T Time, Clocks, and the Ordering of Events in a Distributed System
%J Communications of the ACM
%V 21
%N 7
%P 558-565
%D July 1978
%K lam78

--
Doug Pase  --  ...ucbvax!tektronix!ogcvax!pase  or  pase@cse.ogc.edu (CSNet)

firth@sei.cmu.edu (Robert Firth) (02/29/88)

In article <druhi.2710> aws@druhi.ATT.COM (SteereA) writes:

]  I am looking for articles, references, implementations,
]etc. for solving the problem of keeping N machines within
]a specified time of one another.  I appreciate any and all
]pointers.

In article <1571@ogcvax.UUCP> pase@ogcvax.UUCP (Douglas M. Pase) writes:

]This article may be of some use to you
]
]%A Leslie Lamport
]%T Time, Clocks, and the Ordering of Events in a Distributed System
]%J Communications of the ACM
]%V 21
]%N 7
]%P 558-565
]%D July 1978
]%K lam78

I'd second that.  This is an excellent article on the subject.
However, while the first half is generally useful, the second part
(where Lamport talks about real clocks rather than 'logical' clocks)
applies only in the special circumstance that all the processors are
in the same inertial frame of reference, as the author indicates in a
footnote.

In general, you CANNOT keep two clocks synchronized within an
arbitrary delta, for reasons explained by Einstein these many
years ago.
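As a reminder of what the 'logical' clocks in the first half of Lamport's paper amount to, here is a minimal C sketch of his two implementation rules: tick between local events, and on receipt of a message advance past its timestamp. It orders events consistently with causality but says nothing about real time, which is exactly the distinction drawn above. The type and function names are hypothetical.

```c
#include <assert.h>

/* Hypothetical per-process logical clock. */
typedef struct {
    unsigned long c;
} lclock_t;

/* Rule 1: advance the clock between successive local events. */
static unsigned long lc_event(lclock_t *p)
{
    assert(p != 0);
    return ++p->c;
}

/* Sending is itself an event; the returned value travels in the message. */
static unsigned long lc_send(lclock_t *p)
{
    return lc_event(p);
}

/* Rule 2: on receipt, move past both the local clock and the message's
 * timestamp, so the receive is ordered after the send. */
static unsigned long lc_receive(lclock_t *p, unsigned long tm)
{
    assert(p != 0);
    if (tm > p->c)
        p->c = tm;
    return ++p->c;
}
```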

delp@udel.EDU (Gary Delp) (03/01/88)

The Lamport article is good.  Dave Mills has been doing it for the
Internet for a good while.  You might look at:

RFC-956/-7/-8 and references therein.  See also the latest issue of CCR
and several Byzantine clock-synchronization papers scattered over JACM
and dissertations.

To millisecond granularity over a widely spread network with
considerable jitter, the problem is solved.
-- 
Gary Delp
123 Evans Hall, EE, U of D, Newark, DE 19716;  (302)-451-6653 or 2405
UUCP:  ...!{{allegra,ihnp4}!berkeley,harvard}!delp@udel.edu
CSNET:  delp%udel.edu@relay.cs.net      ARPA: delp@udel.edu

devine@cookie.dec.com (Bob Devine) (03/01/88)

  Here are two more papers on this topic:

T. K. Srikanth and Sam Toueg, "Optimal Clock Synchronization,"
Journal of the ACM, July 1987

Neil Rickert, "Non-Byzantine Clock Synchronization - A Programming
Experiment," ACM Operating Systems Review, Jan 1988

Bob

rajaei@ttds.UUCP (Hassan Rajaei) (03/05/88)

In article <1571@ogcvax.UUCP> pase@ogcvax.UUCP (Douglas M. Pase) writes:
>In article <druhi.2710> aws@druhi.ATT.COM (SteereA) writes:
>>Hi,
>>  I am looking for articles, references, implementations,
>>etc. for solving the problem of keeping N machines within
>>a specified time of one another.  I appreciate any and all
>>pointers.
>>
>
>This article may be of some use to you
>
>%A Leslie Lamport
>%T Time, Clocks, and the Ordering of Events in a Distributed System
>%J Communications of the ACM
>%V 21
>%N 7
>%P 558-565
>%D July 1978
>%K lam78

The Virtual Time theory and its implementation, the Time Warp mechanism,
may help as well.  There is a good article on the subject:

Virtual Time, David Jefferson, ACM Trans. on Prog. Lang. & Syst.,
Vol. 7, No. 3, July 1985, pp. 404-425

There is a Time Warp Operating System (TWOS) on the Caltech Mark III
Hypercube which implements the TW mechanism.  You can read about it in:
ACM Operating Systems Review, Vol. 21, No. 5, pp. 77-93, "Distributed
Simulation and the Time Warp Operating System", D. Jefferson et al.


Hassan Rajaei
rajaei@ttds.tds.kth.se

tkevans@fallst.UUCP (Tim Evans) (03/05/88)

In article <1571@ogcvax.UUCP>, pase@ogcvax.UUCP (Douglas M. Pase) writes:
> In article <druhi.2710> aws@druhi.ATT.COM (SteereA) writes:
> Hi,
>   I am looking for articles, references, implementations,
> etc. for solving the problem of keeping N machines within
> a specified time of one another.  I appreciate any and all
> pointers.
> 
> 
In an environment where Fusion (tm) network software runs, you can hack a
way of coordinating date/time among a group of machines.  (This works in
Sys V, at least.)

Select one machine as a master, and manually keep its date/time correct.
Set up a cron job on the rest (it must run with root authority) that
executes the following command using Fusion's (tm) 'rx':

	date `rx machine_name date '+%m%d%H%M%y'`

I don't know for certain, but other network software presumably also has
the ability to execute a command remotely on another machine in the network.
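The backquoted date call works because the master prints its time in the same mmddhhmmyy form that System V date accepts as an argument. A small C sketch of that round trip, assuming only standard strftime/sscanf; the helper names are hypothetical:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Format a broken-down time the way the master's date '+%m%d%H%M%y'
 * prints it: mmddHHMMyy, with no seconds field. */
static size_t format_date_arg(const struct tm *tm, char *buf, size_t len)
{
    assert(tm != NULL && buf != NULL);
    return strftime(buf, len, "%m%d%H%M%y", tm);
}

/* Hypothetical inverse: split a 10-digit mmddHHMMyy string into fields.
 * Returns 0 on success, -1 on malformed input.  The two-digit year is
 * stored as-is in tm_year (years since 1900), i.e. 88 means 1988. */
static int parse_date_arg(const char *s, struct tm *tm)
{
    int mon, mday, hour, min, yy;

    assert(s != NULL && tm != NULL);
    if (strlen(s) != 10 ||
        sscanf(s, "%2d%2d%2d%2d%2d", &mon, &mday, &hour, &min, &yy) != 5)
        return -1;
    memset(tm, 0, sizeof *tm);
    tm->tm_mon  = mon - 1;   /* struct tm months run 0-11 */
    tm->tm_mday = mday;
    tm->tm_hour = hour;
    tm->tm_min  = min;
    tm->tm_year = yy;
    return 0;
}
```

Because the format stops at minutes, the master's seconds are dropped, so a slave can end up as much as a minute (plus the rx latency) behind the master.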

edw@IUS1.CS.CMU.EDU (Eddie Wyatt) (03/09/88)

> >   I am looking for articles, references, implementations,
> > etc. for solving the problem of keeping N machines within
> > a specified time of one another.  I appreciate any and all
> > pointers.


  Here's a procedure that I designed and coded.  If anyone has
any questions, complaints or criticism, mail me.



/**************************************************************************
 *                                                                        *
 *                            sync_clocks                                 *
 *                                                                        *
 **************************************************************************

   Purpose :  This function helps synchronize the times between the module
            and lmb.  Lmb time is considered the correct time.  It also
            determines if the time unit is in seconds or in milliseconds.
            The method used is as follows :
 
 
                t1 = clock value on lmb side when it receives the request
                     (and, processing being negligible, sends its reply)
                t2 = clock value on module side at request sending time
                m = time disparity between lmb clock and module clock
                    (this is the value of interest) 
                N = distribution representing the network transmission time 
                M = distribution representing the time to send a message to 
                    a machine and get a response (round trip)

                Assumption - processing time is negligible.
 
 
                t1 = t2 + m + N
 
                E[M] = E[N + N] = E[N] + E[N] = 2*E[N]
 
                E[m + N] = E[m] + E[N] = m + E[M]/2
 
                m = E[m + N] - E[M]/2 
            
 


   Programmer :Eddie Wyatt
 
   Date : December 1986 (Feb 1987)

   Input : None

   Output : None

   Locals : 
     i - loop increment
     send_time - the module time at which a message is sent
     receive_time - the module time at which the reply is received
                    (receive_time - send_time ~= M)
     lmb_time - the time on the lmb side
                    (lmb_time - send_time ~= m + N)

   Globals : 
     time_units - time is in seconds or milliseconds.
     time_disp - is modified to be equal to the time disparity between
                 the lmb clock and the module clock
     port - not modified

 ************************************************************************/

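LIB_EXPORT void sync_clocks(port)
    {
    register int  i, send_time, receive_time, lmb_time;

    time_disp = 0;
    /* The lmb first reports whether its clock counts seconds or msec. */
    time_units = (TIME_UNITS) Nreceiveint(port);

    for (i = 0; i < NUMOFCLOCKSAMPLES; i++)
        {
        /* Module clock just before the request goes out. */
        send_time = (time_units == INSECONDS) ? (int) time(NULLPTR(long))
                                              : get_time_in_msec();
        Nsendint(1,port);                /* ask the lmb for its clock */
        lmb_time = Nreceiveint(port);    /* lmb clock when it replied */
        /* Module clock when the reply arrives. */
        receive_time = (time_units == INSECONDS) ? (int) time(NULLPTR(long))
                                                 : get_time_in_msec();
        /* lmb_time minus the send/receive midpoint estimates m; it is
         * doubled here so no division is needed per sample. */
        time_disp += 2*lmb_time - receive_time - send_time;
        }

    /* Average over the samples and undo the factor of 2. */
    time_disp /=(2*NUMOFCLOCKSAMPLES);
    }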
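To see why the averaging in sync_clocks recovers m, here is a small C simulation under the comment's own assumptions (negligible processing time, equal mean delay in each direction). The module clock is taken to read real time and the lmb clock real time plus m; the function and array names are hypothetical, and the sum is kept in a long rather than the original's int.

```c
#include <assert.h>

/* Hypothetical simulation of the sync_clocks exchange.  Model: the module
 * clock reads real time, the lmb clock reads real time + m, fwd[k] and
 * back[k] are the one-way delays of sample k, processing time is zero. */
static long estimate_disparity(long m, long start, long gap,
                               const long *fwd, const long *back, int n)
{
    long sum = 0;        /* long accumulator (the original uses int) */
    long now = start;    /* real time, which is also the module clock */
    int k;

    assert(n > 0 && fwd != 0 && back != 0);
    for (k = 0; k < n; k++) {
        long send_time    = now;                      /* module clock */
        long lmb_time     = now + fwd[k] + m;         /* lmb clock    */
        long receive_time = now + fwd[k] + back[k];   /* module clock */

        /* Same per-sample term as sync_clocks: equals 2*m + fwd - back. */
        sum += 2 * lmb_time - receive_time - send_time;
        now = receive_time + gap;                     /* next sample  */
    }
    /* Same final step: divide by 2n (C truncates toward zero). */
    return sum / (2L * n);
}
```

With delays whose forward and return totals match, the estimate is exact. With a 10-unit outbound and 2-unit return path and m = 0, it returns 4: the estimate is biased by half the mean asymmetry, which is the limit of this kind of method.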
-- 

Eddie Wyatt 				e-mail: edw@ius1.cs.cmu.edu

jerry@oliveb.olivetti.com (Jerry Aguirre) (03/10/88)

Being able to synchronize all systems to the same time is nice.  Having
that time be the correct time is even nicer.  I have several Vax750s and
a Vax785 whose clocks run fast.  By selecting systems with better clocks
to be the masters I can work around this but the result is not ideal.

Does anybody know where to adjust the real-time clock on a Vax?  I could
probably come up with an accurate frequency counter or just keep
tweaking it until it is right but I can't find the crystal, much less a
trimmer for it.

					Jerry Aguirre @ Olivetti ATC
					uunet!amdahl!oliveb!jerry

nather@ut-sally.UUCP (Ed Nather) (03/11/88)

> > In article <druhi.2710> aws@druhi.ATT.COM (SteereA) writes:
> > Hi,
> >   I am looking for articles, references, implementations,
> > etc. for solving the problem of keeping N machines within
> > a specified time of one another.  I appreciate any and all
> > pointers.
> > 

I recently faced the problem of keeping the internal CPU clock in an IBM PC
in step with a more accurate clock located in an interface card.  The interface
sends a data burst once per second through the serial port.  I used the arrival
of the first byte as a time tick.

The IBM PC keeps internal time by counting down its instruction clock frequency
and it is possible to modify the value of the countdown if the timer interrupt
is intercepted.  No integer value of the countdown will give precisely 1 ms
time ticks, which was needed for this application, so I alternate between one
slightly too large, and one slightly too small.  The amount of time each
countdown value remains active is adjusted by watching for drift between the
two clocks; a pair of countdowns is averaged about every 90 sec, and any drift
is noted (and the CPU clock is forced back into phase at the same time).  After
about 10 cycles, the accumulated drift is used to adjust the amount of time
each countdown value is used for the next 10 cycles.

This constitutes a software servo that works quite well with all the PCs tested.
Their instruction clock frequencies are quite different from one to the next,
but the servo locks in after the first frequency change and stays locked
thereafter.
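A rough C sketch of the dithered countdown plus servo correction described above. Nather alternates the two countdowns and adjusts how long each stays active; this sketch swaps in a per-tick error accumulator that picks between the same two values and reaches the same average period. The 1193182 Hz timer input (the nominal rate of the IBM PC's 8253 timer) is used only for illustration, and all names are hypothetical.

```c
#include <assert.h>

/* Hypothetical dither state for one periodic timer. */
typedef struct {
    unsigned long lo;   /* integer countdown: slightly too short (or exact) */
    double frac;        /* fraction of ticks that should use lo + 1 */
    double acc;         /* running error accumulator, kept in [0, 1) */
} dither_t;

/* Choose lo/frac so the average tick is tick_s seconds at timer_hz. */
static void dither_tune(dither_t *d, double timer_hz, double tick_s)
{
    double ideal;

    assert(timer_hz > 0.0 && tick_s > 0.0);
    ideal = timer_hz * tick_s;          /* about 1193.182 counts for 1 ms */
    d->lo = (unsigned long) ideal;      /* truncation; ideal is positive */
    d->frac = ideal - (double) d->lo;
}

static void dither_init(dither_t *d, double timer_hz, double tick_s)
{
    d->acc = 0.0;
    dither_tune(d, timer_hz, tick_s);
}

/* Countdown to load for the next tick: lo, or lo + 1 often enough that
 * the average equals the ideal (non-integer) count. */
static unsigned long dither_next(dither_t *d)
{
    d->acc += d->frac;
    if (d->acc >= 1.0) {
        d->acc -= 1.0;
        return d->lo + 1;
    }
    return d->lo;
}

/* Servo step: the reference clock says ref_s seconds elapsed while the
 * timer counted `counts` counts (the sum of countdowns loaded).  Re-estimate
 * the timer's true frequency and retune; acc is kept so phase carries over. */
static void dither_correct(dither_t *d, unsigned long counts, double ref_s,
                           double tick_s)
{
    assert(counts > 0 && ref_s > 0.0);
    dither_tune(d, (double) counts / ref_s, tick_s);
}
```

Retuning from the measured frequency lets the average tick track each machine's actual oscillator, which mirrors the observation above that the servo locks in after the first correction.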

Details at eleven ...


-- 
Ed Nather
Astronomy Dept, U of Texas @ Austin
{allegra,ihnp4}!{noao,ut-sally}!utastro!nather
nather@astro.AS.UTEXAS.EDU

hirshman@60600.dec.com (Bret H. {FS Tech Support@Sydney, Oz} SNE/G 4125546) (03/18/88)

> Being able to synchronize all systems to the same time is nice.  Having
> that time be the correct time is even nicer.  I have several Vax750s and
> a Vax785 whose clocks run fast.  By selecting systems with better clocks
> to be the masters I can work around this but the result is not ideal.
> 
> Does anybody know where to adjust the real-time clock on a Vax?  I could
> probably come up with an accurate frequency counter or just keep
> tweaking it until it is right but I can't find the crystal, much less a
> trimmer for it.
> 
>                                         Jerry Aguirre @ Olivetti ATC
>                                         uunet!amdahl!oliveb!jerry

I'm afraid there are no clock crystal frequency trimmers on any of the VAXes
that I've come across. Even if there were, you would (a) almost certainly
invalidate a DEC Maintenance Agreement by twiddling them yourself, (b) need
to keep careful track of any maintenance done and adjust it all again if the
relevant module was replaced, and (c) you'd still have thermal drift and 
crystal ageing to worry about. But don't despair! I can think of a number of
ways to do what you want, most of which don't even need a frequency counter.
But first a little background info: 

I'll discuss VMS here because that's what I know, but I'm sure the basic ideas
are quite applicable to Unix. VMS maintains the current system time in a 
software counter in memory. This is incremented at hardware clock interrupt
time, which is always every 10 milliseconds for VMS on all current VAXes.

There are actually two hardware real-time clocks present in most VAXen, with
different purposes and specifications. The first and most relevant one is the
Interval Counter, a programmable 32 bit counter which is incremented at one
microsecond intervals with a nominal .01% clock accuracy, i.e. +/- 8.64 seconds
per day. This counter is present in all VAXes other than microVAXes, which have
fixed unprogrammable 10 millisecond clock interrupts. At boot time and after
power fail restarts, VMS programs the Next Interval Count Register (internal
processor register #25) with a value of -10,000 to produce clock interrupts at
10 millisecond intervals. These are the only times VMS touches the NICR, a
handy fact which gives us our best method for tweaking the time. Again, I'd be
surprised if the various Unixes did this much differently.

The other clock, which is architecturally optional for microVAX
implementations, is the battery-backed Time Of Day/Time Of Year (TOD or TOY)
clock. On the 725/30, 750, 780/2/5, 82/8300 series, 8600/50 and the 85/87/8800
series this is a 32 bit unsigned binary counter (the TODR, internal processor
register #27) with 10 millisecond resolution and a clock accuracy of at least
.0025%, i.e. +/- around 65 seconds per month. The old microVAX/VAXstation I has
no TOD/TOY clock at all.

On the microVAX/VAXstation II & 2000, and I *think* on the microVAX III/3000
series, there is a battery backed MC146818 CMOS watch chip with 1 second
resolution which is accessed as a series of 8 bit registers in I/O address
space. I haven't been able to find any accuracy specs for the 32.768 kHz
crystal oscillator which drives this. The watch chip is tricky to access, so
see the KA630 CPU Module Users Guide (DEC order no. EK-KA630-UG) for more
info. Actually, the 82/8300 series also have one of these watch chips because
their TODR register is volatile and the software must reload it from the watch
chip after a power down. See the KA820 CPU Technical Manual (EK-KA820-TM) if
you want to access the watch chip on these VAXes.

All the previously mentioned registers are only accessible with the CPU in
kernel mode.

The only time the TOD/TOY clock is read is at boot time, after power fail
restarts, or when a SET TIME command/$SETIME system service call with no
parameters is executed. Note that there is *no* VMS system service call
available that will simply read the TOD/TOY register without simultaneously
updating the current system time in memory. 

So,

1) It follows that the quickest and nastiest method of improving time accuracy
is to periodically update the current time using the TOD/TOY clock or some
other time reference. For VMS this can be as simple as submitting a two line
batch command file that does a SET TIME then resubmits itself for X minutes
later. 
PRO:- Really simple. Works on VAXes without programmable interval counters.
CON:- Really Ugly. This has the big disadvantage that the time corrects in a
 discontinuous fashion and may well double back on itself. Many applications
 wouldn't like that at all. Must have a TOD/TOY register or other machine
 readable time source.

2) Determine the percentage error of the system time by comparing it against
a known time standard over a period of a few days. Do a once-only change to the
NICR value of a compensating amount by running a suitable program as soon as
possible after system initialisation. For a nominal NICR value of -10,000 this
should theoretically allow a time precision of one part in 10,000 or +/- 8.6
seconds a day. By dithering the NICR value (changing it up and down by one 
count at precalculated intervals) you could get greater precision.
PRO:- No machine readable external or internal time reference required, easy to
 implement, accuracy good for most purposes. Works about as well as does
 adjusting crystal frequency.
CON:- Doesn't compensate for thermal drift and ageing. Does a mediocre job of
 synchronising clocks on multiple VAXes. Sensitive to modules being replaced.
 Difficult to get really high accuracy. Can't be done on VAXes with no
 programmable interval counter. 
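The arithmetic behind method (2) can be sketched in a few lines of C. This assumes, as described above, that the interval counter nominally counts 1 microsecond per count, that VMS adds 10 ms per clock interrupt, and that all of the observed drift comes from the oscillator rate; the names are hypothetical, and actually writing the NICR (kernel mode, internal processor register #25) is not shown.

```c
#include <assert.h>

/* Hypothetical result: NICR value to load plus the fractional count left
 * over, i.e. how often a dithered NICR should use one extra count. */
typedef struct {
    long nicr;      /* negative, as VMS loads it (nominally -10000) */
    double frac;    /* fraction of interrupts needing one more count */
} nicr_t;

/* sys_elapsed_s: how far the VMS system time advanced while a trusted
 * reference advanced ref_elapsed_s.  A clock that gained time must count
 * proportionally more oscillator ticks per 10 ms interrupt. */
static nicr_t nicr_compensate(double ref_elapsed_s, double sys_elapsed_s)
{
    nicr_t r;
    double counts;

    assert(ref_elapsed_s > 0.0 && sys_elapsed_s > 0.0);
    counts = 10000.0 * sys_elapsed_s / ref_elapsed_s;
    r.nicr = -(long) counts;                /* whole counts, negated */
    r.frac = counts - (double) -r.nicr;     /* remainder for dithering */
    return r;
}
```

A clock running fast by one part in 10,000 (8.64 seconds a day) gives -10,001; a remainder such as 0.5 means every second interrupt would need the extra count, which is the dithering mentioned above.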

3) Use a modification of method (2) a
                          #  ARPA INTERNET: hirshman@ripper.DEC.COM
                 Anon.    #  Snail: Digital Equipment Corp. P/L, 18 Glen Street,
                          #         Eastwood, NSW 2122, AUSTRALIA

DISCLAIMER: The above opinions are mine (and probably mine alone, *sigh!*).
----------  

hirshman@gidday.dec.com (Bret H., Tech Support @ Sydney, Oz SNE-G 4125546) (03/26/88)

A while ago I posted a note making some suggestions about possible methods
for correcting and synchronising the system time on VAXes.  A couple of the
methods involved using the VAX TODR (Time Of Day Register) as a reference.
This is all well and good for *most* VAXes, though I didn't really give enough
information on the VAX/VMS TODR format for anyone to use it successfully
without more research.

BUT (and this is a big but, sportsfans!) if you value your system uptime *DON'T*
read the TODR on any of the 85xx/87xx/88xx/89xx series VAXes (the Nautilus
family), and especially don't write to it!  You might corrupt your system time
at best, and hang your VAX at worst.

The reasons for this are too complex to explain here in what is meant to be a
quick warning note.  Suffice it to say that the implementation of the TODR on 
Nautilus-family VAXes is a lot more complex than I led you to believe in my
original posting.  In other words, I blew it.  Sorry about that, folks!

Also, if anybody posted any queries or comments on my original note I'm afraid
I didn't see them.  In the best traditions of Murphy's Law, my news feed went
down for more than a week within hours of my first posting.  Typical! :-)
So please send me mail as well as posting, or just send me mail.  It's a lot
more reliable for me.


~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
                          #      Bret A. Hirshman, Esq.
 "The makers may make     #
 and the users may use,   #  DEC EasyNet: RIPPER::HIRSHMAN
 but the fixers must fix  #  USENET: hirshman@ripper.DEC.COM
 with but minimal clues"  #    or ..!{decwrl,decuac}!ripper.dec.com!hirshman
                          #  ARPA INTERNET: hirshman@ripper.DEC.COM
                 Anon.    #  Snail: Digital Equipment Corp. P/L, 18 Glen Street,
                          #         Eastwood, NSW 2122, AUSTRALIA

DISCLAIMER: The above opinions are mine (and probably mine alone, *sigh!*).
----------