[comp.protocols.time.ntp] What means "drift" and "compliance" ?

rbthomas@frogpond.rutgers.edu (Rick Thomas) (02/14/91)

Can someone please explain what the "drift" and "compliance" numbers
being put into my syslog file once per hour mean?

How are they computed and what are the units for each?

I am trying to run xntpd on a Silicon Graphics Personal IRIS running IRIX v3.3.1
and the clock never seems to get synced up.  It does a "step"
adjustment of about a half second about once every ten minutes or so.
The "drift" numbers keep getting larger (in magnitude, they are
negative and getting more so every day).  They have reached -6.6 or so
by now.  What does this mean?

IRIX has a feature to trim up the local clock oscillator.  Could I use
this to make the clock keep better time?  If I knew the units of the
drift number, I assume I could.  But what does "compliance" mean?


Enjoy!

Rick

louie@SAYSHELL.UMD.EDU ("Louis A. Mamakos") (02/15/91)

If you take the "drift" number, and divide it by 4096, you will get the actual
drift rate.  If you want this number in terms of parts per million, multiply
by 1000000.
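
As a rough illustration of the conversion described above, the following
Python sketch applies the divide-by-4096 rule to the -6.6 figure from the
original question. The function names are invented for this example and
are not part of xntpd.

```python
DRIFT_SCALE = 4096  # divisor quoted in the post above


def drift_to_rate(drift):
    """Convert a logged "drift" value to a dimensionless rate (seconds per second)."""
    return drift / DRIFT_SCALE


def drift_to_ppm(drift):
    """Express the same rate in parts per million."""
    return drift_to_rate(drift) * 1_000_000


# The drift value reported in the original question.
print(f"{drift_to_ppm(-6.6):.1f} ppm")
```

Under this rule a logged drift of -6.6 corresponds to roughly -1611 ppm.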

louie

scotth@corp.sgi.com (Scott Henry) (02/19/91)

l> If you take the "drift" number, and divide it by 4096, you will get the
l> actual drift rate.  If you want this number in terms of parts per
l> million, multiply by 1000000.

Is 4096 the exact conversion, or is the complicated table-lookup scheme
exact and 4096 an approximation? If 4096 is the exact conversion, why the
lookup table scheme in the first place? The lookup scheme is not
especially linear for large values of drift, and I suspect that it may be
the cause of machines with large drift values being unable to sync.

--
 Scott Henry <scotth@sgi.com> / Traveller on Dragon Wings
 Information Services,       / Help! My disclaimer is missing!
 Silicon Graphics, Inc      / Politicians no baka!

scotth@corp.sgi.com (Scott Henry) (02/20/91)

>>Is 4096 the exact conversion, or is the complicated table-lookup scheme
>>exact and 4096 an approximation? 

l> Yes, 4096 is the exact "conversion" to normal, everyday units.  There
l> is no "complicated table-lookup scheme".

>>If 4096 is the exact conversion, why the
>>lookup table scheme in the first place? The lookup scheme is not
>>especially linear for large values of drift, and I suspect that it may be
>>the cause of machines with large drift values being unable to sync.

l> I don't know what you're referring to when you cite a "lookup table
l> scheme".  There is no lookup table that has anything to do with the
l> drift value.  The drift value is used in the local clock algorithm to
l> represent the intrinsic drift of your host's clock, and to apply a
l> correction to it to keep its effective frequency correct.  NTP (and
l> ntpd) allows you to correct the phase of the clock as well as the
l> frequency.

Sorry, I thought this was a reference to xntpd, where there IS something
that I would call a "lookup table scheme" for conversion between drift and
timestamp values. I am specifically referring to the macros TVUTOTSF and
TSFTOTVU in include/ntp_unixclock.h which use lookups into the tables
defined in lib/tvtots.c, lib/tstotv.c, etc. In my empirical experimentation,
the tables come out with a value averaging near 4130. If I'm looking in
the wrong part of the code, I'd welcome pointers.

l> Machines with large drift values either have broken hardware (i.e.
l> crummy crystals) or crummy software (i.e. missing clock interrupts).
l> The large drift value is just a symptom of the problem.

I'm stuck with having to allow for crummy crystals. Missing clock
interrupts don't seem to be a problem with a good crystal, so it should
be allowable with a crummy one...

--
 Scott Henry <scotth@sgi.com> / Traveller on Dragon Wings
 Information Services,       / Help! My disclaimer is missing!
 Silicon Graphics, Inc      / Politicians no baka!

louie@sayshell.umd.edu (Louis A. Mamakos) (02/20/91)

>Is 4096 the exact conversion, or is the complicated table-lookup scheme
>exact and 4096 an approximation? 

Yes, 4096 is the exact "conversion" to normal, everyday units.  There
is no "complicated table-lookup scheme".

>If 4096 is the exact conversion, why the
>lookup table scheme in the first place? The lookup scheme is not
>especially linear for large values of drift, and I suspect that it may be
>the cause of machines with large drift values being unable to sync.

I don't know what you're referring to when you cite a "lookup table
scheme".  There is no lookup table that has anything to do with the
drift value.  The drift value is used in the local clock algorithm to
represent the intrinsic drift of your host's clock, and to apply a
correction to it to keep its effective frequency correct.  NTP (and
ntpd) allows you to correct the phase of the clock as well as the
frequency.

Machines with large drift values either have broken hardware (i.e.
crummy crystals) or crummy software (i.e. missing clock interrupts).
The large drift value is just a symptom of the problem.

louie

dennis@UTCS.UTORONTO.CA (Dennis Ferguson) (02/21/91)

Scott,

>>>If 4096 is the exact conversion, why the
>>>lookup table scheme in the first place? The lookup scheme is not
>>>especially linear for large values of drift, and I suspect that it may be
>>>the cause of machines with large drift values being unable to sync.
[...]
>Sorry, I thought this was reference to xntpd, where there IS something
>that I would call a "lookup table scheme" for conversion between drift and
>timestamp values. I am specifically referring to the macros TVUTOTSF and
>TSFTOTVU in include/ntp_unixclock.h which use lookups into the tables
>defined in lib/tvtots.c, lib/tstotv.c, etc. My emperical experimentation,
>the tables come out with a value averaging near 4130. If I'm looking in
>the wrong part of the code, I'd welcome pointers.

I think I know where this confusion comes from.  Xntpd does internal
time-dimensioned computations using a 64-bit fixed point format, with
the binary point between the two 32-bit halves.  You can think of
this as working with 64-bit integer values having units of 2**(-32)
seconds.  The drift value, which is actually the error component computed
by a phase-locked loop, has dimensions of time and is computed in this
format.
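
The fixed-point format described above can be sketched in Python as
follows. This is only an illustration of the arithmetic, not xntpd's
code: the function names are made up, and only non-negative values are
handled.

```python
from fractions import Fraction

FRAC_BITS = 32  # bits to the right of the binary point


def to_fixed(seconds):
    """Return the value as an integer count of 2**-32 second units."""
    return round(Fraction(seconds) * 2**FRAC_BITS)


def split_halves(fixed):
    """Split a non-negative value into (integer seconds, 32-bit fraction)."""
    return fixed >> FRAC_BITS, fixed & 0xFFFFFFFF


def to_seconds(fixed):
    """Convert a count of 2**-32 second units back to exact seconds."""
    return Fraction(fixed, 2**FRAC_BITS)


# 1.5 seconds has integer half 1 and fraction half 0x80000000.
whole, frac = split_halves(to_fixed(Fraction(3, 2)))
print(whole, hex(frac))
```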

Every 4 seconds the drift value is divided by 1024 and the result is
added to or subtracted from the system clock using adjtime().  (4*1024)
is where the 4096 comes from.  1024 is a scale factor which falls out
of the PLL calculations.  I think the calculation is scaled like this
so that the results don't disappear to the right of the least significant
digit of the fuzzball's internal time representation (fuzzballs use
integer milliseconds to represent time values).
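
To make the origin of the 4096 concrete, here is a small Python sketch of
the arithmetic just described. The constants come from the paragraph
above; the function names are hypothetical, and the real code works in
fixed point rather than floats.

```python
ADJUST_INTERVAL = 4  # seconds between corrections, per the text above
PLL_SCALE = 1024     # scale factor applied to the drift value


def adjustment_per_interval(drift):
    """Seconds added to the system clock at each 4-second correction."""
    return drift / PLL_SCALE


def effective_rate(drift):
    """Resulting frequency correction in seconds per second."""
    return adjustment_per_interval(drift) / ADJUST_INTERVAL


# A drift value of 0.04096 gives about 40 microseconds every 4 seconds,
# which is a rate of about 10 ppm.
drift = 0.04096
print(adjustment_per_interval(drift), effective_rate(drift) * 1_000_000)
```

Dividing by 1024 and then by 4 is the same as dividing by 4096, which is
the conversion quoted earlier in the thread.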

So you have a time that you want to add to the system clock using
adjtime().  The time (say 0.000040 seconds, or something) is computed
in units of 2**(-32) seconds.  Adjtime(), however, wants to be told
the time in units of microseconds.  This means you have to multiply
the internal value by (10**6)/(2**32).  This is what the TSFTOTVU
macro is used for.  The tables are essentially lists of precomputed
results of this multiplication, so you can compute the microsecond
value using three table lookups and a few adds rather than actually
doing the multiply (the latter is hard to do without a 32x32=64 bit
multiply instruction).  The TVUTOTSF macro does the inverse conversion.
You can check this yourself.  For example, a struct timeval value
of 500000 usec should produce an internal format value of 0x80000000
when run through TVUTOTSF, and 250000 usec should produce 0x40000000.
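
For anyone who wants to check those two values without building xntpd,
the conversion can be written directly in Python, which has arbitrary
precision integers and so needs no tables. This is a sketch of the
multiplication the macros are described as performing, not the macros
themselves; the function names are invented, and whether the real macros
round or truncate is an assumption here.

```python
USEC_PER_SEC = 1_000_000


def usec_to_tsf(usec):
    """Microseconds (0..999999) -> 32-bit fraction in 2**-32 s units, rounded."""
    return ((usec << 32) + USEC_PER_SEC // 2) // USEC_PER_SEC


def tsf_to_usec(tsf):
    """32-bit fraction in 2**-32 s units -> microseconds, rounded."""
    return (tsf * USEC_PER_SEC + (1 << 31)) >> 32


print(hex(usec_to_tsf(500_000)))  # 0x80000000
print(hex(usec_to_tsf(250_000)))  # 0x40000000
```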

It would be very interesting to know if these tables weren't linear.
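
One way to test the linearity question is to compare a table-driven
conversion against the exact multiplication over a sweep of inputs. The
Python sketch below does this for a hypothetical layout of three
256-entry tables indexed by the top three bytes of the fraction. The
actual tables in lib/tstotv.c may be organized differently, so this
shows the method only and says nothing about the real tables.

```python
USEC_PER_SEC = 1_000_000
EXTRA_BITS = 12  # fractional bits kept in each table entry (assumed)


def exact_usec(tsf):
    """Reference: 2**-32 s units -> microseconds by direct multiplication."""
    return (tsf * USEC_PER_SEC + (1 << 31)) >> 32


def build_table(shift):
    """Entry i holds the microsecond value of byte i at bit position `shift`,
    scaled up by 2**EXTRA_BITS and rounded."""
    down = 32 - EXTRA_BITS
    half = 1 << (down - 1)
    return [(((i << shift) * USEC_PER_SEC) + half) >> down for i in range(256)]


T_HI, T_MID, T_LO = build_table(24), build_table(16), build_table(8)


def table_usec(tsf):
    """Three lookups and a few adds; the lowest byte of tsf is ignored."""
    total = (T_HI[(tsf >> 24) & 0xFF]
             + T_MID[(tsf >> 16) & 0xFF]
             + T_LO[(tsf >> 8) & 0xFF])
    return (total + (1 << (EXTRA_BITS - 1))) >> EXTRA_BITS


def max_table_error(step=99_991):
    """Largest difference, in microseconds, between the two methods."""
    return max(abs(table_usec(t) - exact_usec(t))
               for t in range(0, 1 << 32, step))


print(max_table_error())
```

If the tables are linear, the maximum error should stay within a
microsecond across the whole range; a much larger figure would point to
a bad table entry.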

As for machines with large drift values being unable to sync (or at
least, requiring you to prime the drift file with a number which is
in the ball park before they'll sync), this is limited by the capture
aperture of the PLL used (essentially, by the code which decides
when it is appropriate to slew the clock and when it is appropriate
to step it).  This is intentional.  There is a tradeoff between
the size of the frequency error you can compensate for, the speed
and stability with which you can correct phase errors, and the
quality of synchronization once everything settles down.  A wider
capture aperture would cost you either by causing clocks whose
crystals were accurate but whose time was off to take a longer time
to synchronize, or by causing larger overshoots and oscillations
when a clock was being brought into line.  As it is hard to buy
crystals which are worse than about 20 ppm off (and you'd have
to look hard to find crystals this bad), NTP's capture aperture
of about 150 ppm was probably considered plenty.
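
As a back-of-the-envelope check against that aperture, the Python sketch
below estimates the frequency error implied by the symptoms in the
original question (a step of about half a second roughly every ten
minutes). It assumes the steps are caused entirely by a constant
frequency error, which the thread does not establish, and the function
names are made up.

```python
CAPTURE_APERTURE_PPM = 150  # figure quoted in the text above


def observed_error_ppm(step_seconds, interval_seconds):
    """Frequency error implied by accumulating step_seconds every interval."""
    return step_seconds / interval_seconds * 1_000_000


def within_aperture(error_ppm, aperture_ppm=CAPTURE_APERTURE_PPM):
    """True if the error is small enough for the PLL to capture."""
    return abs(error_ppm) <= aperture_ppm


# About half a second every ten minutes, as reported in the question.
error = observed_error_ppm(0.5, 10 * 60)
print(round(error), within_aperture(error))
```

On that assumption the error is around 833 ppm, well outside a 150 ppm
aperture, which would be consistent with the clock stepping instead of
settling.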

Dennis Ferguson