rbthomas@frogpond.rutgers.edu (Rick Thomas) (02/14/91)
Can someone please explain what the "drift" and "compliance" numbers being put into my syslog file once per hour mean? How are they computed, and what are the units for each?

I am trying to run xntpd on a Si-Graphics Personal IRIS running IRIX v3.3.1, and the clock never seems to get synced up. It does a "step" adjustment of about half a second roughly once every ten minutes. The "drift" numbers keep getting larger in magnitude (they are negative, and more so every day). They have reached -6.6 or so by now. What does this mean?

IRIX has a feature to trim the local clock oscillator. Could I use this to make the clock keep better time? If I knew the units of the drift number, I assume I could. But what does "compliance" mean?

Enjoy!
Rick
louie@SAYSHELL.UMD.EDU ("Louis A. Mamakos") (02/15/91)
If you take the "drift" number, and divide it by 4096, you will get the actual drift rate. If you want this number in terms of parts per million, multiply by 1000000.

louie
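As a rough illustration of the arithmetic Louis describes, the conversion could be sketched in Python as follows. The function name `drift_to_ppm` is made up for this example and is not part of xntpd; it simply assumes the logged number is scaled by 4096 as stated above.

```python
DRIFT_SCALE = 4096  # divisor quoted in the post above


def drift_to_ppm(drift):
    """Convert a logged drift number to parts per million (hypothetical helper)."""
    rate = drift / DRIFT_SCALE   # dimensionless drift rate
    return rate * 1_000_000      # the same rate expressed in ppm


if __name__ == "__main__":
    # -6.6 is the value Rick reports seeing in his syslog.
    print(f"{drift_to_ppm(-6.6):.1f} ppm")
```

Under that assumption, Rick's value of -6.6 works out to about -1611 ppm.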
scotth@corp.sgi.com (Scott Henry) (02/19/91)
l> If you take the "drift" number, and divide it by 4096, you will get the
l> actual drift rate. If you want this number in terms of parts per
l> million, multiply by 1000000.

Is 4096 the exact conversion, or is the complicated table-lookup scheme exact and 4096 an approximation? If 4096 is the exact conversion, why the lookup table scheme in the first place? The lookup scheme is not especially linear for large values of drift, and I suspect that it may be the cause of machines with large drift values being unable to sync.

-- 
Scott Henry <scotth@sgi.com> / Traveller on Dragon Wings
Information Services,        / Help! My disclaimer is missing!
Silicon Graphics, Inc        / Politicians no baka!
scotth@corp.sgi.com (Scott Henry) (02/20/91)
>>Is 4096 the exact conversion, or is the complicated table-lookup scheme
>>exact and 4096 an approximation?

l> Yes, 4096 is the exact "conversion" to normal, everyday units. There
l> is no "complicated table-lookup scheme".

>>If 4096 is the exact conversion, why the
>>lookup table scheme in the first place? The lookup scheme is not
>>especially linear for large values of drift, and I suspect that it may be
>>the cause of machines with large drift values being unable to sync.

l> I don't know what you're referring to when you cite a "lookup table
l> scheme". There is no lookup table that has anything to do with the
l> drift value. The drift value is used in the local clock algorithm to
l> represent the intrinsic drift of your host's clock, and to apply a
l> correction to it to keep its effective frequency correct. NTP (and
l> ntpd) allows you to correct the phase of the clock as well as the
l> frequency.

Sorry, I thought this was a reference to xntpd, where there IS something that I would call a "lookup table scheme" for conversion between drift and timestamp values. I am specifically referring to the macros TVUTOTSF and TSFTOTVU in include/ntp_unixclock.h, which use lookups into the tables defined in lib/tvtots.c, lib/tstotv.c, etc. In my empirical experimentation, the tables come out with a value averaging near 4130. If I'm looking in the wrong part of the code, I'd welcome pointers.

l> Machines with large drift values either have broken hardware (i.e.
l> crummy crystals) or crummy software (i.e. missing clock interrupts).
l> The large drift value is just a symptom of the problem.

I'm stuck with having to allow for crummy crystals. Missing clock interrupts doesn't seem to be a problem with a good crystal, so it should be allowable with a crummy one...

-- 
Scott Henry <scotth@sgi.com> / Traveller on Dragon Wings
Information Services,        / Help! My disclaimer is missing!
Silicon Graphics, Inc        / Politicians no baka!
louie@sayshell.umd.edu (Louis A. Mamakos) (02/20/91)
>Is 4096 the exact conversion, or is the complicated table-lookup scheme
>exact and 4096 an approximation?

Yes, 4096 is the exact "conversion" to normal, everyday units. There is no "complicated table-lookup scheme".

>If 4096 is the exact conversion, why the
>lookup table scheme in the first place? The lookup scheme is not
>especially linear for large values of drift, and I suspect that it may be
>the cause of machines with large drift values being unable to sync.

I don't know what you're referring to when you cite a "lookup table scheme". There is no lookup table that has anything to do with the drift value. The drift value is used in the local clock algorithm to represent the intrinsic drift of your host's clock, and to apply a correction to it to keep its effective frequency correct. NTP (and ntpd) allows you to correct the phase of the clock as well as the frequency.

Machines with large drift values either have broken hardware (i.e. crummy crystals) or crummy software (i.e. missing clock interrupts). The large drift value is just a symptom of the problem.

louie
dennis@UTCS.UTORONTO.CA (Dennis Ferguson) (02/21/91)
Scott,

>>>If 4096 is the exact conversion, why the
>>>lookup table scheme in the first place? The lookup scheme is not
>>>especially linear for large values of drift, and I suspect that it may be
>>>the cause of machines with large drift values being unable to sync.
[...]
>Sorry, I thought this was reference to xntpd, where there IS something
>that I would call a "lookup table scheme" for conversion between drift and
>timestamp values. I am specifically referring to the macros TVUTOTSF and
>TSFTOTVU in include/ntp_unixclock.h which use lookups into the tables
>defined in lib/tvtots.c, lib/tstotv.c, etc. My empirical experimentation,
>the tables come out with a value averaging near 4130. If I'm looking in
>the wrong part of the code, I'd welcome pointers.

I think I know where this confusion comes from.

Xntpd does internal time-dimensioned computations using a 64-bit fixed point format, with the decimal point between the two 32-bit halves. You can think of this as working with 64-bit integer values having units of 2**(-32) seconds. The drift value, which is actually the error component computed by a phase-locked loop, has dimensions of time and is computed in this format.

Every 4 seconds the drift value is divided by 1024 and the result is added to or subtracted from the system clock using adjtime(). (4*1024) is where the 4096 comes from. 1024 is a scale factor which falls out of the PLL calculations. I think the calculation is scaled like this so that the results don't disappear to the right of the least significant digit of the fuzzball's internal time representation (fuzzballs use integer milliseconds to represent time values).

So you have a time that you want to add to the system clock using adjtime(). The time (say 0.000040 seconds, or something) is computed in units of 2**(-32) seconds. Adjtime(), however, wants to be told the time in units of microseconds. This means you have to multiply the internal value by (10**6)/(2**32).
This is what the TSFTOTVU macro is used for. The tables are essentially lists of precomputed results of this multiplication, so you can compute the microsecond value using three table lookups and a few adds rather than actually doing the multiply (the latter is hard to do without a 32x32=64 bit multiply instruction). The TVUTOTSF macro does the inverse conversion.

You can check this yourself. For example, a struct timeval value of 500000 usec should produce an internal format value of 0x80000000 when run through TVUTOTSF, and 250000 usec should produce 0x40000000. It would be very interesting to know if these tables weren't linear.

As for machines with large drift values being unable to sync (or at least, requiring you to prime the drift file with a number which is in the ball park before they'll sync), this is limited by the capture aperture of the PLL used (essentially, by the code which decides when it is appropriate to slew the clock and when it is appropriate to step it). This is intentional. There is a tradeoff between the size of the frequency error you can compensate for, the speed and stability with which you can correct phase errors, and the quality of synchronization once everything settles down. A wider capture aperture would cost you either by causing clocks whose crystals were accurate but whose time was off to take a longer time to synchronize, or by causing larger overshoots and oscillations when a clock was being brought into line. As it is hard to buy crystals which are worse than about 20 ppm off (and you'd have to look hard to find crystals this bad), NTP's capture aperture of about 150 ppm was probably considered plenty.

Dennis Ferguson
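The conversion Dennis describes can be sketched in Python as below. This is an illustration under stated assumptions, not the xntpd code: the function and table names, the layout (one 256-entry table for each of the top three bytes of the fraction), and the 12-bit scaling are all invented here, and the real tables in lib/tstotv.c and lib/tvtots.c may well be organised differently. The sketch compares a table-based conversion with the direct multiplication by (10**6)/(2**32).

```python
FRAC_BITS = 32            # fraction is in units of 2**(-32) seconds
USEC_PER_SEC = 1_000_000
SCALE_BITS = 12           # extra precision kept in table entries (assumption)


def _make_table(shift):
    """Precompute scaled microsecond values for one byte at bit position `shift`."""
    half = 1 << (FRAC_BITS - 1)  # for round-to-nearest
    return [
        (((b << shift) * USEC_PER_SEC << SCALE_BITS) + half) >> FRAC_BITS
        for b in range(256)
    ]


# Hypothetical tables, one per byte of the top 24 bits of the fraction.
TS_TO_US_HI = _make_table(24)
TS_TO_US_MID = _make_table(16)
TS_TO_US_LO = _make_table(8)


def tsf_to_usec_table(tsf):
    """Fraction -> microseconds with three lookups and adds (lowest byte ignored)."""
    total = (TS_TO_US_HI[(tsf >> 24) & 0xFF]
             + TS_TO_US_MID[(tsf >> 16) & 0xFF]
             + TS_TO_US_LO[(tsf >> 8) & 0xFF])
    return (total + (1 << (SCALE_BITS - 1))) >> SCALE_BITS


def tsf_to_usec_exact(tsf):
    """Fraction -> microseconds by direct multiplication, rounded to nearest."""
    return (tsf * USEC_PER_SEC + (1 << (FRAC_BITS - 1))) >> FRAC_BITS


def usec_to_tsf(usec):
    """Microseconds -> fraction, rounded to nearest (the inverse direction)."""
    return ((usec << FRAC_BITS) + USEC_PER_SEC // 2) // USEC_PER_SEC


if __name__ == "__main__":
    print(hex(usec_to_tsf(500000)), hex(usec_to_tsf(250000)))
    worst = max(abs(tsf_to_usec_table(t) - tsf_to_usec_exact(t))
                for t in range(0, 1 << 32, 65521))
    print("largest table/exact difference in usec:", worst)
```

The first line printed reproduces the check values from the post (0x80000000 and 0x40000000). The second is a crude linearity check of the kind suggested above: if a table-based conversion is linear, its results should stay within rounding distance of the direct multiply across the whole range.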