[comp.protocols.time.ntp] ntpd says lost NTP peer right after synching...

andy@jhunix.HCF.JHU.EDU (Andy S Poling) (04/12/91)

I'm running the ntpd code from louie.udel.edu on two Ultrix boxes and a SysV
port of the same code on a box running SysV rel 3.1.5 and I'm seeing the
same thing happen on all three...

Whenever they choose a server to synchronize to, they then seem to lose
all of the data about that server, causing an "NTP peer lost" message
and a constant swapping among the available, sane servers.

This just doesn't seem quite right to me since it causes ntpd to query its
favorite two or three servers every 64 seconds most of the time, rather than
sliding to a 1024 second interval (which seems to me like the thing to do).
I admit that I haven't studied the NTP spec in detail...
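
To make concrete what I expected to see, here is a rough sketch in Python.
It is not taken from the ntpd source or the spec; the doubling rule and the
names next_poll_interval and simulate are my own assumptions. Only the 64
and 1024 second figures come from what I described above.

```python
MIN_POLL = 64     # seconds: the interval I actually observe
MAX_POLL = 1024   # seconds: the interval I expected ntpd to settle at


def next_poll_interval(current, peer_lost):
    """Hypothetical policy: back off while synchronized, restart on loss."""
    if peer_lost:
        # Losing the peer throws us back to the fastest polling rate.
        return MIN_POLL
    # Otherwise double the interval, capped at the maximum.
    return min(current * 2, MAX_POLL)


def simulate(events, start=MIN_POLL):
    """Apply a sequence of peer_lost flags and record each interval."""
    interval = start
    history = []
    for peer_lost in events:
        interval = next_poll_interval(interval, peer_lost)
        history.append(interval)
    return history


# Stable peer: the interval climbs to 1024 and stays there.
print(simulate([False] * 5))
# Peer lost every other round: the interval never gets past 128.
print(simulate([False, True, False, True]))
```

Under this assumed policy, a peer that is lost every few rounds keeps the
interval pinned near 64 seconds, which is what I am seeing.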

Am I missing something?  Being stupid?  Is there any reason why I shouldn't
modify the code to prevent this behavior?

Thanx,
-Andy
--
Andy Poling                              Internet: andy@gollum.hcf.jhu.edu
UNIX Systems Programmer                  Bitnet: ANDY@JHUNIX
Homewood Academic Computing              Voice: (301)338-8096    
Johns Hopkins University                 UUCP: uunet!mimsy!aplcen!jhunix!andy

louie@SAYSHELL.UMD.EDU ("Louis A. Mamakos") (04/13/91)

It sounds like what is happening is that when ntpd initially selects a host,
it has to reset the local clock rather than slew it because it is too
far off.  Whenever the local clock is reset, all of the offset/delay samples
in the filters are flushed, and we start all over again.  It then reselects
a peer, but the selected clock is too far off again.  Could it be that
your network paths are exceptionally "noisy" or that your computers are
keeping really crummy time?
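
The step-versus-slew decision and the filter flush can be sketched roughly
as below.  This is a simplified Python illustration, not the actual ntpd
code: the 128 ms threshold, the 8-sample filter size, and the names Peer
and apply_offset are assumptions made for the example.

```python
from collections import deque

STEP_THRESHOLD = 0.128  # seconds; assumed value, check your ntpd build
FILTER_SIZE = 8         # samples kept per peer; also an assumption


class Peer:
    """Holds the recent (offset, delay) samples for one server."""

    def __init__(self, name):
        self.name = name
        self.samples = deque(maxlen=FILTER_SIZE)

    def add_sample(self, offset, delay):
        self.samples.append((offset, delay))


def apply_offset(offset, peers):
    """Return "step" or "slew" for a measured offset in seconds.

    A step invalidates every stored sample, since they were all measured
    against the old clock, so every peer's filter is flushed.
    """
    if abs(offset) > STEP_THRESHOLD:
        for peer in peers:
            peer.samples.clear()
        return "step"
    return "slew"


peers = [Peer("a"), Peer("b")]
for peer in peers:
    peer.add_sample(0.020, 0.100)

print(apply_offset(0.020, peers), len(peers[0].samples))  # small offset
print(apply_offset(0.500, peers), len(peers[0].samples))  # large offset
```

After a step every filter is empty, so peer selection has to start over
from scratch, which would produce the hopping you describe.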

louie

andy@jhunix.HCF.JHU.EDU (Andy S Poling) (04/17/91)

In article <9104122212.AA02914@sayshell.umd.edu> louie@SAYSHELL.UMD.EDU ("Louis A. Mamakos") writes:
>It sounds like what is happening is that when ntpd initially selects a host,
>it has to reset the local clock rather than slew it because it is too
>far off.  Whenever the local clock is reset, all of the offset/delay samples
>in the filters are flushed, and we start all over again.  It then reselects
>a peer, but the selected clock is too far off again.  Could it be that
>your network paths are exceptionally "noisy" or that your computers are
>keeping really crummy time?

That is exactly what seems to be happening (at least the stepping part).
Ntpd selects a host, steps the clock, and clears the filter.  Then, ten
minutes later, the same thing happens again because it has selected a
different server which is too far from the first server.  Here is an example
of the ntpdc results on one of the servers which illustrates the problem
nicely:

    Address      Reference     Strat Poll Reach    Delay   Offset    Disp
==========================================================================
+128.220.1.2     130.43.2.2        2  512  377      25.0    -75.0     33.0
*18.72.0.3       WWV               1   64  337     103.0     35.0     16.0
.132.249.16.1    WWVB              1  256  375     175.0    -10.0     11.0
+128.102.16.10   WWV               1  512  377     152.0     14.0      5.0
+130.126.174.40  WWVB              1  256  377     204.0      6.0     22.0

The offset figures stay pretty consistent, but the behavior (server hopping)
persists.  Are these primary servers really keeping disparate time or is
something else wrong?

I don't think these machines have terrible clocks, and we have generally
excellent network connectivity.
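
For what it's worth, here is a small Python sketch that pulls the numbers
out of the table above so the disagreement between servers is easier to
see.  It assumes the Offset column is in milliseconds and that Reach is an
octal 8-bit register of recent poll results; both are my reading of the
output, not something I have confirmed in the source.  The function name
parse is my own.

```python
TABLE = """\
+128.220.1.2     130.43.2.2        2  512  377      25.0    -75.0     33.0
*18.72.0.3       WWV               1   64  337     103.0     35.0     16.0
.132.249.16.1    WWVB              1  256  375     175.0    -10.0     11.0
+128.102.16.10   WWV               1  512  377     152.0     14.0      5.0
+130.126.174.40  WWVB              1  256  377     204.0      6.0     22.0
"""


def parse(table):
    """Turn each ntpdc line into a dict of the fields used below."""
    rows = []
    for line in table.splitlines():
        addr, ref, strat, poll, reach, delay, offset, disp = line.split()
        rows.append({
            "flag": addr[0],            # leading status character
            "address": addr[1:],
            "stratum": int(strat),
            "reach": int(reach, 8),     # assumed octal shift register
            "offset": float(offset),    # assumed milliseconds
        })
    return rows


rows = parse(TABLE)

# Largest disagreement between any two servers in the list.
offsets = [r["offset"] for r in rows]
spread = max(offsets) - min(offsets)

# Same figure restricted to the stratum 1 (primary) servers.
primary = [r["offset"] for r in rows if r["stratum"] == 1]
primary_spread = max(primary) - min(primary)

print("spread, all servers:    ", spread)
print("spread, stratum 1 only: ", primary_spread)
for r in rows:
    answered = bin(r["reach"]).count("1")
    print(r["address"], "answered", answered, "of the last 8 polls")
```

By this reckoning the servers disagree by up to 110 across the whole list
and by 45 among the primaries alone, in whatever units ntpdc is reporting.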

-Andy