Martyn.Johnson@cl.cam.ac.uk (Martyn Johnson) (06/05/91)
Various people in the UK academic community are experimenting with NTP over our national academic network. This is part of an experimental pilot project undertaken at a small number of sites, making use of IP over X.25. We have IP connectivity with each other and with the Internet. At present, the IP network is something of a lash-up, held together with general purpose computers rather than specialised routers (these will come later). As a consequence, the network has some properties which make it rather unsuitable for time synchronisation.

However, the implementation I am using (xntpd) behaves rather worse than I might expect. When the network is quiet, all is fine. However, when the network gets busy, the delays become long, variable and asymmetric, and the whole thing goes unstable, frequently resetting the local clock (sometimes by over a second).

As I understand it, the data filtering algorithm keeps the 8 most recent samples, and effectively uses the "best". Since the network can get busy for hours at a time, with delays of the order of several seconds, chances are that none of these 8 samples will be any good. If you do happen to get a really good one, it will only last about ten minutes before being thrown out of the shift register. Hence the offset estimate will jump about all over the place.

Now, I'm not expecting miracles. It seems to me impossible to extract any useful information from the sort of data I'm seeing. The question is: why does it try? The evidence would seem to suggest that the NTP daemon "believes" these bogus offsets, and is quite happy to use them to update the local clock. If all the offsets were small, it wouldn't matter too much, because the local clock loop would damp out the changes. But many of these offsets are large enough to make it replace the local clock value, reset everything and start again.
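[The failure mode described above can be sketched in a few lines. This is an illustrative model of the shift-register clock filter as Martyn describes it, not xntpd's actual code: keep the 8 most recent (offset, delay) samples and trust the offset from the sample with the smallest round-trip delay. The class and method names are invented for the example.]

```python
from collections import deque

class ClockFilter:
    """Simplified sketch of the clock filter described above.

    Keeps the 8 most recent (offset, delay) samples and selects the
    offset of the sample with the smallest round-trip delay -- the
    sample least likely to be corrupted by queueing delays.
    """
    SIZE = 8

    def __init__(self):
        self.samples = deque(maxlen=self.SIZE)  # old samples fall off

    def add(self, offset, delay):
        self.samples.append((offset, delay))

    def estimate(self):
        # "Best" sample = minimum delay; its offset is the estimate.
        if not self.samples:
            return None
        return min(self.samples, key=lambda s: s[1])[0]

f = ClockFilter()
f.add(0.002, 0.05)   # one good sample from a quiet period
for _ in range(8):   # hours of congestion: 8 bad samples arrive...
    f.add(1.5, 4.0)
print(f.estimate())  # 1.5 -- the good sample has been shifted out
```

[With polls roughly every ten minutes, eight congested samples flush the register in under two hours, after which only the bogus 1.5 s offsets remain to choose from.]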
This means that the clock jumps around all over the place, when it would actually be much better to leave it alone, doing only the skew compensation based on data collected when the network was good.

I must confess to not having read and understood everything in the NTP specs yet. But I observe that there is a serious analysis of error bounds etc. Can anyone explain to me why data which is so obviously unreliable is being trusted? Is it NTP itself, or the implementation at fault? Does anyone have any general comments on this problem?

I hope it is a short-term problem. The network should improve as links get faster and dedicated hardware is installed to do routeing. Also I am working on interfacing xntpd to our radio clock, which will give me a good local time reference (though I would still like to feel that the network could act as a good backup).

Martyn Johnson          maj@cl.cam.ac.uk
University of Cambridge Computer Lab
Cambridge UK
Mills@udel.edu (06/19/91)
Martyn,

You can change certain parameters in the NTP daemons to widen the aperture which the daemon believes as true time. However, there are other spots where timing dispersion is excessive, like in Norway. Experience there led to certain modifications to the NTP local-clock model that should help your case as well. Unfortunately, these mods are in the NTP Version 3 specification, not in previous versions. NTP v3 has been implemented in the fuzzball servers, but is not yet available for Unix. There have been volunteers from among this mangy bunch to implement v3 for Unix, but so far none have barked.

Dave
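[The "aperture" here is essentially the maximum offset the daemon will accept and slew gradually; anything beyond it causes the clock to be stepped, which is the reset behaviour Martyn observed. A minimal sketch of that decision, assuming the 128 ms CLOCK.MAX value from the NTP specification; the names and the slew gain are illustrative, not xntpd's actual variables:]

```python
STEP_THRESHOLD = 0.128  # seconds; CLOCK.MAX in the NTP spec

def adjust(clock_time, offset):
    """Illustrative step-vs-slew decision (not xntpd's actual code).

    Small offsets are amortised gradually by the local-clock loop
    (slew); offsets beyond the threshold cause the clock to be set
    directly (step) -- the disruptive reset described above.
    """
    if abs(offset) > STEP_THRESHOLD:
        return clock_time + offset, "step"   # clock jumps outright
    # slew: correct only a fraction per adjustment interval
    return clock_time + offset * 0.1, "slew"

print(adjust(100.0, 1.0))   # (101.0, 'step') -- a 1 s bogus offset
print(adjust(100.0, 0.05))  # a 50 ms offset is slewed gently
```

[Widening the aperture means raising the threshold so congestion-induced offsets are slewed rather than stepped, at the cost of converging more slowly when the clock really is far off.]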