Mills@udel.edu (01/20/91)
Folks, I have recently noticed some degradation in the timekeeping quality shown by many of the fuzzball primary NTP servers, specifically at umd1.umd.edu, but to a lesser extent at all stratum-1 servers. The degradation is not debilitating, from an expected accuracy of maybe 20 ms to something like twice that, but it is still a concern for the precision measurements we would like to make with DARTnet. The problem is that some of these servers are being banged upon by an incredible wash of alligators, and the poor fuzz creatures are building up significant traffic in their output queues. Some idea of the situation can be gleaned from the following update of a survey I last did some months ago. The table shows the mean flux of NTP messages received per second at each of the public NTP fuzzball servers, both at stratum 1 and stratum 2:

Stratum-1 server             Messages/s
  umd1.umd.edu                  2.59
  truechimer.cso.uiuc.edu       2.07
  ncarfuzz.ucar.edu             1.95
  fuzz.sdsc.edu                 1.72
  wwvb.isi.edu                  1.44
  dcn1.udel.edu                 1.34
  dcn5.udel.edu                 1.21

Stratum-2 server             Messages/s
  lilben.tn.cornell.edu         0.77
  clock.sura.net                0.55
  libra.rice.edu                0.31
  fuzz.psc.edu                  0.38

While a flux of 2.59 packets per second might not sound like much, it means there can be significant busy periods when the packets all gang up at about the same time and clog the output queue, leading to artificially long transit times and degraded accuracy. Obviously, accuracy can be improved with better load management, specifically by offloading the primary servers onto the secondary ones, which continue to be underutilized, and by balancing the loads on the primary servers. From occasional observations of the various servers, I continue to see many instances where more than one campus server chimes with a single primary server; sometimes several do this.
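A rough way to see why a few messages per second can still hurt: if arrivals are modeled as Poisson (an assumption; real NTP polls are periodic but unsynchronized across clients) and each reply takes a fixed time to transmit, the output queue behaves like an M/D/1 queue. The sketch below applies the Pollaczek-Khinchine mean-wait formula and a Poisson burst probability to the survey rates. `SERVICE_TIME` and `WINDOW` are hypothetical values chosen for illustration, not measurements of any fuzzball. If the transmit timestamp is taken before a reply waits in the queue, that wait lengthens only the return leg, and an asymmetric delay d biases the computed offset by roughly d/2.

```python
import math

# Mean NTP message flux (messages/s) from the survey above.
SURVEY = {
    "umd1.umd.edu": 2.59,
    "truechimer.cso.uiuc.edu": 2.07,
    "ncarfuzz.ucar.edu": 1.95,
    "fuzz.sdsc.edu": 1.72,
    "wwvb.isi.edu": 1.44,
    "dcn1.udel.edu": 1.34,
    "dcn5.udel.edu": 1.21,
    "lilben.tn.cornell.edu": 0.77,
    "clock.sura.net": 0.55,
    "libra.rice.edu": 0.31,
    "fuzz.psc.edu": 0.38,
}

# Hypothetical time to transmit one reply on the output line, in seconds.
SERVICE_TIME = 0.050
# Hypothetical window in which several arrivals count as a "gang", in seconds.
WINDOW = 0.100


def md1_mean_wait(rate, service):
    """Mean queueing delay (s) in an M/D/1 queue.

    Pollaczek-Khinchine with deterministic service:
    Wq = rho * S / (2 * (1 - rho)), where rho = rate * S.
    """
    rho = rate * service
    if rho >= 1.0:
        raise ValueError("utilization >= 1: the queue grows without bound")
    return rho * service / (2.0 * (1.0 - rho))


def prob_at_least(rate, window, k):
    """Probability of k or more Poisson arrivals in a window of the given length."""
    mean = rate * window
    below = sum(math.exp(-mean) * mean**i / math.factorial(i) for i in range(k))
    return 1.0 - below


if __name__ == "__main__":
    for host, rate in SURVEY.items():
        wait_ms = 1000 * md1_mean_wait(rate, SERVICE_TIME)
        burst = prob_at_least(rate, WINDOW, 2)
        print(f"{host:24s} {rate:5.2f}/s  mean wait {wait_ms:5.2f} ms  "
              f"P(>=2 in {WINDOW * 1000:.0f} ms) {burst:.3f}")
```

Because the mean wait grows as rho/(1 - rho), halving the flux at a busy server cuts its mean wait by more than half; the burst probability hints at the tail, where several replies pile up at once.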
While a case can be made for maybe two campus servers to chime with the same primary server, in almost all cases the accuracy and robustness of campus time are enhanced to the max when the urge to pile all the campus chimers onto the same set of servers is successfully resisted. It is much better to scatter the peers of up to three (not more) campus secondary servers across different primary servers. A useful rule of thumb when designing NTP configurations is for each campus server to peer with two primary servers, with the other campus server(s), and with one secondary server from a nearby campus or one of the NSFNET secondary servers. In fact, the NTP subnet is so richly connected, especially across the NSFNET backbone, that the NSFNET secondary servers are just about as solid in accuracy and robustness as the primary servers. Accordingly, chimers might do just as well to chime the secondary servers only. If you do that, chime only secondary servers and do not mix in a primary; otherwise, the selection algorithm can be yanked by a single falseticking primary.

In the absence of an available Unix version-3 NTP daemon, I am considering ways to relieve congestion at some of the primary servers. One may be limiting access to no more than two chimers from the same net. Another may be limiting availability of the UDP/TIME service to the stratum-2 servers only (abuse of the primary servers with UDP/TIME continues unabated). Note that version 3 has been carefully crafted so that accuracy can be maintained even when the poll intervals for synchronized paths are as long as 17 minutes, so there is obviously considerable benefit to be gained by switching to that version (hint for you software weekend warriors). Meanwhile, I have done one thing to improve accuracy for those customers that need it (DARTnet), at the cost of only very minor degradation for other users, by making use of the precedence queueing features of the fuzzball.
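The rule of thumb for campus configurations can be checked mechanically. The sketch below is a hypothetical helper, not part of any NTP distribution: `Peer`, `check_campus_config` and the `*.campus.example` host names are invented for illustration. It encodes only the rule-of-thumb layout (at most three campus servers, two primaries each, all other campus servers, and an off-campus secondary) and flags any primary shared between campus servers. Since a case can be made for two campus servers sharing a primary, treat that flag as a warning rather than an error; the secondaries-only alternative is not covered.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Peer:
    host: str
    stratum: int          # 1 = primary server, 2 or more = secondary
    campus: bool = False  # True if the peer is another server on our own campus


def check_campus_config(config):
    """Return warnings for a campus NTP configuration.

    `config` maps each campus server name to the list of Peer entries it chimes with.
    """
    warnings = []
    if len(config) > 3:
        warnings.append(f"{len(config)} campus servers; the rule of thumb is three at most")
    users_of_primary = {}
    for server, peers in config.items():
        primaries = [p.host for p in peers if p.stratum == 1]
        if len(primaries) != 2:
            warnings.append(f"{server}: {len(primaries)} primary peers, expected 2")
        # Every other campus server should appear among this server's campus peers.
        missing = set(config) - {server} - {p.host for p in peers if p.campus}
        if missing:
            warnings.append(f"{server}: not peering with campus server(s) {sorted(missing)}")
        if not any(p.stratum >= 2 and not p.campus for p in peers):
            warnings.append(f"{server}: no off-campus secondary peer")
        for host in primaries:
            users_of_primary.setdefault(host, []).append(server)
    # Piling several campus servers onto one primary is what the rule discourages.
    for host, servers in sorted(users_of_primary.items()):
        if len(servers) > 1:
            warnings.append(f"primary {host} is shared by {', '.join(sorted(servers))}")
    return warnings


if __name__ == "__main__":
    config = {
        "ntp1.campus.example": [
            Peer("umd1.umd.edu", 1), Peer("fuzz.sdsc.edu", 1),
            Peer("ntp2.campus.example", 2, campus=True), Peer("clock.sura.net", 2),
        ],
        "ntp2.campus.example": [
            Peer("umd1.umd.edu", 1), Peer("wwvb.isi.edu", 1),
            Peer("ntp1.campus.example", 2, campus=True), Peer("lilben.tn.cornell.edu", 2),
        ],
    }
    for w in check_campus_config(config):
        print(w)
```

In this example both campus servers chime with umd1.umd.edu, so the only warning printed is that umd1.umd.edu is shared by the two of them.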
Taking into account all the sanity checks, stratum assignments and cryptographic checksums as configured, replies to all customers that could potentially synchronize the server itself are now inserted at the head of the output queue, rather than at the usual end. Therefore, if you run xntpd, enable cryptographic authentication, operate at stratum 1 or 2 and show sufficiently low delay and dispersion, you will go to the head of the queue. Initial tests of the new "features" indicate that the DARTnet customers can enjoy sub-millisecond accuracies, while the rest of you may lose a couple of milliseconds. If enough of you adjust your peers to equalize the loads, you will get those precious milliseconds back.

Dave
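The head-of-queue rule amounts to a two-way decision at enqueue time. The sketch below is a hypothetical model, not the fuzzball's actual code: `Reply`, `could_sync_server`, `enqueue`, and the `MAX_DELAY` and `MAX_DISPERSION` thresholds are invented stand-ins for "sanity checks, stratum assignments and cryptographic checksums" and "sufficiently low delay and dispersion".

```python
from collections import deque
from dataclasses import dataclass

# Hypothetical stand-ins for "sufficiently low delay and dispersion" (seconds).
MAX_DELAY = 0.100
MAX_DISPERSION = 0.050


@dataclass
class Reply:
    dest: str
    authenticated: bool  # client's packet passed the configured cryptographic checksum
    stratum: int         # stratum the client advertises
    delay: float         # round-trip delay measured to the client, seconds
    dispersion: float    # dispersion measured to the client, seconds


def could_sync_server(r: Reply) -> bool:
    """True if this client is a candidate to synchronize the server itself."""
    return (r.authenticated
            and r.stratum in (1, 2)
            and r.delay <= MAX_DELAY
            and r.dispersion <= MAX_DISPERSION)


def enqueue(queue: deque, r: Reply) -> None:
    """Candidates go to the head of the output queue, everyone else to the tail."""
    if could_sync_server(r):
        queue.appendleft(r)
    else:
        queue.append(r)


if __name__ == "__main__":
    out = deque()
    enqueue(out, Reply("campus-a", False, 3, 0.040, 0.010))
    enqueue(out, Reply("campus-b", False, 2, 0.060, 0.020))
    enqueue(out, Reply("dartnet-peer", True, 2, 0.015, 0.005))
    print([r.dest for r in out])  # ['dartnet-peer', 'campus-a', 'campus-b']
```

Inserting literally at the head means two favored replies queued back to back leave in reverse order; with only a handful of favored peers that rarely matters, but a separate high-priority FIFO drained first would preserve their order.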