[comp.sys.isis] Time warp messages, explained

ken@gvax.cs.cornell.edu (Ken Birman) (10/03/89)
Ellen Vorhees noted that ISIS logs on her system sometimes
contain large numbers of messages like these:
	... Time is now Mon Oct  2 10:22:59 1989
	Time warp!  Adjusting timers by 1.943 secs
	Time warp!  Adjusting timers by 2.671 secs
	Time warp!  Adjusting timers by 5.622 secs
	... Time is now Mon Oct  2 10:25:05 1989
	Time warp!  Adjusting timers by 2.507 secs
	... Time is now Mon Oct  2 10:26:58 1989
	Time warp!  Adjusting timers by 1.366 secs
She asks if this is normal for ISIS, and what it means.

Well, it isn't normal.

The message arises as follows.  The ISIS protocols process,
bin/protos, spends its time in a main loop where it does a
select() waiting for messages to arrive, with a timeout that
wakes it up when it should next take a timer based action.

If it wakes up AFTER the timeout should have expired, protos complains
about a "time warp" and prints the delay.  Thus, a warp of 2.302
seconds would mean that protos expected a timeout to interrupt
its select 2.302 seconds ago, and was surprised that it didn't.
Protos always adjusts its timers when this happens so that they don't
all timeout a once, before it has time to read any messages that might
now be on the input queues.  For delays below about 1.25 seconds
nothing is printed at all.
 
Normally, you get one message like this a day, around 2am when
UNIX runs its clock demon.  This demon resets the system clock, and ISIS
notices the jump next time it runs.  Aside from printing a message,
there should be no visible effect at all.

A log with many "Time warp" messages means something odd is going on.
This would normally be a sign that some process has gotten very large
and is paging heavily (thrashing).  On some versions of UNIX, it could
also mean that there is a UNIX bug in the select(2) system
call, such that the timeout doesn't always work.  Or, it might be due
to someone using "nice" to run a compute-bound process at very high priority,
starving protos for CPU cycles.

To look for an explanation, you might start by running vmstat and
watching for a pattern -- like very high paging rates.  With 'ps' you
can see how large processes have gotten, and with the ISIS snapshot
commands you can even get some memory statistics.  

Ken