[comp.sys.isis] Time warp messages in ISIS V2.1

ken@gvax.cs.cornell.edu (Ken Birman) (08/14/90)

The following will probably be more common under V2.1, so I am posting
this for the newsgroup as a whole...

> From: rfinch@locke.water.ca.gov (Ralph Finch)
> Date: Mon, 13 Aug 1990 20:48:53 PDT
> Subject: Odd time warps

> While running a client and server on the same machine, I get this
> message occasionally:

> ISIS client pid 794: time warp (40.000 secs)!

> Seems odd because it's the same machine.  I *think* that every time
> the warp message was getting bigger.

A "time warp" means that the process spent 40 seconds doing some sort
of uninterruptible activity (see the ISIS manual section on when
a new task can be scheduled).  That is, it sat and thought for 40
seconds, or it waited for input from a user who sat and thought for
40 seconds, or it did file I/O for 40 seconds, etc.  Specifically,
this message means that ISIS had an action to schedule at time <t> but
ended up scheduling it at time <t+w secs>.  If <w> is large enough,
ISIS prints this message.
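
As a rough illustration of the check described above (a sketch only, not
the actual ISIS source; the function name and the 10-second threshold are
invented for this example), the logic amounts to something like:

```c
#include <stdio.h>

/* Hypothetical threshold; the value ISIS really uses is not stated here. */
#define WARP_THRESHOLD_SECS 10.0

/*
 * An action was due at time `due` but is only being run at time `now`
 * (both in seconds).  If the lateness w = now - due reaches the threshold,
 * report it and return w; otherwise return 0.
 */
double check_time_warp(double due, double now, int pid)
{
    double w = now - due;

    if (w >= WARP_THRESHOLD_SECS) {
        fprintf(stderr, "client pid %d: time warp (%.3f secs)!\n", pid, w);
        return w;
    }
    return 0.0;
}
```

So an action due at t=100 that actually runs at t=140 would be reported
as a 40-second warp, while one that runs at t=103 would pass silently.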

This can also happen when ISIS does a select with a timeout and wakes up
much later than expected.  I.e. it tells UNIX "block me, but no longer
than 3 seconds", but the select wakes up 43 seconds later.  In this case,
a strong possibility is that something is leaking memory.  For example,
in V2.0 protos has a memory leak that loses 44 bytes per 50 cbcasts sent.
This makes protos gradually grow until things congest, but in the
meantime your UNIX may begin to page heavily, leading to long delays
in application software.  Perhaps your application has a much faster
leak (e.g. forgetting to free a message you create or memory you
malloc).
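
If you want to see the late-wakeup effect outside of ISIS, something
along these lines should do (a minimal sketch using plain select() and
gettimeofday(); the function name is made up, and this is not how ISIS
itself measures the delay):

```c
#include <stddef.h>
#include <sys/select.h>
#include <sys/time.h>

/*
 * Block in select() with no descriptors for at most `ms` milliseconds.
 * Returns the wall-clock seconds that actually elapsed, or -1.0 if
 * select() failed (for example, interrupted by a signal).
 */
double timed_select_ms(long ms)
{
    struct timeval before, after, tv;

    tv.tv_sec = ms / 1000;
    tv.tv_usec = (ms % 1000) * 1000;

    gettimeofday(&before, NULL);
    if (select(0, NULL, NULL, NULL, &tv) < 0)
        return -1.0;
    gettimeofday(&after, NULL);

    return (double)(after.tv_sec - before.tv_sec)
         + (double)(after.tv_usec - before.tv_usec) / 1e6;
}
```

On a healthy machine timed_select_ms(3000) should come back with
something close to 3.0.  If it comes back with a much larger number, the
machine is probably paging heavily or otherwise overloaded.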

In particular, if your leak causes the total virtual memory in use
on your machine to exceed 16 Mbytes, SunOS 4.1 starts to thrash
(there is a bug fix from Sun).  You then suddenly see these huge delays
both in the big process and in anything else on the same machine.
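
To get a feel for the numbers, here is the arithmetic for the V2.0 protos
leak of 44 bytes per 50 cbcasts (a back-of-the-envelope sketch; the
function names are made up, and 16 Mbytes is assumed to mean
16*1024*1024 bytes):

```c
/* Leak rate quoted above for V2.0 protos. */
#define LEAK_BYTES   44LL
#define LEAK_CBCASTS 50LL

/* Approximate bytes leaked after sending `cbcasts` cbcasts. */
long long leaked_bytes(long long cbcasts)
{
    return cbcasts * LEAK_BYTES / LEAK_CBCASTS;
}

/* Smallest number of cbcasts needed to leak at least `bytes` bytes. */
long long cbcasts_to_leak(long long bytes)
{
    return (bytes * LEAK_CBCASTS + LEAK_BYTES - 1) / LEAK_BYTES;
}
```

At that rate a million cbcasts lose 880000 bytes, and it takes roughly
19 million cbcasts for this leak alone to account for 16 Mbytes.  A
machine that reaches the thrashing point quickly is therefore more likely
suffering from a faster leak somewhere else.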

To check for this, run "top" or "vmstat" or "ps l".  To check for leaks
of messages or memory managed by ISIS, use the "cmd snap" or "kill -USR2"
approach to generate dumps from possibly affected processes and look at the
first few lines, which give memory statistics.  ISIS rarely has many
messages in use and rarely uses more than 100k of memory.  If you see
300 messages in use or 875k of memory allocated, you are on the trail of
a leak...  (Probably in your code -- mine is pretty leak-proof by now, even
in V2.0).

Ken