[comp.os.research] Distributed simulation paradigms.

unni@sunset.sm.unisys.com (Unni Warrier) (07/31/87)

Two of the more famous paradigms for distributed simulation are the
Chandy-Misra scheme (CM) and the Jefferson time warp scheme (TW).  CM allows
the distributed simulation to proceed until it deadlocks, and then provides
ways of detecting and resolving the deadlock.  A deadlock arises when each
of the processes in a cycle is waiting for a message from the previous
process.  TW allows the simulation to proceed and has a rollback mechanism
for dealing with out-of-order timestamped messages.  Hence one question is:
has anyone used TW or CM in their distributed simulations?  What are your
experiences?

How do the two schemes compare in performance?  One way to answer the
question is to model the two methods.  The problem is that I have not come
across any models that can give a general answer.  Some tradeoffs can be
identified in a general sense.  For example, if processing power were free,
rollbacks would cost nothing, and hence TW might win.  However, if
communications bandwidth were free, then extra messages would not cost us
anything, so a large number of null messages (as in one of the CM schemes)
would be free, and hence CM might win.  On the other hand, there is no
guarantee that the number of messages in TW is smaller than in CM,
especially in the face of significant rollbacks percolating through the
network.  Thus the problem with this sort of sweeping generalization is
that the specifics of a simulation may actually favour the contra-indicated
method.  Perhaps someone out there has taken a more structured look at
these problems and actually has some answers.  I would be interested in
knowing what these models are and what the answers are.
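To make the rollback cost concrete, here is a toy sketch (again my own,
with made-up timestamps; anti-message cancellation, which is what lets
rollbacks percolate to other processes, is elided) of a Time Warp process
rolling back when a "straggler" arrives with a timestamp in its past:

```python
# Toy sketch of Time Warp rollback (illustrative, not Jefferson's code).
# The LP processes messages optimistically, snapshotting its state after
# each one; a straggler forces it back to the last snapshot at or before
# the straggler's timestamp.

class TimeWarpLP:
    def __init__(self):
        self.clock = 0.0
        self.state = 0                 # here: just a count of events
        self.snapshots = [(0.0, 0)]    # (virtual time, state) history
        self.rollbacks = 0

    def handle(self, ts):
        if ts < self.clock:            # straggler: roll back
            self.rollbacks += 1
            while self.snapshots[-1][0] > ts:
                self.snapshots.pop()   # discard the undone future
            self.clock, self.state = self.snapshots[-1]
        # (re)process the event and save a new snapshot
        self.clock = ts
        self.state += 1
        self.snapshots.append((self.clock, self.state))

lp = TimeWarpLP()
for t in [1.0, 2.0, 4.0, 3.0]:         # 3.0 arrives out of order
    lp.handle(t)
```

In a real TW system the popped snapshots would also trigger anti-messages
to every process the undone events had sent to, which is exactly the
cascade of extra messages mentioned above.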

For those unfamiliar with the field, here is a reference:

J. Misra, "Distributed Discrete-Event Simulation," ACM Computing Surveys,
Vol. 18, No. 1, March 1986, pp. 39-65.

unni@cam.unisys.com

jfjr@mitre-bedford.ARPA (Jerome Freedman) (07/31/87)

In article <3557@sdcsvax.UCSD.EDU> unni@sunset.sm.unisys.com (Unni Warrier) writes:
>Two of the more famous paradigms for distributed simulation are the
>Chandy-Misra scheme (CM) and the Jefferson time warp scheme (TW).  CM allows
>the distributed simulation to proceed till a deadlock and then have ways of
>detecting the deadlock, and resolving it.  A deadlock arises when each of the
>processes in a cycle is waiting for a message from the previous process.  TW
>allows the simulation to proceed and has a rollback mechanism for dealing
>with out-of-order timestamped messages.  Hence one question is:  has anyone
>used TW or CM in their distributed simulations? What are your experiences?
>
  
 We are in the process of building a simulation using the Chandy-Misra
paradigm, for practical reasons.  The big problem with simulation,
especially a simulation big enough to require a distributed implementation,
is validation (debugging).  When you build such a thing you would surely
include all types, flavors, etc. of tracing/dumping facilities.
"Rollbacks" would make life difficult.  What does it mean to trace, or to
slowly step through an execution, when the execution you are so painfully
following may be rolled back?  I would trade bushels of NULL messages to
avoid having to debug, validate, and verify such a monster.  Performance
is not the only consideration.
            

PS: I am getting mail from the modsim mailing list, but I am an
inexperienced mailer and my replies to "unni" and "modsim" keep getting
bounced back.  Any advice would be helpful.




Jerry Freedman, Jr      "As you wander through life
jfjr@mitre-bedford.arpa   Whatever be your goal
(617)271-6248 or 7555    Keep your eye upon the doughnut
                          and not on the hole"