[comp.parallel] Optimistic vs conservative

jsimellon@uunet.UU.NET (Larry Mellon) (05/29/89)

[optimistic vs conservative synchronization lead-in deleted]

In his article <5600@hubcap.clemson.edu> Joe Beckenbach writes:
> [...]  Better yet, answer my objections (or what you perceive as my 
> objections).
     You've brought up some good points, Joe, but your lack of knowledge
 about Timewarp has given you many misconceptions as to what it is doing
 and how.

>In his article <5573@hubcap.clemson.edu> Larry Mellon writes:
>>    Jefferson's TimeWarp is intended to solve the problems of synchronizing
>>simulation time across multiprocessors in the operating system, rather
>>than in the user's code.  It is best suited to discrete-event simulations;
>>generally, the more compute time per event, the better your speedup.

>	It seems to me that a discrete-event simulator need not have
>the user concern himself with simulation time at all, other than to
>print out its value in reports partway through the simulation process.

     I am not aware of *any* simulation language which does not maintain
 a 'simulation time'.  sim_time is how the user ties his simulated 
 events into order, presumably based on their order in real time.  Simula,
 DEMOS, and GPSS all tie events to simulation time.
 Have a look at:
 L. Lamport: 'Time, clocks, and the ordering of events in a distributed 
 system'.  Communications of the ACM, 21(7), July 1978.
 X. Li and B. Unger: 'Languages for distributed simulation'.  AI and 
 Simulation, Society for Computer Simulation, San Diego, Jan 1987.
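
    As a rough illustration (my own sketch in present-day C++, not code from
 Simula, DEMOS, GPSS, or Sim++), the heart of any such simulator is a
 future event list kept in sim_time order:

#include <cstdio>
#include <functional>
#include <queue>
#include <vector>

struct Event {
    double sim_time;                  // timestamp supplied by the modeller
    std::function<void()> action;     // what happens at that time
};

struct Later {                        // smallest sim_time comes out first
    bool operator()(const Event& a, const Event& b) const {
        return a.sim_time > b.sim_time;
    }
};

int main() {
    std::priority_queue<Event, std::vector<Event>, Later> fel;  // future event list
    fel.push({2.0, [] { std::puts("event at t=2"); }});
    fel.push({1.0, [] { std::puts("event at t=1"); }});
    while (!fel.empty()) {            // always dispatch the earliest event next
        Event e = fel.top();
        fel.pop();
        e.action();
    }
    return 0;
}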

>The speed-up comes from assuming that one processor is given only one 
>simulator node to work with, is the impression that I get from reading
>some papers and talking with the people here, including one who gave one
>of the few pro-conservative-simulation results in the Miami conference
>on simulation (1989 Distributed Simulation Conference).

  Many papers on parallel simulations use the one process per processor
 model; this is to simplify the problem, allowing comparison of
 best-case speedups for all types of synchronization methods.
 No paper I am aware of claims that this would be an efficient
 configuration.
  You are also incorrect in deducing that Timewarp speedup comes from
 one process per processor; Timewarp outperforms conservative methods because:
    - it does not block processes for event-order verification.
    - conservative methods require fixed 'channels' between all processes
      which communicate in the simulation.  This is not easily modified,
      thus a large conservative simulation is *much* harder to modify or scale
      than one using Timewarp, which has no such restrictions.  
    - all channels also send a message at some pre-determined interval.
      This is used in event-order verification, and clogs your communication
      bandwidth.
    - in the case of lazy cancellation, 'getting lucky'.
 The explanation for the last point is longer than I care to type right now;
 if the paper references don't make it clear, I'll post an explanation
 later.
 O. Berry: 'Performance evaluation of the Time Warp Distributed Simulation
 Mechanism'.  PhD thesis, U of Southern Cal, 1986.
 O. Berry and G. Lomow: 'The potential speedup in the optimistic Time Warp 
 mechanism for distributed simulations'.  International Conference on
 Computers and Applications, IEEE, Beijing, June 1987.
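
    To make the first point above concrete, here is a hand-waving sketch (my
 own names and modern C++, *not* the Timewarp kernel, and with anti-messages
 left out): a process executes events as soon as it has them, saves state
 snapshots, and rolls back only if a straggler message arrives in its past.

#include <cstdio>
#include <iterator>
#include <map>

struct LP {                           // one logical process
    double lvt = 0.0;                 // local virtual time
    int    state = 0;                 // stand-in for the simulated state
    std::map<double, int> saved;      // state snapshots keyed by sim_time

    void process(double t) {          // optimistic execution: never blocks
        saved[lvt] = state;           // snapshot before advancing
        lvt = t;
        ++state;                      // stand-in for real event code
    }

    void receive(double t) {
        if (t < lvt) {                // straggler: roll back
            auto it = saved.lower_bound(t);
            if (it != saved.begin()) --it;           // last snapshot before t
            state = it->second;
            lvt   = it->first;
            saved.erase(std::next(it), saved.end()); // discard rolled-back snapshots
            std::printf("rolled back to t=%g\n", lvt);
        }
        process(t);                   // then handle the event normally
    }
};

int main() {
    LP lp;
    lp.process(5.0);
    lp.process(9.0);
    lp.receive(7.0);                  // straggler forces a rollback to t=5
    return 0;
}

 A conservative process in the same situation would have sat blocked until
 its channels guaranteed that no message earlier than 7.0 could still arrive.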

>>[my 20-line explanation of Timewarp deleted]
> [Joe's interpretation -- mostly correct -- deleted]
> 
>This entails saving all the state information and the timestamped event
>notifications, using a very large amount of memory if you wish to be sure
>that your simulation is valid.

    True, *but* the memory is not as large as you seem to think.  Because
 events are time-stamped, and the kernel tracks the simulation time each
 entity has reached, the Timewarp kernel can determine which events and
 states *cannot* be rolled back to.  This allows garbage collection to
 occur, and the overall memory usage is kept relatively small.  No question, however,
 that it is more memory than a conservative simulator would use.  Our
 contention is that memory is cheap, the average state-size is not immense,
 and Timewarp will run in small memory machines as well, albeit slower.
 A net gain in speed is what we are after, not small size.
    For basic information on garbage collection in Timewarp, refer to:
 D. Jefferson: 'Virtual Time'.  ACM Transactions on Programming Languages
 and Systems, 7(3), July 1985.
 More work has been done on this:
 A. Gafni: 'Space management and cancellation mechanism for Time Warp'.  PhD
 thesis, U of Southern Cal, Dec 1985.
 Either Gafni or Berry (ref. above) gives a proof that Timewarp requires
 no more than twice as much memory as conservative; I can't remember which.
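
    In case the mechanism isn't obvious, here is a rough sketch of the idea
 (my own names and modern C++, not the actual kernel code): the kernel
 computes a Global Virtual Time -- the minimum over every process's local
 time and every message still in transit -- and anything saved from before
 GVT can never be rolled back to, so it is reclaimed.

#include <algorithm>
#include <map>
#include <vector>

struct LP {
    double lvt = 0.0;                 // local virtual time
    std::map<double, int> saved;      // state snapshots keyed by sim_time
};

double compute_gvt(const std::vector<LP>& lps,
                   const std::vector<double>& in_transit) {
    double gvt = 1e300;
    for (const LP& p : lps)     gvt = std::min(gvt, p.lvt);
    for (double t : in_transit) gvt = std::min(gvt, t);
    return gvt;
}

void fossil_collect(LP& p, double gvt) {
    // keep the last snapshot taken at or before GVT, free everything older
    auto keep = p.saved.lower_bound(gvt);
    if (keep != p.saved.begin()) --keep;
    p.saved.erase(p.saved.begin(), keep);
}

int main() {
    std::vector<LP> lps(3);
    lps[0].lvt = 4.0;  lps[1].lvt = 7.0;  lps[2].lvt = 9.0;
    double gvt = compute_gvt(lps, {5.0});   // gvt == 4.0 here
    for (LP& p : lps) fossil_collect(p, gvt);
    return 0;
}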

> [billiards simulation description deleted]
>In my honest opinion,
>this `toy' shows the best way to use TimeWarp:  absolute minimal changes in
>state.  [...] I'm
>not so sure that this is a livable constraint for most simulations.  For 
>instance, gate-level VLSI simulations lose in TimeWarp.  [Not only that,
>but the potential parallelism would not be realized except minimally.]

    You are incorrect in assuming that minimal changes in state are the best
 utilization of Timewarp; it is strictly a method of synchronizing
 event-order across parallel hardware: why should minimal state-changes
 affect this?
     Also, billiards is not the best example for Timewarp; it has a very
 low granularity (although it can be higher, depending on the accuracy of
 your model).  But it is terrible for conservative methods, as a channel
 must exist between every pair of pool balls!
     Regarding VLSI being a poor application for Timewarp: you're wrong
 again.  People in Montreal (I don't have a reference handy) are doing
 quite well with optimistic VLSI simulations.  There is nothing inherent
 in VLSI simulations which makes them inapplicable to Timewarp.

>	The notion of one blocked process wasting processor time implies
>strongly that TimeWarp is based on a one process per node model of 
>computation.

     My fault; in an attempt to be brief, I only stated the standard
 method of comparing synchronization techniques.  Timewarp is in fact
 *not* based on that model, but does support multiple processes per
 processor.  Sim++, our Timewarp-based simulation language, uses lightweight
 processes (C++ objects from our tasking package), allowing essentially
 unlimited processes per processor.
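
    Sim++'s actual interface isn't shown here; the following is only a
 generic C++ sketch (SimProcess and PoolBall are my names) of the idea that
 each simulation entity is a lightweight object scheduled by the kernel,
 not an operating-system process, so one processor can host a great many
 of them.

#include <memory>
#include <vector>

class SimProcess {                    // base class for a lightweight process
public:
    virtual ~SimProcess() = default;
    virtual void handle_event(double sim_time) = 0;   // invoked by the kernel
    double lvt = 0.0;                 // this object's local virtual time
};

class PoolBall : public SimProcess {  // an example entity type
public:
    void handle_event(double sim_time) override { lvt = sim_time; }
};

int main() {
    std::vector<std::unique_ptr<SimProcess>> on_this_node;
    for (int i = 0; i < 1000; ++i)    // many processes per physical processor
        on_this_node.push_back(std::make_unique<PoolBall>());
    return 0;
}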

>That constraint keeps potential parallelism potential, and as such is not
>of any interest to me.
    I don't understand; in my books, it *maximizes* attained parallelism,
 although it is by no means a cost-effective utilization of your processors.

>	Again, my honest opinion, TimeWarp seems to be doing half-measures in
>parallelizing a simulation:
> * If it needed to be parallel in only a few pieces, why not let the user
>	write the synchronization, or supply libraries to support it?  Surely
>	the interactions can be understood well enough at that level.

    Why write your 'for' or 'while' loops in C instead of assembler?
 Surely you understand what is going on at that level?  ***It is easier
 for the user to have the work done for him***.  You should use a programming
 abstraction applicable to the problem.  Porting to different architectures
 is an important factor as well.  Also, the synchronization is far more 
 complex than you seem to believe.
    If your simulation only requires sections to run in parallel, you
 only distribute those sections.  I fail to see how this is a failing
 of Timewarp.  In Sim++ (our specific implementation of Timewarp), no
 source-code need even address this 'problem'.  Via a configuration file,
 you simply specify at execution-time which processors to run your
 simulation objects on.

> * If it can parallelize to many pieces, why not go the whole hog and multi-
>	process on the different physical nodes [either truly, or via light-
>	weight processes]?  This hides the communication overhead, and answers
>	the objection about 'wasting processor time'.

     Covered above.  As a side note, do not assume that if two processes are
 on processor X, and process A has to roll back, that processing time
 has been wasted (i.e. process B could have run).  A Timewarp kernel
 runs processes which are furthest behind in simulation time first.  Thus,
 the chance of a rollback occurring which will affect the critical path
 of the simulation is less.  As another interesting aside, if all processes
 run on a single Timewarp kernel, execution is completely sequential;
 no rollbacks occur.
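
    A minimal sketch of that scheduling rule (illustration only, with my own
 names; the real kernel's data structures will differ):

#include <algorithm>
#include <cstdio>
#include <vector>

struct Proc { const char* name; double lvt; };

// Run whichever process on this node is furthest behind in simulation time.
Proc* pick_next(std::vector<Proc>& procs) {
    if (procs.empty()) return nullptr;
    return &*std::min_element(procs.begin(), procs.end(),
        [](const Proc& a, const Proc& b) { return a.lvt < b.lvt; });
}

int main() {
    std::vector<Proc> node = {{"A", 12.0}, {"B", 7.5}, {"C", 9.0}};
    std::printf("run %s next\n", pick_next(node)->name);   // prints "run B next"
    return 0;
}
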
>
>>    I will go out on a limb and state that current research shows that
>>optimistic beats out conservative in the general case.
>
>	And I will go out on the other limb and state that current research
>I have seen, including Wen-King Su's paper (presented at Miami, titled
>"Variants of the Chandy-Misra-Bryant Distributed Discrete-Event Simulation
> Algorithm"), would demonstrate that conservative simulation is more
>widely usable than optimistic simulation, given a range of machine hardware,
>distributed environments, and simulations.
>	My caveats and speculations on this particular limb:
>C1)  I know of no paper directly comparing two different implementations of a
>	collection of simulations, or even a single simulation, where one is a
>	conservative and one an optimistic, with hard result such as run times
>	simulation sizes, data variation, usage of resources, and agreement of
>	results with a standard sequential implementation of that simulation.

    I don't have a full reference list handy, but you haven't been doing
 your research.  In Berry's thesis (ref. above), she proves that a simulation
 using Timewarp with lazy cancellation can theoretically execute in less
 time than the theoretical lower bound on execution time of *any* possible
 conservative mechanism.
 E. Leung: 'The effects of feedback on the performance of conservative
 algorithms'.  Distributed Simulation 1989, Society for Computer Simulation,
 1989.  [does a comparison]
 R. Fujimoto: 'Performance measurements of distributed simulation 
 strategies'.  Distributed Simulation 1988, Society for Computer Simulation,
 San Diego, Feb 1988.
 F. Kaudel: 'A literature survey on distributed discrete event simulation'.
 Simuletter, 18(2), June 1987.
 B. Samadi: 'Distributed simulation, algorithms and performance analysis'.  
 PhD thesis, U of Cal, LA, 1985.
     A partial list only.  Most show that Timewarp is more
 effective than conservative over a wider range of applications.
 Note that I am *not* saying in all cases.  The Chandy-Misra technique is 
 applicable in some areas.  Some research has been done here on the
 feasibility of combining the two techniques.

>S1)  The more inherent parallelism a problem has, the less amiable it will be
>	to optimistic simulation due to each event-causing 'object' being an
>	independent source of events not in the predicted stream of a TimeWarp
>	process.

    I don't understand this point at all.  Timewarp does not 'predict'
 streams.  If you mean that having several event sources in a simulation
 (as opposed to few) will affect Timewarp, you are completely wrong.
 Timewarp does not even take note of which objects generate events.

>    TimeWarp might indeed be much better suited for shared-memory machines,

     What doesn't run faster on shared memory machines?  Chandy-Misra would
 run faster too.  In fact, you have it completely backwards; Chandy-Misra
 is probably better suited to shared memory, due to its constant need for
 information from other nodes.  Optimization of null-message transmission
 springs to mind (I haven't really thought about this one, folks, so no
 promises).
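
    Purely as speculation on that last remark (my own sketch and names, not
 any published algorithm): under Chandy-Misra each channel must carry null
 messages advertising a lower bound on future timestamps, whereas on shared
 memory a process could simply read its neighbours' published clock bounds.

#include <algorithm>
#include <atomic>
#include <vector>

struct InputChannel {
    std::atomic<double> clock_bound{0.0};   // sender's promise: no earlier sends
};

// A conservative process may safely execute every event whose timestamp is
// no greater than the minimum bound over all of its input channels.
double safe_horizon(const std::vector<InputChannel*>& inputs) {
    double horizon = 1e300;
    for (const InputChannel* in : inputs)
        horizon = std::min(horizon, in->clock_bound.load());
    return horizon;
}

int main() {
    InputChannel a, b;
    a.clock_bound = 10.0;                   // shared-memory store, no message sent
    b.clock_bound = 6.5;
    std::vector<InputChannel*> ins = {&a, &b};
    double h = safe_horizon(ins);           // events up to t=6.5 are safe to run
    (void)h;
    return 0;
}
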
     Richard Fujimoto at Utah has a research implementation of Timewarp 
 on a shared-memory multiprocessor; he has achieved 57 times speedup on 64 
 nodes.  Excellent results, but this does not mean Timewarp is only
 suited for shared memory machines.  I repeat, most distributed applications 
 run faster by using shared memory.
 R. Fujimoto: 'Time Warp on a shared memory multiprocessor'.  Currently
 a technical report, it will appear at ICPP in August.
     Some of our Sim++ users have obtained 75 times speedup on 40 processors
 with a high granularity; 25 times speedup on 40 processors with medium 
 granularity.  [Yes, it's super-linear; that topic has been beaten to 
 death lately, so I'm not going to address it here.]
 
> [Request for references deleted]
	Here are a few more, not listed above, which are of interest.
 D. Jefferson, B. Beckman, et al: 'Distributed simulation and the Time Warp 
 operating system'.  Operating Systems Review, 21(5), Nov 1987.
 D. Jefferson and H. Sowizral: 'Fast concurrent simulation using the Time 
 Warp mechanism'.  Distributed Simulation 1985, Society for Computer 
 Simulation, San Diego, Jan 1985.
 G. Lomow, B. Unger, et al: 'A performance study of Time Warp'.  Distributed
 Simulation 1988, Society for Computer Simulation, San Diego, Feb 1988.

>
>PS Are there any good grad schools out there with professors interested in
>	simulation in general, preferably with a Software Engineering option
>	and no firm entrenchment of either optimistic or conservative
>	simulation?  I'm considering what I want to do with my next few years. :-)

    The University of Calgary does work in distributed systems, not all
 of it Timewarp.  U of Southern Cal is a hotbed too.  Richard Fujimoto at Utah
 does work with no particular bias.  I can't bring myself to recommend a 
 conservative school, though :-)

>-- 
>Joe Beckenbach			   |	
>jerbil@csvax.caltech.edu	   |	This plot for cultivation of
>Caltech 256-80, Pasadena CA 91125  |	a fertile imagination.

Larry Mellon <jsimellon@cpsc.ucalgary.ca>
Jade Simulations International (403) 282-5711

dor@lanl.gov (David Rich) (05/31/89)

[ lots of stuff deleted ]
>  What doesn't run faster on shared memory machines?  Chandy-Misra would
>  run faster too.
[...]
>  Optimization of null-message transmission springs to mind (I haven't
>  really thought about this one folks, so no promises).
[ more deleted ]
  
Techniques for Efficient Shared-Memory Parallel Simulation
David B. Wagner, Edward D. Lazowska, Brian N. Bershad
Dept. of Computer Science
Univ. of Washington
Tech Rep 88-04-05 (August 1988)
  
--Dave