[comp.parallel] fine-grained and/or multiway optimism

wilson@uicbert.eecs.uic.edu (11/08/88)

I am looking for pointers to research on optimistic computation,
especially fine-grained optimism and multiway optimism.

My impression is that most of the work in optimistic parallelism is
either very loosely-coupled (e.g., distributed simulation) or massively
parallel (e.g., connectionist).  There's also some optimism 
in the pipelining of processors.

Questions:

   1.  Is anybody doing tightly-coupled multiprocessor optimism?  If not,
       is there some reason why not?

   2.  Are there any explicitly optimistic programming systems, with
       language constructs that specify/allow optimistic execution?

   3.  How promising are optimistic techniques, really?  They would
       *seem*, to a nonexpert like me, to be the only way around
       Amdahl's law without redesigning all of our algorithms to
       have tremendous parallelism, which is difficult.

   4.  Is anybody doing multiway optimism, for example computing
       all possible branches of a conditional in parallel with
       evaluation of the condition?  (When you finally know the
       value of the condition, you just keep the appropriate
       consequent, already computed, and discard the others.)

   5.  Do any optimistic systems let you play with a general heap,
       or do they impose major restrictions (like all data being
       allocated on a stack, or requiring that the compiler be
       able to determine statically when something will become
       garbage)?
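
Regarding question 4: the multiway idea can be sketched in a few lines.
This is a toy illustration only (no particular system's mechanism, and
threads here just stand in for extra processors); both consequents are
launched alongside the condition, and the loser's work is discarded.

```python
# Hypothetical sketch of multiway optimism: evaluate every arm of a
# conditional in parallel with the condition itself, then keep only
# the arm that the condition's value selects.
from concurrent.futures import ThreadPoolExecutor

def speculative_if(cond_fn, then_fn, else_fn):
    """Run the condition and both arms concurrently; return the result
    of the selected arm and throw the other away."""
    with ThreadPoolExecutor(max_workers=3) as pool:
        cond = pool.submit(cond_fn)
        then_val = pool.submit(then_fn)
        else_val = pool.submit(else_fn)
        # By the time the condition resolves, both consequents may
        # already be computed; the loser's work is simply discarded.
        return then_val.result() if cond.result() else else_val.result()

# Both arms are computed regardless of how the condition turns out.
result = speculative_if(lambda: 2 + 2 == 4,
                        lambda: "then-branch",
                        lambda: "else-branch")
```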

My own interest in this stems partly from the fact that I have devised a
couple of checkpointing and recovery mechanisms that efficiently checkpoint
heaps.  One of them is good for allowing reversion to all past states, with
the cost being quite reasonable in space and time, but smallest for recently-
accessed states.  The other supports having several "current" states, with
each being an efficient virtual copy of a parent state.

The former system should allow optimistic systems like Time Warp to use a
general heap efficiently, while the latter should allow multiway, fine-grained
optimism on multiprocessors.
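
Wilson's mechanisms aren't described here, so the following is only a
toy illustration of the second idea (the class and method names are
made up): several "current" states, each an efficient virtual copy of a
parent, where a child records only its own writes and falls through to
its parent on reads.

```python
# Toy illustration (NOT Wilson's actual mechanism) of virtual-copy
# heap states: forking a version is O(1) because the child shares all
# unmodified cells with its parent.
class HeapVersion:
    def __init__(self, parent=None):
        self.parent = parent
        self.writes = {}          # only the cells this version changed

    def read(self, addr):
        # Walk up the parent chain until some version wrote this cell.
        v = self
        while v is not None:
            if addr in v.writes:
                return v.writes[addr]
            v = v.parent
        raise KeyError(addr)

    def write(self, addr, value):
        self.writes[addr] = value

    def fork(self):
        # An O(1) virtual copy: the child sees the parent's state
        # until it overwrites something itself.
        return HeapVersion(parent=self)

root = HeapVersion()
root.write("x", 1)
a = root.fork(); a.write("x", 2)   # one speculative branch
b = root.fork()                    # another branch; still sees x == 1
```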

I actually came up with these things for quite different reasons, but I am
curious how applicable they are to optimistic systems.  Any opinions on
these issues would be welcome.

Thanks prematurely,

Paul R. Wilson                         
Human-Computer Interaction Laboratory
U. of Illin. at C. EECS Dept. (M/C 154)   wilson%uicbert@uxc.cso.uiuc.edu
Box 4348   Chicago,IL 60680 

khb@Sun.COM (Keith Bierman - Sun Tactical Engineering) (11/09/88)

In article <3465@hubcap.UUCP> wilson@uicbert.eecs.uic.edu writes:
>I am looking for pointers to research on optimistic computation,
>especially fine-grained optimism and multiway optimism.
>
>My impression is that most of the work in optimistic parallelism is
>either very loosely-coupled (e.g., distributed simulation) or massively
>parallel (e.g., connectionist).  There's also some optimism 
>in the pipelining of processors.
>
>Questions:
>
>   1.  Is anybody doing tightly-coupled multiprocessor optimism?  If not,
>       is there some reason for it?

For very tightly coupled processors it might not make much sense. Very
optimistic systems will probably be loosely coupled.

>
>   2.  Are there any explicitly optimistic programming systems, with
>       language constructs that specify/allow optimistic execution?

TimeWarp was the first operating system with virtual time. Contact Brian
Beckman at JPL, or look into back issues of CACM; the idea was
formulated by a UCLA professor (still there, I think), and the first
hypercube implementation was done by Brian and some folks reporting to
him (mostly him, I think). There are others working on this for
networks and such. At one time there was a Mac simulator, a Sun
network implementation, and a JPL Mark II hypercube implementation. I
don't know the project's current status.
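
For the curious, the heart of Time Warp's mechanism can be sketched
very roughly as follows. This is a single-process toy after Jefferson's
"Virtual Time" idea, not the JPL implementation: messages are executed
optimistically in arrival order, state is checkpointed, and a straggler
(a message timestamped earlier than the local virtual time) forces a
rollback and replay.

```python
# Toy sketch of the Time Warp rollback rule (after Jefferson's
# "Virtual Time"); the real system also handles antimessages, GVT,
# and storage reclamation, all omitted here.
import copy

class TimeWarpProcess:
    def __init__(self, state):
        self.state = state
        self.lvt = 0                      # local virtual time
        self.log = []                     # (timestamp, state-before, msg)

    def receive(self, timestamp, msg):
        if timestamp < self.lvt:
            # Straggler: undo every later message by restoring the
            # state saved just before the earliest one undone ...
            redo = []
            while self.log and self.log[-1][0] > timestamp:
                t, before, m = self.log.pop()
                self.state = before
                redo.append((t, m))
            # ... then execute the straggler and replay the rest.
            self._execute(timestamp, msg)
            for t, m in reversed(redo):   # replay in timestamp order
                self._execute(t, m)
        else:
            self._execute(timestamp, msg)

    def _execute(self, timestamp, msg):
        self.log.append((timestamp, copy.deepcopy(self.state), msg))
        msg(self.state)                   # msg mutates the state dict
        self.lvt = timestamp

p = TimeWarpProcess({"count": 0})
inc = lambda s: s.update(count=s["count"] + 1)
p.receive(10, inc)
p.receive(30, inc)
p.receive(20, inc)    # straggler: rolls back past t=30 and replays it
```

After the straggler, all three increments have been applied exactly
once and the clock ends at 30, as if the messages had arrived in order.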

>
>   3.  How promising are optimistic techniques, really?  They would
>       *seem*, to a nonexpert like me, to be the only way around
>       Amdahl's law without redesigning all of our algorithms to
>       have tremendous parallelism, which is difficult.

These don't always make the rewrite easier. There was a paper
presented at a SIGMETRICS or two ago which purported to prove that
parallelism didn't work. The fellow worked for IBM, and strangely
enough the ruling committee really liked the paper (gave it an honor).
Brian's results are extremely encouraging (as of a couple of years
ago). They built discrete simulations which worked: they got the
speedup, not infinite time loops. What more can we ask of a research
project?
>
>   4.  Is anybody doing multiway optimism, for example computing
>       all possible branches of a conditional in parallel with
>       evaluation of the condition?  (When you finally know the
>       value of the condition, you just keep the appropriate
>       consequent, already computed, and discard the others.)

The Cydra 5 did this with directed dataflow. Cydrome is fast becoming
history, but the machine was a VLIW with lots of attractive features
(and it worked). It takes more than that to be a successful company,
though.

>
>   5.  Do any optimistic systems let you play with a general heap,
>       or do they impose major restrictions (like all data being
>       allocated on a stack, or requiring that the compiler be
>       able to determine statically when something will become
>       garbage)?

TimeWarp is object based, and built upon message passing; so, in
principle, the programmer shouldn't need to know about the stack(s),
etc. Current reality is probably more primitive, but it's research.
>
>My own interest in this stems partly from the fact that I have devised a
>couple of checkpointing and recovery mechanisms that efficiently checkpoint
>heaps.  One of them is good for allowing reversion to all past states, with
>the cost being quite reasonable in space and time, but smallest for recently-
>accessed states.  The other supports having several "current" states, with
>each being an efficient virtual copy of a parent state.

Good start.
>
>The former system should allow optimistic systems like Time Warp to use a
>general heap efficiently, while the latter should allow multiway, fine-grained
>optimism on multiprocessors.

Whoops, I only just got this far. If you already know about Time Warp,
why the questions?
>
>I actually came up with these things for quite different reasons, but I am
>curious how applicable they are to optimistic systems.  Any opinions on
>these issues would be welcome.
>

The fundamental practical engineering questions are: what is the cost
of doing extra work, and what is the probability of success (i.e., of
doing useful work)? For a tightly coupled system there are typically
lots of ways to use pessimistic scheduling and rely on other means to
get speed (until you hit Cray speeds). Loosely coupled arrays seem more
amenable to these techniques.



Keith H. Bierman
It's Not My Fault ---- I Voted for Bill & Opus

kong@CS.UCLA.EDU (Kong Li) (11/14/88)

In article <3485@hubcap.UUCP> khb@Sun.COM (Keith Bierman - Sun Tactical Engineering) writes:
>In article <3465@hubcap.UUCP> wilson@uicbert.eecs.uic.edu writes:
>>   2.  Are there any explicitly optimistic programming systems, with
>>       language constructs that specify/allow optimistic execution?
>
>TimeWarp was the first operating system with virtual time. Contact Brian
>Beckman at JPL, or look into back issues of CACM; the idea was
>formulated by a UCLA professor (still there, I think), the first

Yes, he is here. His name is David Jefferson. A co-inventor of the
Time Warp mechanism is Henry Sowizral.

>hypercube implementation was done by Brian and some folks reporting to
>him (mostly him, I think). There are others working on this for
>networks and such. At one time there was a Mac simulator, a Sun
>network implementation and a JPL Mark II hypercube implementation. I
>don't know the projects current status.

Time Warp has already been implemented on the JPL Mark III hypercube.
For current status, have a look at "Distributed Simulation and the Time
Warp Operating System" by David Jefferson, Brian Beckman, et al., UCLA
CSD Tech. Rept. CSD-870042, Aug. 1987. That paper also appeared at a
recent conference, but I forget which one.

'Kong
(kong@cs.ucla.edu)

reed@m.cs.uiuc.edu (11/21/88)

I'd like to clarify a misconception:

>khb@Sun.COM writes:

>>In article <3465@hubcap.UUCP> wilson@uicbert.eecs.uic.edu writes:
>>
>>   3.  How promising are optimistic techniques, really?  They would
>>       *seem*, to a nonexpert like me, to be the only way around
>>       Amdahl's law without redesigning all of our algorithms to
>>       have tremendous parallelism, which is difficult.

> These don't always make the rewrite easier. There was a paper
> presented at a SIGMETRICS or two ago which purported to prove that
> parallelism didn't work. The fellow worked for IBM, and strangely
> enough the ruling committee really liked the paper (gave it an honor).
> Brian's results are extremely encouraging (as of a couple of years
> ago). They built discrete simulations which worked. Got the speedup,
> not infinite time loops. What more can we ask of a research project?



The paper to which you refer is more accessible as:

	D. A. Reed, A. D. Malony, and B. D. McCredie, "Parallel
	Discrete Event Simulation Using Shared Memory," IEEE
	Transactions on Software Engineering, Vol. 14, No. 4,
	April 1988, pp. 541-553.

The paper considers the performance of ONE implementation of
*conservative* distributed discrete event simulation.  Unlike
optimistic strategies (like Time Warp), conservative strategies advance
the local clocks only when conditions ensure that they will never be
rolled back (i.e., all clock values must be monotonically increasing).

In particular, the paper considers the general case when the underlying
parallel simulation support has no knowledge of the structure of the
simulation model.  By comparison, this is similar to most sequential
simulation systems -- the underlying process and event list management
are oblivious to the structure or semantics of the simulation model.
Why take this approach?  Well, if it works, it provides the underlying
support for "automatic" parallelism.  By that, I mean that a
traditional process-oriented simulation could be supported by a
conservative distributed simulation mechanism, realizing the
parallelism in the simulation model without requiring the simulation
programmer to consider two levels of parallelism: the parallelism
inherent in the model, and the model structure necessary to achieve
parallel execution of the simulation.

The argument for this approach is purely one of software engineering.
Discrete event simulations most often model systems with some
parallelism and these simulations are often written using
process-oriented simulation languages (i.e., parallel programming
languages).  If the simulation programmer must also provide hints to
the underlying simulation support about how to parallelize the
process-oriented program, the programming burden grows.  Ideally, the
simulation support should automatically realize the parallelism in the
simulation model.  Finally, simulation programs are often changed as
one explores structural variations of the simulated system.  One would
like to separate the realization of parallelism from the goal of
simulation -- exploring system variations.

The paper considers simulation of queueing network models on a Sequent
Balance (bus-based shared memory system) running Unix.  The implemented
conservative simulation strategy uses no knowledge about the simulation
semantics (e.g., queueing strategies or service time distributions).
The servers of the queueing network simply exchange messages and
execute in parallel.  In the conservative strategy used, networks with
cycles can deadlock when trying to ensure monotonic clocks.  Two
mechanisms were used:  deadlock avoidance and deadlock detection and
recovery.  The performance measurements showed that speedup was
strongly dependent on the structure of the simulated network.  This is
not surprising; it is analogous to saying that speedup in parallel
programs depends on the structure of the program.   In our
implementation networks with cycles often suffered from sizable
overheads due to deadlock processing.  Much of this is unnecessary; it
is protecting against possible causality violations that never occur.
As an aside, this is one argument for optimistic techniques.
Unfortunately, most interesting models contain cyclic interactions
among the model components.  To achieve good speedup in these cases
using conservative strategies, it seems likely that some knowledge
about the simulated system must be used.  This is the motivation
for recent theoretical work on conditional knowledge (by Chandy
and others).
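
The conservative rule described above can be sketched in a few lines.
This is a toy in the spirit of Chandy-Misra, not the implementation
measured in the paper (the class and method names are invented): a
logical process advances its clock only when no input channel could
still deliver an earlier event, and a "null message" carrying only a
timestamp promise is one way to break the resulting blocking.

```python
# Toy sketch of conservative clock advancement: channels deliver
# messages in timestamp order, so the smallest channel head is a safe
# lower bound on all future events.
class ConservativeLP:
    def __init__(self, input_channels):
        self.queues = {c: [] for c in input_channels}  # (t, event) FIFOs
        self.clock = 0
        self.executed = []

    def deliver(self, channel, timestamp, event):
        self.queues[channel].append((timestamp, event))

    def step(self):
        """Execute one event if it is provably safe; report whether we did."""
        if any(not q for q in self.queues.values()):
            # An empty input channel blocks us; in a cyclic network
            # this is exactly how deadlock begins.
            return False
        channel = min(self.queues, key=lambda c: self.queues[c][0][0])
        timestamp, event = self.queues[channel].pop(0)
        self.clock = timestamp            # the clock only moves forward
        if event is not None:             # None models a null message
            self.executed.append(event)
        return True

lp = ConservativeLP(["a", "b"])
lp.deliver("a", 5, "e1")
blocked = lp.step()         # False: channel "b" might still send t < 5
lp.deliver("b", 8, None)    # null message: "b" promises nothing before t=8
ok = lp.step()              # now e1 at t=5 is provably safe
```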

To summarize, the Reed/Malony/McCredie paper presents experimental
evidence that conservative simulation implementations (of queueing
network models) that do not rely on knowledge about the simulated
system often yield limited speedups; the performance of individual
cases depends on their structure.

More recent experimental studies by Fujimoto (Utah) and Lazowska
(Washington) have shown that taking advantage of knowledge about the
model (e.g., knowing that queues are FIFO) can significantly reduce
the overhead for conservative simulation and yield good speedups (with
accompanying loss of generality).  This approach is highly promising
and I am greatly impressed by the quality of their work. 
As an example, Lazowska and his students have implemented an object-oriented
support package that provides much of the needed information
automatically for the classes in the library (as one example, FIFO
queues).

The balance between simulation programming to express the semantics of
the simulated system and to specify the information needed to
parallelize the simulation is important.  Just as in parallel
programming, the goal is results, not parallelism.  If achieving
parallelism is difficult, time consuming, and overly sensitive to the
model structure, the already difficult problems of simulation
programming will become even more challenging.  For conservative
strategies, I believe, and this is my personal opinion unsupported by
any hard evidence, the best solution is support tools that automatically
obtain the bulk of the information needed to parallelize.  This information
may be obtained by some combination of language structure (e.g.,
declarations), object-oriented libraries (e.g., FIFO queues), or compilers.

There are lots of interesting, open questions in distributed
simulation, both conservative and optimistic.  The next few years will
tell the tale.


Dan Reed
Department of Computer Science
University of Illinois
Urbana, Illinois  61801

reed@a.cs.uiuc.edu

jevans@.ucalgary.ca (David Jevans) (11/21/88)

A good article about Virtual Time and Time Warp is:

Jefferson D,
"Virtual Time",
ACM Transactions on Programming Languages and Systems,
July 1985.

When not working on my Msc, I work for Jade Simulations International.
We have developed a Time Warp system for networks of SUNs and
the BBN Butterfly.  Discrete event simulations written in Sim++,
a superset of C++, can be run on 1 to 128 processors without
sacrificing determinism.

If you are interested in Time Warp, see us at the SCS conference
in San Diego, December 12, 13, 14, 1988.

David Jevans, U of Calgary Computer Science, Calgary AB  T2N 1N4  Canada
uucp: ...{ubc-cs,utai,alberta}!calgary!jevans