[comp.parallel] distributed simulation

usha%FSU.BITNET@CUNYVM.CUNY.EDU (04/15/89)

Hi

The following is a list of references on distributed simulation, though it
has not been updated for the past two years.  David Jefferson has a
paper on 'Virtual Time'.  The following list may not be very readable;
sorry about that.  It came out of a survey paper that I wrote on
distributed simulation but never got to publish.  I used TeX to generate
the paper, so it contains macro calls such as \createrefcode, \it,
etc.  I hope you find it useful.

Usha Chandra
--------------------------------------------------------------------------
References:
-----------
\createrefcode{adaref}{Ada 1983}
{{\it Ada Programming Language\/} 1983. Military
Standard, ANSI/MIL-STD-1815A.}

%\createrefcode{adels82}{Adelsberger 1982}
%{Adelsberger, H.H. 1982.  ASSE-Ada simulation support
%environment. In {\it Proc.~of\ the\ 1982\ Winter\ Simulation\
%Conference\/}, (San Diego, California, 1982), pp. 89--101.}
%
%\createrefcode{adelsa83}
%{Adelsberger 1983a}{Adelsberger, H.H. 1983.
% A structured and modular
%approach to
%transaction flow models. In {\it Proc.\ of the 1983 Summer Computer
%Simulation Conference\/}, (Vancouver, B.C., Canada, 1983).}
%
%\createrefcode{adelsb83}
%{Adelsberger 1983b}
%{Adelsberger, H.H. 1983.  Modelling and simulation
%in Ada.
%In {\it Proc. of the First\ European\ Simulation\ Congress\ ESC8\/},
%(Berlin, 1983). pp. 273-280.}
%
%\createrefcode{adelsc83}{Adelsberger 1983c}
%{Adelsberger, H.H. 1983.  Interactive
%modeling and simulation
%of transaction flow or network models using the Ada simulation
%support environment. In {\it Proc.\ of\ the\ 1983\ Winter\ Simulation
%\ Conference\/}, (Washington, 1983), pp. 561--570.}
%
%\createrefcode{bagrodia83}
%{Bagrodia 1983}{Bagrodia, R. 1983.  MAY : a process
%based simulation
%language. Master Report, Dept.~of Computer Sciences, University
%of Texas at Austin, Austin, Texas.}
%
%\createrefcode{bagrodiaa84}
%{Bagrodia et al. 1984a}{Bagrodia, R., K.M. Chandy, and J. Misra
%1984.
% A message based
%approach to discrete event simulation. {\it Report TR LCS--8403\/},
%Dept.~of Computer Sciences, University of Texas at Austin, May.}
%
\createrefcode{bagrodiab84}
{Bagrodia et al. 1984b}
{Bagrodia, R., K.M. Chandy, and J. Misra, 1984.
 MAY : a message
based simulation language for small computers. In {\it Proc.~of the
Small Computers (R)Evolution\/} (September). pp. 148--153.}

\createrefcode{baik85}
{Baik and Zeigler 1985}
{Baik, D.K. and
B.P. Zeigler 1985. Performance evaluation
of hierarchical distributed simulations. In {\it Proceedings of the 1985
Winter Simulation Conference\/} (San Francisco, California, December
11--13, 1985). pp. 421--427.}

%\createrefcode{banks84}
%{Banks and Carson 1984}
%{Banks, J. and
%J.S. Carson II 1984. {\it Discrete-Event
%System Simulation\/}, Prentice-Hall, Inc., Englewood Cliffs, New Jersey.}

\createrefcode{berry85}
{Berry and Jefferson 1985}
{Berry, O. and D.R. Jefferson 1985.  Critical
path analysis of distributed  simulation.
In {\it Proceedings of the Conference on
Distributed Simulation 1985,  The  1985
SCS  Multiconference\/} (San  Diego,
California, January 1985). 15, 2, pp. 57--60.}

%\createrefcode{braun83}
%{Braun 1983}
%{Braun, J., Ed. 1983. {\it SIMSCRIPT II.5 Handbook\/},
%C.A.C.I.}
%
%\createrefcode{brunoa84}
%{Bruno 1984a}
%{Bruno, G. 1983.  Rationale for the introduction of
%discrete simulation primitives in Ada. In {\it Proc.~of the Conference
%on Simulation in Strongly Typed Languages\/} (February 2--4, 1984). 13,
%2, Simulation Series, pp. 10--15.}
%
%\createrefcode{brunob84}
%{Bruno 1984b}
%{Bruno, G. 1984.  Using Ada for discrete event
%simulation.
%{\it Software-Practice and Experience 14\/}, 7(July), 685--695.}

\createrefcode{bryant77}
{Bryant 1977}
{Bryant, R.E. 1977.  Simulation
of packet communication
architecture computer \break systems. {M.S. Thesis,
MIT/LCS/TR-188\/},
Massachusetts Institute of Technology, Cambridge, Massachusetts,
November 1977.}

\createrefcode{bryantrm81}
{Bryant 1981}
{Bryant, R.M. 1981. {\it SIMPAS User's Manual\/}. Dept.
of Computer Science and Academic Computing Center, University of
Wisconsin-Madison, Madison, W.I.}


%\createrefcode{carver79}{Carver 1979}
%{Carver, M.B. (1979) The FORSIM VI distributed system
%simulation package. In {\it  Methodology in System
%Modelling and Simulation, Zeigler et al. Eds\/}.,
% North Holland Publishing Company,
%Amsterdam.}

\createrefcode{chandra86}
{Chandra 1986}
{Chandra, U. 1986. The design of a
distributed concurrent simulation environment. {\it Dissertation\/},
Department of Computer Science, Texas A \& M University, August 1986.}

\createrefcode{chandras85}
{Chandra and Sheppard 1986a}
{Chandra, U. and S. Sheppard 1986.
Implementation and analysis of random variate generators in Ada.
{\it Journal of Pascal, Ada, \& Modula-2 4\/}, 5(July/August).}

%\createrefcode{chandrase85}
%{Chandra and Sheppard 1986b}
%{Chandra, U. and S. Sheppard 1986. An
%algorithm for distributed concurrent simulation. {\it Submitted for
%publication\/}, Texas A \& M University, College Station, Texas.}


\createrefcode{chan79}
{Chandy and Misra 1979}
{Chandy, K.M. and
J. Misra 1979. Distributed simulation: a
case study in design and verification of distributed programs.  {\it
IEEE Transactions on Software Engineering SE-5\/}, 5 (September).}

\createrefcode{chand79}
{Chandy et al. 1979}
{Chandy, K.M., J. Misra, and
V. Holmes 1979. Distributed
simulation of networks.  {\it Computer Networks 3\/},
(January 1979),
105--113.}

\createrefcode{chandy81}
{Chandy and Misra 1981}
{Chandy, K.M. and
J. Misra 1981.
Asynchronous distributed
simulation via a sequence of parallel computations.
{\it Communications of the
ACM 24\/}, 4 (April), 198--206.}

\createrefcode{chandyk82}
{Chandy and Misra 1982}
{Chandy, K.M. and J. Misra 1982.  A distributed
algorithm
for detecting deadlocks in distributed systems. In {\it
Proc. of ACM SIGACT-SIGOPS
Symposium on Principles of Distributed Computing\/} (Ottawa, Canada,
August 1982). pp. 157--164.}

\createrefcode{chandykm83}
{Chandy and Misra 1983}
{Chandy, K.M. and
J. Misra 1983. Distributed deadlock
detection.
 {\it ACM Transactions on Computer Systems 1\/}, 2 (May 1983),
144--156.}

%\createrefcode{christopher82}
%{Christopher et al. 1982}
%{Christopher, T., M. Evens,
%R.R. Gargeya, and
%T. Leonhardt 1982.  Structure of a distributed simulation
%system. In {\it Proc. of the 3rd International Conference
%on Distributed Computing
%Systems\/} (Miami/Ft.Lauderdale, Florida, 1982). pp. 584--589.}


\createrefcode{comfort79}
{Comfort 1979}
{Comfort, J.C. 1979.  A taxonomy
and analysis of event set
management algorithms for discrete event simulation. In {\it Record of
Proceedings of the 12th Annual Simulation Symposium\/} (Florida,
1979). pp. 115--146.}

\createrefcode{comforta81}
{Comfort 1981}
{Comfort, J.C. 1981. The simulation of a
microprocessor based event-set processor. In {\it Proc. of the 14th
Annual Simulation Symposium\/} (Tampa, Florida, March 1981).
pp.17--33.}

\createrefcode{comfortb81}
{Comfort and Miller 1981a}
{Comfort, J.C. and A. Miller 1981.
 The simulation of a
pipelined event set processor. In {\it Proc.~of the 1981 Winter
Simulation Conference\/} (Atlanta, Georgia, December 1981). pp. 591--597.}

\createrefcode{comfortc81}
{Comfort and Miller 1981b}
{Comfort, J.C. and
A. Miller 1981.  Considerations
in the design
of  a  multi microprocessor based simulation computer.  In {\it Modeling
and Simulation on Microcomputers,  L.A. Leventhal Ed\/}.
pp. 110--112.}

\createrefcode{comfortj82}
{Comfort 1982}
{Comfort, J.C. 1982.  The design
of a multi-microprocessor
based simulation
computer-I. {\it In Record of Proceedings of the
15th Annual Simulation
Symposium\/} (Tampa, Florida, March 1982). pp. 45--53.}

\createrefcode{comfortjc83}
{Comfort 1983}
{Comfort, J.C. 1983.
The design of a multi-microprocessor based simulation
computer-II. In {\it Record of Proceedings of the 16th
Annual Simulation Symposium\/} (Tampa, Florida, 1983). pp. 197--209.}

\createrefcode{comfortjca84}
{Comfort et al. 1984}
{Comfort, J.C., Y. Winquing, and
Qiang Li 1984.
 The design of
a multi-microprocessor based simulation computer-III. In
{\it Record of
Proceedings of the 17th Annual Simulation Symposium\/} (Tampa,
Florida, March 1984). pp. 227--241.}

\createrefcode{comfortjcb84}
{Comfort 1984}
{Comfort, J.C. 1984. The simulation
of a master-slave
event-set processor. {\it Simulation 42\/}, 3 (March), 117--124.}

\createrefcode{concepcion85}
{Concepcion 1985a}
{Concepcion, A.I. 1985. Mapping distributed
simulators onto the hierarchical multi-bus multiprocessor
architecture. In {\it Proceedings of the Conference on
Distributed Simulation 1985, The 1985 SCS
Multiconference\/} (San Diego, California, January 1985). 15, 2, pp.
8--13.}

\createrefcode{concepcionb85}
{Concepcion 1985b}
{Concepcion, A.I. 1985. Implementation of the
abstract simulator on the HEP computer.
In {\it Proceedings of the 1985
Winter Simulation Conference\/} (San Francisco, California, December
11--13, 1985). pp. 428--434.}


\createrefcode{david83}
{Davidson and Reynolds 1983}
{Davidson, D.L. and
P.F. Reynolds 1983.  Performance
analysis
of a distributed simulation algorithm based on active logical
processes. In {\it Proc.~of the 1983 Winter Simulation Conference\/}
(December 1983). pp. 267--268.}

\createrefcode{davidson84}
{Davidson 1984}
{Davidson, D.L. 1984.   A distributed simulation
implementation. {\it M.S.~Thesis\/}, The University of
Virginia, January 1984.}

\createrefcode{davisdiss88}
{Davis 1988}
{Davis, C. K. 1988. Automatic simulation model development for a
multiprocessor architecture, {\it Dissertation\/}, Dept. of
Computer Science, Texas A \& M University, August 1988.}

\createrefcode{friel84}
{Friel and Sheppard 1984}
{Friel, P. and
S. Sheppard 1984. Implications of the
Ada environment for simulation studies. In {\it Proceedings of the 1984
Winter Simulation Conference\/} (Dallas, Texas, December 1984).
pp. 270--277.
}

\createrefcode{fujimotods88}
{Fujimoto 1988} {Fujimoto, R.M. 1988. Performance measurements of distributed
simulation strategies. In {\it Proceedings of the SCS
Multiconference on Distributed Simulation\/} (San Diego,
California, February 1988). pp. 14--20.
}

\createrefcode{fujimotodsb88}
{Fujimoto et al. 1988}
{Fujimoto, R.M., Tsai, Jya-Jang and Gopalakrishnan, G. 1988.
The roll back chip: Hardware support for distributed simulation
using time warp. In {\it Proceedings of the SCS
Multiconference on Distributed Simulation\/} (San Diego,
California, February 1988). pp. 81--86.
}

%\createrefcode{georgiadis81}{Georgiadis et al. 1981}
%{Georgiadis, P.G., M.P. Papazouglou,
%and D.G. Maritsas,
% Towards a parallel SIMULA machine. In {\it Proceedings
%of the 8th Annual Symposium on
%Computer Architecture\/}, (May 1981). pp. 263--278.}


%\createrefcode{gpss69}{GPSS 1969}
%{IBM
%Application Program GH20-0304-4 (1969),
%{\it GPSS/360 Introductory User's Manual\/}, November.}


%\createrefcode{hooper82}
%{Hooper and Reiley 1982}
%{Hooper, J.W. and
%K.D. Reiley 1982. An algorithmic
%analysis of simulation strategies. {\it International Journal of
%Computer and Information Sciences 11\/}, 2, 101--122.}

%\createrefcode{hooperj83}
%{Hooper 1983}
%{Hooper, J.W. 1983.The effects of strategy on
%simulation language characteristics. In {\it Proceedings of the 1983
%Summer Computer Simulation Conference\/} (Vancouver, B.C., Canada, 1983).
%pp. 48--53.}

\createrefcode{gafnids88}
{Gafni 1988}
{Gafni, A. 1988.  Rollback mechanisms for optimistic distributed
simulation systems. In {\it Proceedings of the SCS
Multiconference on Distributed Simulation\/} (San Diego,
California, February 1988). pp. 61--67.
}

\createrefcode{hugheswsc87}
{Hughes et al. 1987}
{Hughes, C., S.V. Sheppard, and U. Chandra 1987.  Two
implementations of a concurrent simulation environment. In {\it
Proceedings of the 1987 Winter Simulation Conference\/}
(Atlanta, Georgia, December 1987). pp. ???--???.}

\createrefcode{jefferson82}
{Jefferson and Sowizral 1982}
{Jefferson, D.R. and H. Sowizral 1982.
 Fast concurrent
simulation using time warp mechanism, Part I: Local control.
{\it RAND Note No.~N--1906--AF\/}, RAND Corporation, Santa Monica,
California, December 1982.}

\createrefcode{jeffersona85}
{Jefferson and Sowizral 1985}
{Jefferson, D.R. and H. Sowizral 1985.
 Fast concurrent
simulation using the time warp mechanism. In {\it Proceedings
of the Conference on Distributed Simulation
1985, The 1985 SCS Multiconference\/} (San Diego, California, January
1985). 15, 2, pp. 63--69.}

\createrefcode{jeffersonb85}
{Jefferson et al. 1985}
{Jefferson, D.R. et al. 1985.
 Implementation of time warp
on the Caltech hypercube. In {\it Proceedings of the Conference on
Distributed Simulation 1985, The 1985
SCS Multiconference\/} (San Diego, California, January 1985).
15, 2, pp. 70--75.}

\createrefcode{jeffersonc85}
{Jefferson 1985}
{Jefferson, D.R. 1985. Virtual time. {\it ACM
Transactions on Programming Languages and Systems 7\/}, 3 (July), 404--425.}

%\createrefcode{jones80}
%{Jones and Schwarz 1980}
%{Jones, A.K. and
%P. Schwarz 1980.  Experience using
%multiprocessor systems - a status report. {\it Computing Surveys\/},
%12, 2 (June 1980), 121--165.}

%\createrefcode{kiviat71}
%{Kiviat 1971}
%{Kiviat, P.J. (1971), Simulation languages. {\it On
%Computer Simulation Experiments with Models of Economic Systems. T.H.
%Naylor, Ed\/}, John Wiley \& Sons, New York, 406--489.}

%\createrefcode{klahr82}
%{Klahr 1982}
%{Klahr, P. 1982.  Overview
%of the ROSS simulation system.
%In {\it Proc.~of the 10th IMACS World Congress on System Simulation and
%%Scientific Computation\/} (August 1982). pp. 67--69.}
%
%
%\createrefcode{krishnamurthi84}
%{Krishnamurthi and Young 1984}
%{Krishnamurthi, M. and
%R.E. Young 1984.
% A multitasking
%implementation of system simulation : The emulation of an
%asynchronous parallel processor for system simulation using a
%single processor. In {\it Proc.~of the 1984 Winter Simulation
%Conference\/} (Dallas, Texas, November 1984). pp. 261--271.}

\createrefcode{krishnamurthim85}
{Krishnamurthi et al. 1985}
{Krishnamurthi, M., U. Chandra,
and S. Sheppard 1985. Two approaches to the
implementation of a distributed
simulation system. In {\it Proceedings of the 1985 Winter Simulation
Conference\/} (San Francisco, California, December
11--13, 1985). pp. 435--443.}

\createrefcode{lamport78}
{Lamport 1978}
{Lamport, L. 1978. Time, clocks and
the ordering of events in a distributed system. {\it Communications of the
ACM 21\/}, 7 (July), 558--565.}

\createrefcode{lomowds88}
{Lomow et al. 1988}
{Lomow, G. et al. 1988.  A performance study of the time warp.
In {\it Proceedings of the SCS
Multiconference on Distributed Simulation\/} (San Diego,
California, February 1988). pp. 50--55.
}

%\createrefcode{mcdonald80}{McDonald et al. 1980}
%{McDonald, W.C., H.O. Welch, and
%J.D. Reynolds,
% Distributed simulation host: A distributed test bed.
%In {\it Proc.~of
%the COMPSAC 80\/}, (October 1980). pp. 562--568.}
%
%
%\createrefcode{misra84}
%{Misra 1984}
%{Misra, J. 1984.  Distributed
%simulation. {\it Tutorial presented in the Fourth
%International Conference on Distributed Computing Systems\/}, IEEE,
%San Francisco, California, May.}
%
\createrefcode{misraj86}
{Misra 1986}
{Misra, J. 1986.  Distributed Discrete-Event Simulation.
{\it Computing Surveys\/}, 18, 1, 39--65.}

\createrefcode{nicol84}
{Nicol and Reynolds 1984}
{Nicol, D.M. and
P.F. Reynolds 1984.  Problem
oriented protocol
design. In {\it Proc.~of the 1984 Winter Simulation Conference\/} (Dallas,
Texas, November 1984). pp. 471--474.}

\createrefcode{nicold85}
{Nicol and Reynolds 1985a}
{Nicol, D.M. and
P.F. Reynolds 1985.  A statistical
approach to
dynamic partitioning. In {\it Proceedings of the Conference on
Distributed Simulation 1985, The 1985 SCS
Multiconference\/} (San Diego, California, January 1985). 15, 2,
pp. 53--56.}

\createrefcode{nicoldm85}
{Nicol and Reynolds 1985b}
{Nicol, D.M. and
P.F. Reynolds 1985. An
operational repartitioning policy. In {\it Proceedings of the 1985
Winter Simulation Conference\/} (San Francisco, California, December
11--13, 1985). pp. 493--497.}

\createrefcode{nicolds88}
{Nicol 1988}
{Nicol, D.M. 1988.  Mapping a battlefield simulation onto
message passing parallel architectures.  In {\it Proceedings of the SCS
Multiconference on Distributed Simulation\/} (San Diego,
California, February 1988). pp. 141--146.
}


%\createrefcode{ogrady79}{O'Grady  1979}
%{O'Grady, E.P. 1979.
% Interprocessor communication in
%multiprocessor simulation systems. In {\it Proc.~of the FALL COMPCON
%1979}, pp. 300--306.}
%
%\createrefcode{ogradye83}{O'Grady and Wang 1983}
%{O'Grady, E.P., and
%C.H. Wang,  Multi-bus
%based parallel
%processor for system simulation. In {\it Proc.~of the 1983 Summer
%Computer Simulation Conference\/}, (Vancouver, B.C., Canada, July 1983).
%pp. 371--375.}


\createrefcode{ohalloron83}
{O'Halloron 1983}
{O'Halloron, D.R. 1983.  Analysis of a model for
distributed
simulation, {M.S. Thesis\/}, The University of Virginia,
Charlottesville, Virginia, January.
}

%\createrefcode{papazaglou81}{Papazaglou et al. 1981}
% {Papazaglou, M.P., P.I. Georgiadis,
% and D.G. Maritsas,
% Process synchronization in the parallel SIMULA
%machine. In {\it Proc.~
%of the International Conference on Parallel Processing\/}, (August 1981).
%pp. 297--299.}
%
%\createrefcode{papazagloum84}{Papazaglou et al. 1984}
%{Papazaglou, M.P., P.I. Georgiadis,
%and D.G. Maritsas,
%Architectural Considerations of the parallel SIMULA machine.
%{\it The Computer Journal\/}, 27, 3, 254--259.}
%

\createrefcode{peacock79}
{Peacock et al. 1979}
{Peacock, J.K., J.W. Wong, and E. Manning 1979.
Distributed
simulation using a network of micro-processors. {\it Computer
Networks 3\/}, 1 (February), 44--56.}

\createrefcode{peacockj80}
{Peacock et al. 1980}
{Peacock, J.K., E. Manning, and
J.W. Wong 1980.
Synchronization of distributed simulation using broadcast
algorithms. {\it Computer Networks 4\/}, 1 (February), 3--10.}

%\createrefcode{pritsker79}
%{Pritsker 1979}
%{Pritsker, A.B. 1979. {\it The GASPIV
%Simulation Language\/}. John Wiley \& Sons, New York, N.Y.}

\createrefcode{reedds88}
{Reed and Malony 1988}
{Reed, D.A. and A.D. Malony 1988.  Parallel discrete event
simulation: The Chandy-Misra approach. In {\it Proceedings of the SCS
Multiconference on Distributed Simulation\/} (San Diego,
California, February 1988). pp. 8--13.
}

\createrefcode{reed88}
{Reed et al. 1988}
{Reed, D.A., A.D. Malony, and B.D. McCredie 1988.  Parallel
discrete event simulation using shared memory. {\it IEEE Transactions
on Software Engineering\/}, 14, 4 (April), 541--553.
}


\createrefcode{reynolds82}
{Reynolds 1982}
{Reynolds, P.F. 1982.  A shared resource
algorithm for
distributed simulation. In {\it Proc.~of the 9th Annual Symposium on
Computer Architecture\/} (Austin, Texas, April 1982). pp. 259--266.}

\createrefcode{reynoldsa83}
{Reynolds 1983a}
{Reynolds, P.F. 1983.  The active
logical process approach to
distributed simulation. DAMACS Technical Report \#83--12, Dept. of
Applied Math. and Computer Science, The University of Virginia,
Charlottesville, Virginia.}

\createrefcode{reynoldsb83}
{Reynolds 1983b}
{Reynolds, P.F. 1983.  Active logical processes and
distributed simulation: An analysis. In {\it Proc. of the 1983 Winter
Simulation Conference\/} (December 1983). pp. 263--264.}

\createrefcode{reynoldsjr87}
{Reynolds 1987}
{Reynolds, P.F. and C.S. Kuhn 1987. A performance study of three
protocols for distributed simulation.  In {\it Proc. of the Eastern
Simulation Conference\/} (April 1987).  pp. 26--31.}

%\createrefcode{shannon75}
%{Shannon 1975}
%{Shannon, R.E. (1975),{\it Systems Simulation: The Art
%and Science\/}, Prentice-Hall, Inc., Englewood Cliffs, N.J.}

\createrefcode{sheppard82}
{Sheppard et al. 1982}
{Sheppard, S., D.T. Philips, and
R.E. Young 1982.
 The design
and implementation of a microprocessor-based distributed digital
simulation system. {NSF Proposal ECS--8215550\/}, 1982, College
Station, Texas.}

\createrefcode{sheppards84}
{Sheppard et al. 1984}
{Sheppard, S., P. Friel, and
D. Reese 1984.
Simulation in Ada: An implementation of two world views. In
{\it Proceedings of the Conference on Simulation in Strongly
Typed Languages\/} (February 1984). 13, 2, pp.
3--9.}

\createrefcode{sheppardc85}
{Sheppard 1985}
{Sheppard, S. 1985. The design and implementation
of a distributed simulation system: a status report. Texas A \& M
University Computer Science Technical Report 85--001, College Station,
Texas.}

 \createrefcode{shepparda85}
{Sheppard et al. 1985a}
{Sheppard, S., U. Chandra, and
K. Murray 1985.
 Distributed simulation using Ada. In {\it Proceedings of the
Conference on Distributed Simulation 1985,
The 1985 SCS Multiconference\/} (San Diego, California, January 1985).
15, 2, pp. 27--31.}

\createrefcode{sheppardb85}
{Sheppard et al. 1985b}
{Sheppard, S., R.E. Young, U. Chandra,
M. Krishnamurthi, and D. Wyatt 1985.  Three mechanisms for
distributing simulation. In {\it Proc.~of the 12th Conference of the NSF
Production Research and Technology Program\/} (Madisonville, Wisconsin,
May 14--17, 1985). pp. 67--70.}

\createrefcode{sheppardds88}
{Sheppard et al. 1988}
{Sheppard, S.V., C.K. Davis and U. Chandra 1988.  Parallel
simulation environments for multiprocessor architectures.
In {\it Proceedings of the SCS
Multiconference on Distributed Simulation\/} (San Diego,
California, February 1988). pp. 109--114.
}

%\createrefcode{takenouchi80}
%{Takenouchi et al. 1980}
%{Takenouchi, H., M. Hatada, K. Hiyama,
%H. Inamori, ,
%and M. Toda, 1980.  Parallel processing
%simulator for network systems
%using multi-microcomputers. In {\it Proceedings of the COMPCON Fall
%1980\/}, pp. 55--62.}

\createrefcode{wyatt83}
{Wyatt et al. 1983}
{Wyatt, D.L., S. Sheppard, and
R.E. Young 1983.  An
experiment
in microprocessor-based distributed digital simulation. In {\it Proc.~of
the 1983 Winter Simulation Conference\/} (December 1983). pp. 271--277.}

\createrefcode{wyattd84}
{Wyatt and Sheppard 1984}
{Wyatt, D.L., and
S. Sheppard 1984.  A language-directed
distributed discrete simulation system. In {\it Proc. of the 1984 Winter
Simulation Conference\/} (Dallas, Texas, November 1984). pp. 463--464.}

\createrefcode{wyattdl85}
{Wyatt 1985}
{Wyatt, D.L. 1985.  Simulation programming
on a distributed
system : A preprocessor approach. In {\it
Proceedings of the Conference on Distributed Simulation 1985,
The 1985 SCS Multiconference\/} (San Diego,
California, January 1985). 15, 2, pp. 32--36.}

\createrefcode{wyattdla86}
{Wyatt 1986}
{Wyatt, D.L. 1986.  An experiment in the
design of a microprocessor-based distributed discrete simulation
language.
{Ph.D. Dissertation\/}, Texas A \& M University, College Station, Texas,
August 1986.}

\createrefcode{zeigler84}
{Zeigler 1984}
{Zeigler, B.P. 1984. System-theoretic
representation of simulation models. {\it IIE Transactions 16\/}, 1
(March), 19--34.}

\createrefcode{zeiglerb85}
{Zeigler 1985}
{Zeigler, B.P. 1985. Discrete event formalism for
model based distributed simulation. In {\it
Proceedings of the Conference on Distributed Simulation 1985,
The 1985 SCS Multiconference\/} (San Diego, California, January
1985). 15, 2, pp. 3--7.}
\vfill\eject


\bye

bjornl@tarpon.tds.kth.se (Björn Lisper) (04/18/89)

In this context I want to mention my colleague Rassul Ayani's PhD thesis
"Parallel Simulation on Shared Memory Multiprocessors", TRITA-TCS-8903,
Dept. of Telecommunication and Computer Systems, The Royal Institute of
Technology, May 1989. It has just been printed and will be defended at the
beginning of May. Rassul has developed a method for determining moving time
windows, such that for each simulated object the events in its time window
can safely be executed in parallel with the events in the other objects'
time windows.
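
Very roughly, the flavor of the approach is something like the following toy
sketch in Python (my own illustration, not the algorithm from the thesis; the
fixed lookahead rule and the event representation are assumptions):

    # Toy sketch of a conservative moving-time-window rule: each simulated
    # object may safely execute those of its pending events whose timestamps
    # lie within `lookahead` of the globally smallest pending timestamp.
    # The fixed lookahead and the data layout are illustrative assumptions.

    def safe_windows(pending, lookahead):
        """pending: dict mapping object name -> sorted list of event times."""
        t_min = min(times[0] for times in pending.values() if times)
        return {obj: [t for t in times if t <= t_min + lookahead]
                for obj, times in pending.items()}

    pending = {"A": [1.0, 4.0], "B": [1.5, 2.0], "C": [3.0]}
    print(safe_windows(pending, lookahead=1.0))
    # {'A': [1.0], 'B': [1.5, 2.0], 'C': []}: these events may run in parallel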

Send requests for the thesis to:

Ms. Eivor Atterby
Dept. of Telecommunication and Computer Systems
The Royal Institute of Technology
S-100 44 Stockholm
SWEDEN

I think we give away single copies for free, but I'm not 100% sure.

Bjorn Lisper

matloff%mole.Berkeley.EDU@ucbvax.Berkeley.EDU (Norman Matloff) (04/27/89)

In article <5117@hubcap.clemson.edu> usha%FSU.BITNET@CUNYVM.CUNY.EDU writes:

>The following is a list of references on distributed simulation.  But
>this has not been updated for the past two years.

I'm glad you brought up the subject, because I would like to raise a
question.

Here is the setting.  It's discrete-event simulation, which many or most 
of your references are concerned with, on multiple-processor systems
(tightly or loosely coupled).  Let P be the number of processors and T
be the amount of run time needed to get the desired statistical accuracy.

The usual approach in these papers is to have different processors handle
different parts of the simulation program.  E.g. one processor might manage
the event list, another might do the statistics collection, etc.  Or, in
process-oriented simulators like Simula or csim (or, for that matter,
Modula-2 applied to simulation), each processor might handle a different
process.

It has always seemed to me that these methods are inherently wasteful.
Each processor is going to have a substantial amount of idle time.

Moreover, it seems to me that the best method is also the simplest  --
just have all P processors run the uniprocessor version of the program,
but for T/P amount of run time, instead of T.  [Of course, each processor 
needs to use a different random number seed.]

This method gives 100% utilization for all processors, and except for
start-up transients, should give the same statistical accuracy as the
other methods, in fact BETTER accuracy, since in the other methods a
nominal run time of T yields less than T of useful simulation work, due
to idle periods.
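
To make the proposal concrete, here is a minimal sketch of the replication
scheme in Python (the M/M/1 model, the parameters, and the run lengths are
purely illustrative assumptions):

    import random
    from multiprocessing import Pool

    # Every worker runs the same uniprocessor M/M/1 simulation for 1/P of the
    # total simulated time, with its own seed; the estimates are then averaged.
    def mm1_mean_wait(seed, arrival_rate=0.9, service_rate=1.0, horizon=10_000.0):
        rng = random.Random(seed)
        t = server_free_at = total_wait = 0.0
        n = 0
        while t < horizon:
            t += rng.expovariate(arrival_rate)         # next arrival
            start = max(t, server_free_at)             # service start
            total_wait += start - t                    # time spent queueing
            server_free_at = start + rng.expovariate(service_rate)
            n += 1
        return total_wait / n

    if __name__ == "__main__":
        P = 4
        T = 40_000.0
        with Pool(P) as pool:
            estimates = pool.starmap(mm1_mean_wait,
                                     [(seed, 0.9, 1.0, T / P) for seed in range(P)])
        print(sum(estimates) / P)   # combined estimate of the mean waiting time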

Is anyone aware of any thorough work on this question, i.e. the question
of whether the complex methods are better or worse than the simple one?
I think there was a brief paper by Heidelberger on the effect of 
transients in this context a few years ago in one of the Winter 
Simulation Conferences, but it didn't go very far.

Some of the complex methods give a really elegant theory (e.g. see the
references to Chandy), but it still seems to me that the simple method
will give better performance.

   Norm

wen-king@csvax.caltech.edu (Wen-King Su) (04/28/89)

In article <5276@hubcap.clemson.edu> matloff%mole.Berkeley.EDU@ucbvax.Berkeley.EDU (Norman Matloff) writes:
>
<It has always seemed to me that [distributed] methods are inherently wasteful.
>Each processor is going to have a substantial amount of idle time.
<
>Moreover, it seems to me that the best method is also the simplest  --
<just have all P processors run the uniprocessor version of the program...
>
<This method gives 100% utilization for all processors...

Since you are not worried about the elapsed time of the simulation runs,
which may itself be the most important consideration in some simulations,
I will assume that throughput is the single most important consideration
here.  Even with this assumption, however, it still depends on how you
compute the cost of hardware.

If the cost is proportional to the number of processors, running one
sequential simulation on each processor clearly is optimal.  Since 100%
of the processors are in use at all times, as the number of processors
doubles, the throughput doubles as well.  Nothing can do better than that.

However, if the cost is proportional to the amount of hardware involved,
distributed simulation can be more efficient.  For the purpose of this
discussion, let us first assume that:

	1) Hardware used == the amount of memory involved.
	2) Simulation requires the same amount of memory
	   no matter how many processors are used.

Now, if we further assume that the simulation shows no speedup, it will
still be as efficient as the sequential simulation.  However, if we assume
that the speedup is linear, every time we double the processors and halve
the memory per processor, we also halve the elapsed time.  We have doubled
the throughput using essentially the same amount of hardware.  Although the
truth is somewhere in between, if the distributed simulation shows any
speedup at all, it will be more efficient than the sequential simulation.
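
As a small numeric check of that argument, here is the throughput per unit of
hardware under assumptions 1) and 2) above (the speedup values plugged in are
illustrative only):

    # "Hardware" is taken to be total memory in use (assumption 1), and the
    # model's memory need is independent of the processor count (assumption 2).

    def replicated(p, memory=1.0, run_time=1.0):
        # p independent sequential runs: p results per run_time, p full copies
        return (p / run_time) / (p * memory)

    def distributed(p, speedup, memory=1.0, run_time=1.0):
        # one partitioned run: speedup results per run_time, one copy of memory
        return (speedup / run_time) / memory

    for p in (1, 2, 4, 8):
        print(p, replicated(p), distributed(p, speedup=p), distributed(p, speedup=1.0))
    # replication stays at 1.0; linear speedup scales as p; no speedup breaks even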

In reality, neither 1) nor 2) is true, but the truth lies not far away,
and the distributed simulation has to show some degree of speedup just
to break even.  But if it does better than break even, it will be more
efficient than the sequential simulation.

One may argue that in real life, the processor size can't change as we
use more and more processors.  In real life, however, neither can a
simulation that tightly fits a 100 processor multicomputer be run on
just one processor of the multicomputer.

/*------------------------------------------------------------------------*\
| Wen-King Su  wen-king@vlsi.caltech.edu  Caltech Corp of Cosmic Engineers |
\*------------------------------------------------------------------------*/

wagner@june.cs.washington.edu (Bullwinkle J. Moose) (04/28/89)

In article <5276@hubcap.clemson.edu>, matloff%mole.Berkeley.EDU@ucbvax.Berkeley.EDU (Norman Matloff) writes:
> In article <5117@hubcap.clemson.edu> usha%FSU.BITNET@CUNYVM.CUNY.EDU writes:
> 
> Here is the setting.  It's discrete-event simulation, which many or most 
> of your references are concerned with, on multiple-processor systems
> (tightly or loosely coupled).  Let P be the number of processors and T
> be the amount of run time needed to get the desired statistical accuracy.
> ...
> Moreover, it seems to me that the best method is also the simplest  --
> just have all P processors run the uniprocessor version of the program,
> but for T/P amount of run time, instead of T.  [Of course, each processor 
> needs to use a different random number seed.]
> 
> Is anyone aware of any thorough work on this question, i.e. the question
> of whether the complex methods are better or worse than the simple one?
> I think there was a brief paper by Heidelberger on the effect of 
> transients in this context a few years ago in one of the Winter 
> Simulation Conferences, but it didn't go very far.

Well, funny you should ask, because I have just finished writing up the
chapter of my Ph.D. dissertation dealing with this question.

In fact, Heidelberger is the only one I know of who has done any analysis
of this problem.  The complete references are:

    @InProceedings{ Heidelberger86,
	     Author= "P. Heidelberger",
	     Title= "{Statistical Analysis of Parallel Simulations}",
	     Booktitle= "Proc. 1986 Winter Simulation Conference",
	     Year = 1986,
	     Pages= "290-295",
	     Organization= SCS,
	     Abstract= "Comparison of independent replications with a single
			long parallel simulation."
	    }

    @TechReport{ Heidelberger87,
	     Author= "P. Heidelberger",
	     Title= "{Discrete Event Simulations and Parallel Processing: Statistical Properties}",
	     Institution= "IBM Thomas J. Watson Research Center",
	     Year = 1987,
	     Number= "RC 12733 (\#57302)",
	     Address= "Yorktown Heights, NY",
	     Month= May,
	     Abstract= "Proves that certain statistical estimators 
			obtained by running separate replications
			in parallel are biased, and that some even converge
			to the wrong value!"
	    }

Heidelberger86 examined the tradeoff between intra-replication
parallelism M and inter-replication parallelism K when the total number
of processors is some constant (hence P = MK).  In essence, the results
of Heidelberger86 are that intra-replication parallelism should be
favored over inter-replication parallelism under the following
circumstances:  for systems with a strong initial transient;  for short
runs;  or if a large number of processors are available.  While I have
very little to add to this in an analytical sense, I've spent quite
some time evaluating the practical impact of his results.

While it's true that running parallel independent replications has a
lot of appeal, there are some mitigating considerations.  The first and
most obvious of these is that beyond a certain point, the marginal
effect on confidence interval width of increasing the number of
replications becomes insignificant.  Thus, given enough processors, it
still makes sense to parallelize each replication.
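
For concreteness: a normal-theory confidence interval has half-width of
roughly z*s/sqrt(K) in the number of replications K, so each further doubling
of K buys a smaller absolute reduction.  A two-line illustration, with z and
the sample standard deviation s as arbitrary assumed values:

    from math import sqrt

    # Half-width of a normal-theory confidence interval vs. replications K;
    # z and s are assumed values, chosen only to show the shape of the curve.
    z, s = 1.96, 1.0
    for K in (4, 16, 64, 256, 1024):
        print(K, round(z * s / sqrt(K), 4))
    # 0.98, 0.49, 0.245, 0.1225, 0.0613: each quadrupling only halves the width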

Second, because of the presence of an initialization bias in any
stochastic simulation, it is important to run each replication long
enough to ensure that the confidence intervals that one obtains for
some estimator actually contain the true value of what is being
estimated.  (I.e., an extremely large number of very short replications
will probably result in very small confidence intervals that do not
contain the true value!) What this means in practical terms is that if
elapsed wall clock time is the overriding consideration, you may not be
able to achieve accurate results unless you parallelize each replication.
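
A toy demonstration of that effect, using an M/M/1 queue with illustrative
parameters (arrival rate 0.9, service rate 1.0, so the steady-state mean wait
is about 9): many short replications started empty give a tight but biased
answer, while one long run gets much closer.

    import random

    # Mean waiting time of the first n_customers customers, starting empty.
    def mean_wait(seed, n_customers):
        rng = random.Random(seed)
        arrive = depart = total = 0.0
        for _ in range(n_customers):
            arrive += rng.expovariate(0.9)           # next arrival time
            start = max(arrive, depart)              # service start
            total += start - arrive                  # time spent waiting
            depart = start + rng.expovariate(1.0)    # service completion
        return total / n_customers

    short = [mean_wait(seed, 50) for seed in range(2000)]   # 2000 short runs
    print(sum(short) / len(short))          # tight average, but biased low
    print(mean_wait(0, 200_000))            # one long run, much nearer 9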

Third, Heidelberger87 pointed out that as the number of replications
run in parallel increases, statistical considerations require stopping
rules that lead to poor processor utilization near the end of the
simulation run.  Grossly oversimplifying, it turns out that if the
completion time of individual replications is a random variable, then
in order to obtain unbiased estimators of model outputs it is necessary
to wait for all replications to finish.  (The randomness of completion
times may be a significant factor when the simulation is estimating
transient behavior of a model, or when the regenerative method for
obtaining confidence intervals is being used within each replication.)
In fact, under certain assumptions it turns out that as the number of
processors P increases, the speedup obtained tends toward P/log P rather
than P.  (This is similar to a result by Kruskal and Weiss regarding
the completion time of a group of self-scheduling processors taking
tasks off of a central task queue.)  Therefore, your assertion that

> This method gives 100% utilization for all processors, ...

is statistically untrue.
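
A quick Monte Carlo illustration of that tail effect, pretending purely for
illustration that replication completion times are i.i.d. exponential (this is
not Heidelberger's model, just the order-statistics flavor of it):

    import random
    from statistics import mean

    rng = random.Random(1)

    # Expected time until the slowest of P replications finishes, when the
    # completion times are i.i.d. exponential with mean 1 (assumed only to
    # show the shape of the effect).
    def makespan(P, trials=2000):
        return mean(max(rng.expovariate(1.0) for _ in range(P))
                    for _ in range(trials))

    for P in (4, 16, 64, 256):
        print(P, round(P / makespan(P), 1))   # effective speedup, roughly P/log P
    # each replication averages time 1, so P of them hold P units of work, but
    # the batch finishes only when the slowest does, hence the sub-linear speedup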

Lastly, I have taken the analysis of Heidelberger86 and tweaked it just
a bit for purposes of illustration.  Rather than fixing a running time
and comparing the mean square error (mse)
of the alternatives obtained by assigning various values to M and K, I
have fixed the mse and time to completion of the sequential case (M=1,
K=P) and calculated what speedup the parallel simulation would need to
achieve (as a function of M) in order to better the accuracy of the
purely sequential method.  I call this the break-even speedup.  It can
be interpreted in one of two ways:  if the parallel simulation is
capable of exceeding the break-even speedup, then either (a) the
parallel strategy can produce the same level of statistical confidence
as the purely sequential strategy in less elapsed time, or (b) the
parallel strategy can produce a higher level of statistical confidence
than the purely sequential strategy in the same amount of elapsed
time.  The advantage of this analysis is that it does not require any
assumptions about the speedup behavior of the parallel simulation.

The general behavior of the break-even speedup is that it is almost
linear in M, but the slope of the line decreases rapidly as the total
number of processors P increases.  For really large P (e.g., 1024) the
break-even speedup is truly paltry even for large M, so that any
reasonable implementation of a parallel simulation ought to be able to
produce a win.  (Note:  I'm not suggesting that it's realistic to
expect a parallel simulator to show speedups on 1024 processors (in
fact, that wouldn't give you anything on which to base confidence
intervals); rather, given 1024 processors, one ought to run an M=32,
K=32 parallel simulation, or, if your simulation starts to wheeze and
die at 32 processors, maybe only M=16, K=64.)

Finally, a pragmatic consideration against running multiple independent
replications in parallel is that some simulations have enormous memory
requirements, thus, it actually may not be possible to run the
replications simultaneously.  In fact, Jefferson et al. at JPL
are running into situations in which they can't even calculate speedup
because they can't run 1-processor versions of the problems they are
simulating (they are running on a distributed memory architecture).

		Dave Wagner
		University of Washington Comp Sci Department
		wagner@cs.washington.edu
		{cornell,harvard,ssc-vax,tektronix}!uw-beaver!wagner

P.S.  My complete dissertation will be available from the University of
Washington sometime this summer.  The title is "Conservative Parallel
Discrete-Event Simulation: Principles and Practice".

eugene@eos.arc.nasa.gov (Eugene Miya) (04/28/89)

The IBM TR mentioned in Dave Wagner's paper was published in the 1987
ACM/SIGMETRICS (Banff) proceedings (shortened slightly).  It was one of the
better papers at that conference.

Longish signature follows "Type 'n' now"

Another gross generalization from

--eugene miya, NASA Ames Research Center, eugene@aurora.arc.nasa.gov
  resident cynic at the Rock of Ages Home for Retired Hackers:
  "You trust the `reply' command with all those different mailers out there?"
  "If my mail does not reach you, please accept my apology."
  {ncar,decwrl,hplabs,uunet}!ames!eugene
  				Live free or die.

matloff@iris.ucdavis.edu (Norm Matloff) (04/28/89)

In article <5311@hubcap.clemson.edu> wen-king@csvax.caltech.edu (Wen-King Su) writes:
>In article <5276@hubcap.clemson.edu> matloff%mole.Berkeley.EDU@ucbvax.Berkeley.EDU (Norman Matloff) writes:
>>
><It has always seemed to me that [distributed] methods are inherently wasteful.
>>Each processor is going to have a substantial amount of idle time.
><
>>Moreover, it seems to me that the best method is also the simplest  --
><just have all P processors run the uniprocessor version of the program...

><This method gives 100% utilization for all processors...

>Since you are not worried about the elapsed time of the simulation runs,
>which may itself be the most important consideration in some simulations,

I'm surprised to see you say this.  Am I missing something, maybe a
subtle distinction between elapsed time and some other measure of time?

I *am* worried about the time.  In fact, I even specified a time T
for a uniprocessor version, which it is hoped to reduce to T/P in a
P-processor system.

What has to be held constant in any comparison is accuracy.  I defined
the accuracy level to be whatever comes from running the simulation on
a uniprocessor for time T.  The problem with both kinds of distributed
simulations  --  the task-partitioned and replicated types  --  is that
in the former, there is less than 100% utilization, while in the latter,
there is initialization bias.  So in both cases, the desired accuracy 
needs more than T/P time, i.e. there is less than a factor-P speedup.
The only question is which is better.

>distributed simulation can be more efficient.  For the purpose of this
>discussion, let us first assume that:

>	1) Hardware used == the amount of memory involved.
>	2) Simulation requires the same amount of memory
>	   no matter how many processors are used.

As far as I know, memory is NOT a problem.  The major data structure is 
the event queue, and in typical applications this is not very long at all, 
maybe a dozen items.

  Norm

matloff@iris.ucdavis.edu (Norm Matloff) (04/28/89)

In article <5312@hubcap.clemson.edu> wagner@june.cs.washington.edu (Bullwinkle J. Moose) writes:

>While it's true that running parallel independent replications has a
>lot of appeal, there are some mitigating considerations.  The first and
>most obvious of these is that beyond a certain point, the marginal
>effect on confidence interval width of increasing the number of
>replications becomes insignificant.

This cannot be literally true. The accuracy will be roughly inversely 
proportional to the square root of the number of replications, no matter
how many replications there are.  If you quadruple the number of
replications, you should always get double the accuracy (except for
init. bias).

Perhaps you meant that after a certain point, any further decrease in
CI width is of marginal value to the USER?

>Second, because of the presence of an initialization bias in any
>stochastic simulation, it is important to run each replication long
>enough to ensure that the confidence intervals that one obtains for
>some estimator actually contain the true value of what is being
>estimated.

Yes, as I recall, that was the theme of Heidelberger's paper.  But
it seems to me that it's not so simple.

For example, in typical applications, a number of runs will be made
of the same simulation program, with different parameters, e.g.
different arrival rates.  After doing the first run, the user has
a pretty good idea of the steady-state queue lengths, and can
initialize the other runs with nonempty queues of those approximate
sizes.  This should reduce the initialization bias quite substantially.

Or, better yet, in systems which have regeneration points and don't
have huge (in the notation from my posting yesterday, >> T/P)
regeneration cycle times, one can simply start each replicate at
the regeneration point  --  thus no initialization bias at all.

Of course, you do run into the problem you mentioned, i.e. the
problem of "waiting for the slowest one," which gave you your
P/log P figure.  However, analogous kinds of problem should plague
the task-partitioning approach too, e.g. as P increases, the
problem of idle time should get worse and worse.

The task-partitioning approach also has a scaling problem.  For
a fixed application, say a 10-server queue, there will be a very
small limit on the usable value of P.  After a certain point,
you just can't find anything for the processors to do.  On the
other hand, the replicates approach has this too, i.e. for a
fixed problem, a large value of P exacerbates either the
initialization bias problem or the "wait for the slowest one
problem."

So I don't think that either approach is a clear win.  But the
replicates approach certainly is much simpler.

>Finally, a pragmatic consideration against running multiple independent
>replications in parallel is that some simulations have enormous memory
>requirements,

In most simulations of queueing systems in practice, memory should not
be a problem.  The length of the event list should be on the order of 
magnitude of the number of servers, which is generally not large.

   Norm

chandra%loligo@gatech.edu (Usha Chandra) (05/01/89)

>I'm glad you brought up the subject, because I would like to raise a
>question.
>
>Here is the setting.  It's discrete-event simulation, which many or most 
>of your references are concerned with, on multiple-processor systems
>(tightly or loosely coupled).  Let P be the number of processors and T
>be the amount of run time needed to get the desired statistical accuracy.
>
>The usual approach in these papers is to have different processors handle
>different parts of this program.  E.g. one processor might manage the
>event list, another might do the statistics collection, etc.  Or, in
>process-oriented simulators like Simula or csim (or, for that matter,
>Modula II applied to simulation), each processor might handle a different
>process.
>
>It has always seemed to me that these methods are inherently wasteful.
>Each processor is going to have a substantial amount of idle time.

Thanks to advances in parallel hardware, this method has exhibited
good performance.  Please refer to the recent publications by Chandy and
Sherman, Jefferson et al., Reed and Malony, and Fujimoto.

Some of the papers that you may want to look at are:

1. K.M. Chandy, R. Sherman, The conditional event approach to distributed 
   simulation, .....(I do not remember where it was published)
   But you may be able to contact R. Sherman at the following address:
   USC/Information Sciences Institute
   4676 Admiralty Way Suite 1001
   Marina Del Rey, CA 90292
   
2. D.A. Reed, A.D. Malony, Parallel discrete event simulation:......,
   Proc. of the conf on Distributed Simulation, 19,3, 1987.

3. R.M. Fujimoto, Performance measurements....., Proc. of the conf.
   of distributed simulation, 19, 3, 1987.

You may be able to find the most recent papers in proceedings of the
Winter Simulation Conference, Eastern Simulation Conference, 
SCS Multiconference ???, and Summer Computer Simulation Conference.

>Moreover, it seems to me that the best method is also the simplest  --
>just have all P processors run the uniprocessor version of the program,
>but for T/P amount of run time, instead of T.  [Of course, each processor 
>needs to use a different random number seed.]
>

Some work has been done by John Comfort.  He distributed the support 
functions such as the random number generation, queue handling, 
statistics collection, event processing etc on multiple processors and 
got some performance out of it but not quite enough.

It is not that simple.  A uniprocessor version will use some form of 
event list (a sequential list) and assigning the first n events to n
processors may indeed lead to incorrect simulation.  For example, 
consider an event-list with time stamps t1, t2, t3, ... where t1 <= t2 <= t3.
Processing the event with time stamp t1 may cause an event to be
inserted between t1 and t2.
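
A tiny Python illustration of that hazard (the event contents are made up; the
point is only the ordering):

    import heapq

    # Naively handing the first two events of a sequential event list to two
    # processors: executing the t=1.0 event schedules a new event at t=1.4,
    # which should have run before the t=2.0 event already handed out.
    events = [(1.0, "departure from server 1"),
              (2.0, "arrival at server 2"),
              (3.0, "arrival at server 1")]
    heapq.heapify(events)

    batch = [heapq.heappop(events), heapq.heappop(events)]   # t=1.0 and t=2.0

    # consequence of processing the t=1.0 event:
    heapq.heappush(events, (1.4, "customer from server 1 arrives at server 2"))

    print(batch)       # events already dispatched "in parallel"
    print(events[0])   # (1.4, ...): it belongs between them; causality is broken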

Usha Chandra

matloff%mole.Berkeley.EDU@ucbvax.Berkeley.EDU (Norman Matloff) (05/02/89)

In article <5344@hubcap.clemson.edu> chandra%loligo@gatech.edu (Usha Chandra) writes:

Speaking of task-oriented simulators which put tasks, e.g. simulations of
different servers, on different machines, I had said:

>>It has always seemed to me that these methods are inherently wasteful.
>>Each processor is going to have a substantial amount of idle time.

>Thanks to the advances in parallel hardware, this method has exhibited
>good performance.  Please refer the recent publications by Chandy and 
>Sherman, Jefferson et al, Reed and Malony and Fujimoto.  

I've seen a number of these.  Again, there are inherent problems,
e.g. scaling.  Say the application to be simulated has 20 servers.
There is really no way that the task-oriented approach can be fruitful 
with much more than 20 processors.

>>Moreover, it seems to me that the best method is also the simplest  --
>>just have all P processors run the uniprocessor version of the program,
>>but for T/P amount of run time, instead of T.  [Of course, each processor 
>>needs to use a different random number seed.]

>Some work has been done by John Comfort.  He distributed the support 
>functions such as the random number generation, queue handling, 
>statistics collection, event processing etc on multiple processors and 
>got some performance out of it but not quite enough.

Yes, I also referred to this method in my original posting.  I think it
has even less potential than the task-oriented method.

>It is not that simple.  A uniprocessor version will use some form of 
>event list (a sequential list) and assigning the first n events to n
>processors may indeed lead to incorrect simulation.  For example, 
>consider an event-list with time stamps, t1,t2,t3 ... t1 <= t2 <= t3.  
>Processing the event with time stamp t1 may cause an event to be 
>inserted between t1 and t2.

Yes, I think that the recent work on methods for correct synchronization,
etc., by Chandy etc., are very elegant and impressive.  But my point is
still that for a stochastic system, these should have essentially the
same performance as the replication method, and they are a lot more 
difficult.

   Norm

wagner@june.cs.washington.edu (Bullwinkle J. Moose) (05/02/89)

In article <5319@hubcap.clemson.edu>, matloff@iris.ucdavis.edu (Norm Matloff) writes:
> In article <5312@hubcap.clemson.edu> wagner@june.cs.washington.edu (Bullwinkle J. Moose) (Hey!  that's me!) writes:
> >Because of the presence of an initialization bias in any
> >stochastic simulation, it is important to run each replication long
> >enough to ensure that the confidence intervals that one obtains for
> >some estimator actually contain the true value of what is being
> >estimated.
> 
> Yes, as I recall, that was the theme of Heidelberger's paper.  But
> it seems to me that it's not so simple.
> 
> For example, in typical applications, a number of runs will be made
> of the same simulation program, with different parameters, e.g.
> different arrival rates.  After doing the first run, the user has
> a pretty good idea of the steady-state queue lengths, and can
> initialize the other runs with nonempty queues of those approximate
> sizes.  This should reduce the initialization bias quite substantially.

Although your argument seems quite reasonable, it is, alas, untrue.
First of all, note that the steady-state of a stochastic process is
represented not by a single value, but by a *distribution* of that value.
Therefore, what you really need to do is estimate the distribution of the
steady-state, and then start up a number of runs, with the initial
conditions chosen at random from that distribution.  (Note that if you
knew the distribution of steady-state conditions exactly, then there would
be no point to doing the simulation at all!)

As an example of what happens if steady-state is chosen to be
the average value instead of a set of samples from the (estimated)
distribution, I refer you to "Statistical Analysis of Simulation Output
Data", by A.M. Law, Oper. Res. 31(6):983-1029 (June 1983).  In the
following excerpt from pages 1015-1017, my own comments are enclosed in [...]:

    "The suggested procedure [using pilot runs to estimate steady-state]
     would be easy to apply in the case of an M/M/1 queue, since the
     state of the system is a single integer-valued r.v., namely, the
     number of people in the system.  In the case of a complex real-world
     simulation, however, this procedure may require estimating the joint
     distribution of a large number of r.v.'s, some of which may be continuous.
     As a simple example of this difficulty, consider a single-server
     queueing system without exponential interarrival times or service times.
     In this case, we would have to estimate the joint distribution of number
     in system, time to the next arrival, and the remaining service time of
     the customer in service (if any).

     [ In the following paragraph, Di is defined to be the delay experienced
     by the ith customer to arrive at the server, and d is defined to be the
     steady-state expected average delay, i.e. lim(n-->oo) (D1+D2+...+Dn)/n]

    "Kelton and Law ["The Transient Behavior of the M/M/s Queue with
     Implications for Steady-State Simulation", TR 82-7, Dept. of MIS,
     U. of Arizona, 1982] computed E(Di) as a function of i and the number
     in system at time 0, say, l [lower-case l, not 1] for various M/M/s
     queues....
     Let i(l,0.01) [be] the smallest point [value of i]
     beyond which all values of E(Di) fall withing 1 percent of d,
     viewed as a function of the initial number in the system, l.
     Kelton and Law found for the M/M/1 queue with rho=0.9 that i(l,0.01)
     [^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^]
     is minimized for l = 15.  This result is interesting since
     [^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^]
     the steady-state time-average number of customers in system is 9.
     [^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^]

In general, Law's article is an excellent guide to the pitfalls (his word)
of using classical statistical reasoning in the analysis of simulation
output data.  Not to mention intuition!

> Or, better yet, in systems which have regeneration points and don't
> have huge (in the notation from my posting yesterday, >> T/P)
> regeneration cycle times, one can simply start each replicate at
> the regeneration point  --  thus no initialization bias at all.

True, the method of regeneration theoretically does not suffer from the
initialization bias problem.  But "theoretically" is the key word here,
because unless what you are simulating is truly a regenerative process,
your regeneration points are only approximate.  And using regeneration
would definitely introduce the problem of stragglers (see my last
posting), because it might lead to a wide variation in completion time
of the replications unless the number of regeneration cycles within
each replication is very large.

> Of course, you do run into the problem you mentioned, i.e. the
> problem of "waiting for the slowest one," which gave you your
> P/log P figure.  However, analogous kinds of problem should plague
> the task-partitioning approach too, e.g. as P increases, the
> problem of idle time should get worse and worse.

But, since the number of replications being run when intra-replication
parallelism is used is less than P, the magnitude of the effect is smaller.
I didn't claim that the "tail effect" problem was a major one, only that
it "nudges the scale" a bit in the direction of intra-replication parallelism.

> The task-partitioning approach also has a scaling problem.  For
> a fixed application, say a 10-server queue, there will be a very
> small limit on the usable value of P.  After a certain point,
> you just can't find anything for the processors to do.

Definitely true, which is why the best approach for a large number of
processors is probably a combination of inter- and intra- replication
parallelism.

> 
> So I don't think that either approach is a clear win.  But the
> replicates approach certainly is much simpler.
> 
> >Finally, a pragmatic consideration against running multiple independent
> >replications in parallel is that some simulations have enormous memory
> >requirements,
> 

> In most simulations of queueing systems in practice, memory should not
> be a problem.  The length of the event list should be on the order of 
> magnitude of the number of servers, which is generally not large.

Depends on what is being simulated.  I have friends at Bellcore who were
simulating a metropolitan trunk network in which the steady-state size of
the event list was about 15 THOUSAND events.

"All generalizations are false, including this one."

		Dave Wagner
		University of Washington Comp Sci Department
		wagner@cs.washington.edu
		{cornell,harvard,ssc-vax,tektronix}!uw-beaver!wagner

matloff%crow.Berkeley.EDU@ucbvax.Berkeley.EDU (Norman Matloff) (05/03/89)

In article <5367@hubcap.clemson.edu> wagner@june.cs.washington.edu (Bullwinkle J. Moose) writes:
>In article <5319@hubcap.clemson.edu>, matloff@iris.ucdavis.edu (Norm Matloff) writes:

>Although your argument seems quite reasonable, it is, alas, untrue.
>First of all, note that the steady-state of a stochastic process is
>represented not by a single value, but by a *distribution* of that value.

Actually, Law's statement that you quoted is self-contradictory, it seems
to me.  He is apparently willing to use the mean in one-dimensional
problems but not multidimensional problems, which doesn't make sense.

>Therefore, what you really need to do is estimate the distribution of the
>steady-state, and then start up a number of runs, with the intitial
>conditions chosen at random from that distribution.

True, this would be optimal.  But does it make that much difference?
Unfortunately, I don't have time to go read the paper right now, but
I would guess that it really doesn't make that much difference, and
that using the mean is nearly optimal in terms of *result*.  In other
words, even though 9 and 15 are far apart in the excerpt quoted below,
I would guess that E(D9) and E(D15) are NOT far apart.

>     Let i(l,0.01) [be] the smallest point [value of i]
>     beyond which all values of E(Di) fall within 1 percent of d,
>     viewed as a function of the initial number in the system, l.
>     Kelton and Law found for the M/M/1 queue with rho=0.9 that i(l,0.01)
>     [^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^]
>     is minimized for l = 15.  This result is interesting since
>     [^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^]
>     the steady-state time-average number of customers in system is 9.
>     [^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^]

>> Or, better yet, in systems which have regeneration points and don't
>> have huge (in the notation from my posting yesterday, >> T/P)
>> regeneration cycle times, one can simply start each replicate at
>> the regeneration point  --  thus no initialization bias at all.

>True, the method of regeneration theoretically does not suffer from the
>initialization bias problem.  But "theoretically" is the key word here,
>because unless what you are simulating is truly a regenerative process,
>your regeneration points are only approximate.  

Again, I think an approximate regeneration point is good enough in many
cases.  Also, I've seen a number of specific applications in which it was
claimed that there was no regeneration point, but where, once things like
symmetries are taken into account, good regeneration points turn out to
exist after all.
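
To make concrete what I mean by starting at a regeneration point, here is
a small Python sketch for the textbook case where the point is exact: an
M/M/1 queue regenerates whenever an arrival finds the system empty.  Each
replicate (or processor) can simulate whole cycles from that state and
feed the usual ratio estimator, with no warm-up period at all.  The
parameter values are illustrative, not tied to the applications I
mentioned above.

import random

def one_cycle(lam=0.9, mu=1.0, rng=random):
    # One regeneration cycle of a FIFO M/M/1 queue: from an arrival that
    # finds the system empty until the next such arrival.  Returns
    # (sum of delays in queue over the cycle, number of customers served).
    total_delay, n = 0.0, 1
    t = 0.0                           # the customer opening the cycle; delay 0
    finish = rng.expovariate(mu)      # when it leaves the server
    while True:
        t += rng.expovariate(lam)     # next arrival
        if t >= finish:               # it finds the system empty: cycle ends
            return total_delay, n
        total_delay += finish - t     # FIFO: wait for all earlier departures
        finish += rng.expovariate(mu)
        n += 1

def regenerative_estimate(cycles=20000):
    # Classical ratio estimator: d ~ E[delay per cycle]/E[customers per cycle].
    tot_delay = tot_n = 0.0
    for _ in range(cycles):
        y, m = one_cycle()
        tot_delay += y
        tot_n += m
    return tot_delay / tot_n

print("estimated steady-state delay d: %.2f  (exact value is 9)"
      % regenerative_estimate())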

>> Of course, you do run into the problem you mentioned, i.e. the
>> problem of "waiting for the slowest one," which gave you your
>> P/log P figure.  However, analogous kinds of problem should plague
>> the task-partitioning approach too, e.g. as P increases, the
>> problem of idle time should get worse and worse.

>But, since the number of replications being run when intra-replication
>parallelism is used is less than P, the magnitude of the effect is smaller.

No, what I am saying is that even with no replication at all, there will
be **analogous** straggler effects, due to interprocess dependencies.

>> The task-partitioning approach also has a scaling problem.  For
>> a fixed application, say a 10-server queue, there will be a very
>> small limit on the usable value of P.  After a certain point,
>> you just can't find anything for the processors to do.

>Definitely true, which is why the best approach for a large number of
>processors is probably a combination of inter- and intra- replication
>parallelism.

That could well be true.

>> In most simulations of queueing systems in practice, memory should not
>> be a problem.  The length of the event list should be on the order of 
>> magnitude of the number of servers, which is generally not large.

>Depends on what is being simulated.  I have friends at Bellcore who were
>simulating a metropolitan trunk network in which the steady-state size of
>the event list was about 15 THOUSAND events.

Yes.  John Comfort once mentioned at a conference that he had seen one
case in which there were typically 100,000 events in the list.  But that's
why I said "most."
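
For readers who haven't run into the term: the event list is just the
priority queue of pending events, so its memory footprint is essentially
the number of events sitting in it.  Here is a bare-bones Python sketch;
the 15,000 pending "calls" below only echo the Bellcore figure quoted
above and are not a model of their network.

import heapq

class EventList:
    # Minimal future-event list: a heap of (time, seq, event) entries.
    # Memory grows linearly with the number of pending events, which is
    # why a model with tens of thousands of simultaneously scheduled
    # events can be memory-hungry while a small queueing model is not.
    def __init__(self):
        self._heap = []
        self._seq = 0                 # tie-breaker for events at equal times

    def schedule(self, time, event):
        heapq.heappush(self._heap, (time, self._seq, event))
        self._seq += 1

    def next_event(self):
        time, _, event = heapq.heappop(self._heap)
        return time, event

    def __len__(self):
        return len(self._heap)

fel = EventList()
for call in range(15000):             # e.g. one pending hang-up per call
    fel.schedule(call / 100.0, ("hangup", call))
print(len(fel), "events pending")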

   Norm

matloff%crow.Berkeley.EDU@ucbvax.Berkeley.EDU (Norman Matloff) (05/03/89)

In article <5367@hubcap.clemson.edu> wagner@june.cs.washington.edu (Bullwinkle J. Moose) writes:
>In article <5319@hubcap.clemson.edu>, matloff@iris.ucdavis.edu (Norm Matloff) writes:

[I replied to Dave's article this morning, but apparently it never got
to the net, so I'm replying again.  The version here has been updated.]

*First of all, note that the steady-state of a stochastic process is
*represented not by a single value, but by a *distribution* of that value.

True, but if our aim is to reduce initialization bias, a single value
should work reasonably well.  See below.

*     [ In the following paragraph, Di is defined to be the delay experienced
*     by the ith customer to arrive at the server, and d is defined to be the
*     steady-state expected average delay, i.e. lim(n-->oo) (D1+D2+...+Dn)/n]

*    "Kelton and Law ["The Transient Behavior of the M/M/s Queue with
*     Implications for Steady-State Simulation", TR 82-7, Dept. of MIS,
*     U. of Arizona, 1982] computed E(Di) as a function of i and the number
*     in system at time 0, say, l [lower-case l, not 1] for various M/M/s
*     queues....
*     Let i(l,0.01) [be] the smallest point [value of i]
*     beyond which all values of E(Di) fall within 1 percent of d,
*     viewed as a function of the initial number in the system, l.
*     Kelton and Law found for the M/M/1 queue with rho=0.9 that i(l,0.01)
*     [^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^]
*     is minimized for l = 15.  This result is interesting since
*     [^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^]
*     the steady-state time-average number of customers in system is 9.
*     [^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^]

Yes, 9 is fairly different from 15.  However, it doesn't make much
difference in initialization bias.  In Law's graph, the l = 10 and
l = 15 lines (there is none for l = 9) give about the same-sized
initialization bias.  In other words, just a "ballpark" value is
probably good enough to reduce the bias to an acceptable level.
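
(Incidentally, the 9 in the excerpt is just the standard M/M/1
steady-state mean number in system,

     L = rho/(1 - rho) = 0.9/(1 - 0.9) = 9,

which is where a "start at the mean" rule would put l.)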

>> Or, better yet, in systems which have regeneration points and don't
>> have huge (in the notation from my posting yesterday, >> T/P)
>> regeneration cycle times, one can simply start each replicate at
>> the regeneration point  --  thus no initialization bias at all.

*True, the method of regeneration theoretically does not suffer from the
*initialization bias problem.  But "theoretically" is the key word here,
*because unless what you are simulating is truly a regenerative process,
*your regeneration points are only approximate.  

Again, I think an approximate regeneration point is good enough for
practical purposes.

>> Of course, you do run into the problem you mentioned, i.e. the
>> problem of "waiting for the slowest one," which gave you your
>> P/log P figure.  However, analogous kinds of problem should plague
>> the task-partitioning approach too, e.g. as P increases, the
>> problem of idle time should get worse and worse.

*But, since the number of replications being run when intra-replication
*parallelism is used is less than P, the magnitude of the effect is smaller.

I think we're talking about different things here.  The effect I have in
mind occurs even when there is no replication at all: because of
interprocess dependencies there will be process waiting, and in many
applications that waiting should grow at a log P rate, just as in the
replication case, though for different reasons.
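
Just to show one way a log P factor can arise: if P processors each face
an independent, roughly exponentially distributed amount of work or
waiting (mean 1, say), the elapsed time is set by the slowest of them,
and E[max] is the harmonic number H_P ~ ln P + 0.577.  The exponential
assumption is only a stand-in; the point is the shape of the growth.  A
quick Python check:

import math, random

def h(P):
    # Exact expected maximum of P independent Exp(1) variables: H_P.
    return sum(1.0 / k for k in range(1, P + 1))

def simulated_max(P, reps=5000, rng=random):
    return sum(max(rng.expovariate(1.0) for _ in range(P))
               for _ in range(reps)) / reps

for P in (2, 16, 128, 1024):
    print("P = %4d:  H_P = %5.2f   ln P + 0.577 = %5.2f   simulated = %5.2f"
          % (P, h(P), math.log(P) + 0.5772, simulated_max(P)))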

>> The task-partitioning approach also has a scaling problem.  For
>> a fixed application, say a 10-server queue, there will be a very
>> small limit on the usable value of P.  After a certain point,
>> you just can't find anything for the processors to do.

*Definitely true, which is why the best approach for a large number of
*processors is probably a combination of inter- and intra- replication
*parallelism.

This may well turn out to be the case.  But I still feel that this has
not been looked at sufficiently.  Your work sounds excellent, but you
yourself indicated that it's the only work done on this since Heidelberger's,
which itself was just a preliminary investigation.

>> In most simulations of queueing systems in practice, memory should not
>> be a problem.  The length of the event list should be on the order of 
>> magnitude of the number of servers, which is generally not large.

*Depends on what is being simulated.  I have friends at Bellcore who were
*simulating a metropolitan trunk network in which the steady-state size of
*the event list was about 15 THOUSAND events.

I heard John Comfort give a talk once in which he mentioned hearing of
some application in which the typical size of the event list was
100,000 events.  That's why I qualified my statement above by using
the word "most." :-)

   Norm

fishwick@fish.cis.ufl.edu (Paul Fishwick) (05/03/89)

[
  Prof. Fishwick is the moderator of the comp.simulation group.  For
  those who receive comp.parallel by direct mail, you can submit to
  him directly.
	--steve
]


This is a small plea on behalf of our readers in "comp.simulation". Will
individuals posting information concerning distributed simulation please
post articles ALSO to comp.simulation ? I think you will find that this
will make the discussion much more lively. Thanks for your consideration.

-paul

+------------------------------------------------------------------------+
| Prof. Paul A. Fishwick.... INTERNET: fishwick@bikini.cis.ufl.edu       |
| Dept. of Computer Science. UUCP: gatech!uflorida!fishwick              |
| Univ. of Florida.......... PHONE: (904)-335-8036                       |
| Bldg. CSE, Room 301....... FAX is available                            |
| Gainesville, FL 32611.....                                             |
+------------------------------------------------------------------------+