usha%FSU.BITNET@CUNYVM.CUNY.EDU (04/15/89)
Hi. The following is a list of references on distributed simulation, but it has not been updated for the past two years. David Jefferson has a paper on 'Virtual Time'. The list may not be very readable; sorry about that. It came out of a survey paper that I wrote on distributed simulation but never got to publish. I used TeX to generate the paper, hence the macro calls such as \createrefcode, \it, etc. Hope you find it useful.

Usha Chandra

--------------------------------------------------------------------------
References:
-----------

\createrefcode{adaref}{Ada 1983} {{\it Ada Programming Language\/} 1983. Military Standard, ANSI/MIL-STD-1815A.}

%\createrefcode{adels82}{Adelsberger 1982} {Adelsberger, H.H. 1982. ASSE-Ada simulation support environment. In {\it Proc.~of the 1982 Winter Simulation Conference\/} (San Diego, California, 1982). pp. 89--101.}
%
%\createrefcode{adelsa83}{Adelsberger 1983a} {Adelsberger, H.H. 1983. A structured and modular approach to transaction flow models. In {\it Proc.~of the 1983 Summer Computer Simulation Conference\/} (Vancouver, B.C., Canada, 1983).}
%
%\createrefcode{adelsb83}{Adelsberger 1983b} {Adelsberger, H.H. 1983. Modelling and simulation in Ada. In {\it Proc.~of the First European Simulation Congress ESC8\/} (Berlin, 1983). pp. 273--280.}
%
%\createrefcode{adelsc83}{Adelsberger 1983c} {Adelsberger, H.H. 1983. Interactive modeling and simulation of transaction flow or network models using the Ada simulation support environment. In {\it Proc.~of the 1983 Winter Simulation Conference\/} (Washington, 1983). pp. 561--570.}
%
%\createrefcode{bagrodia83}{Bagrodia 1983} {Bagrodia, R. 1983. MAY: a process based simulation language. Master's Report, Dept.~of Computer Sciences, University of Texas at Austin, Austin, Texas.}
%
%\createrefcode{bagrodiaa84}{Bagrodia et al. 1984a} {Bagrodia, R., K.M. Chandy, and J. Misra 1984. A message based approach to discrete event simulation. {\it Report TR LCS--8403\/}, Dept.~of Computer Sciences, University of Texas at Austin, May.}

\createrefcode{bagrodiab84}{Bagrodia et al. 1984b} {Bagrodia, R., K.M. Chandy, and J. Misra 1984. MAY: a message based simulation language for small computers. In {\it Proc.~of the Small Computers (R)Evolution\/} (September). pp. 148--153.}

\createrefcode{baik85}{Baik and Zeigler 1985} {Baik, D.K. and B.P. Zeigler 1985. Performance evaluation of hierarchical distributed simulations. In {\it Proceedings of the 1985 Winter Simulation Conference\/} (San Francisco, California, December 11--13, 1985). pp. 421--427.}

%\createrefcode{banks84}{Banks and Carson 1984} {Banks, J. and J.S. Carson II 1984. {\it Discrete-Event System Simulation\/}, Prentice-Hall, Inc., Englewood Cliffs, New Jersey.}

\createrefcode{berry85}{Berry and Jefferson 1985} {Berry, O. and D.R. Jefferson 1985. Critical path analysis of distributed simulation. In {\it Proceedings of the Conference on Distributed Simulation 1985, The 1985 SCS Multiconference\/} (San Diego, California, January 1985). 15, 2, pp. 57--60.}

%\createrefcode{braun83}{Braun 1983} {Braun, J., Ed. 1983. {\it SIMSCRIPT II.5 Handbook\/}, C.A.C.I.}
%
%\createrefcode{brunoa84}{Bruno 1984a} {Bruno, G. 1984. Rationale for the introduction of discrete simulation primitives in Ada. In {\it Proc.~of the Conference on Simulation in Strongly Typed Languages\/} (February 2--4, 1984). 13, 2, Simulation Series, pp. 10--15.}
%
%\createrefcode{brunob84}{Bruno 1984b} {Bruno, G. 1984. Using Ada for discrete event simulation. {\it Software-Practice and Experience 14\/}, 7 (July), 685--695.}

\createrefcode{bryant77}{Bryant 1977} {Bryant, R.E. 1977. Simulation of packet communication architecture computer systems. {\it M.S. Thesis\/}, MIT/LCS/TR-188, Massachusetts Institute of Technology, Cambridge, Massachusetts, November 1977.}

\createrefcode{bryantrm81}{Bryant 1981} {Bryant, R.M. 1981. {\it SIMPAS User's Manual\/}. Dept. of Computer Science and Academic Computing Center, University of Wisconsin-Madison, Madison, WI.}

%\createrefcode{carver79}{Carver 1979} {Carver, M.B. 1979. The FORSIM VI distributed system simulation package. In {\it Methodology in System Modelling and Simulation\/}, Zeigler et al., Eds., North Holland Publishing Company, Amsterdam.}

\createrefcode{chandra86}{Chandra 1986} {Chandra, U. 1986. The design of a distributed concurrent simulation environment. {\it Dissertation\/}, Department of Computer Science, Texas A \& M University, August 1986.}

\createrefcode{chandras85}{Chandra and Sheppard 1986a} {Chandra, U. and S. Sheppard 1986. Implementation and analysis of random variate generators in Ada. {\it Journal of Pascal, Ada, \& Modula-2 4\/}, 5 (July/August).}

%\createrefcode{chandrase85}{Chandra and Sheppard 1986b} {Chandra, U. and S. Sheppard 1986. An algorithm for distributed concurrent simulation. {\it Submitted for publication\/}, Texas A \& M University, College Station, Texas.}

\createrefcode{chan79}{Chandy and Misra 1979} {Chandy, K.M. and J. Misra 1979. Distributed simulation: a case study in design and verification of distributed programs. {\it IEEE Transactions on Software Engineering SE-5\/}, 5 (September).}

\createrefcode{chand79}{Chandy et al. 1979} {Chandy, K.M., J. Misra, and V. Holmes 1979. Distributed simulation of networks. {\it Computer Networks 3\/} (January 1979), 105--113.}

\createrefcode{chandy81}{Chandy and Misra 1981} {Chandy, K.M. and J. Misra 1981. Asynchronous distributed simulation via a sequence of parallel computations. {\it Communications of the ACM 24\/}, 4 (April), 198--206.}

\createrefcode{chandyk82}{Chandy and Misra 1982} {Chandy, K.M. and J. Misra 1982. A distributed algorithm for detecting deadlocks in distributed systems. In {\it Proc.~of the ACM SIGACT-SIGOPS Symposium on Principles of Distributed Computing\/} (Ottawa, Canada, August 1982). pp. 157--164.}

\createrefcode{chandykm83}{Chandy and Misra 1983} {Chandy, K.M. and J. Misra 1983. Distributed deadlock detection. {\it ACM Transactions on Computer Systems 1\/}, 2 (May 1983), 144--156.}

%\createrefcode{christopher82}{Christopher et al. 1982} {Christopher, T., M. Evens, R.R. Gargeya, and T. Leonhardt 1982. Structure of a distributed simulation system. In {\it Proc.~of the 3rd International Conference on Distributed Computing Systems\/} (Miami/Ft. Lauderdale, Florida, 1982). pp. 584--589.}

\createrefcode{comfort79}{Comfort 1979} {Comfort, J.C. 1979. A taxonomy and analysis of event set management algorithms for discrete event simulation. In {\it Record of Proceedings of the 12th Annual Simulation Symposium\/} (Florida, 1979). pp. 115--146.}

\createrefcode{comforta81}{Comfort 1981} {Comfort, J.C. 1981. The simulation of a microprocessor based event-set processor. In {\it Proc.~of the 14th Annual Simulation Symposium\/} (Tampa, Florida, March 1981). pp. 17--33.}

\createrefcode{comfortb81}{Comfort and Miller 1981a} {Comfort, J.C. and A. Miller 1981. The simulation of a pipelined event set processor. In {\it Proc.~of the 1981 Winter Simulation Conference\/} (Atlanta, Georgia, December 1981). pp. 591--597.}

\createrefcode{comfortc81}{Comfort and Miller 1981b} {Comfort, J.C. and A. Miller 1981. Considerations in the design of a multi microprocessor based simulation computer. In {\it Modeling and Simulation on Microcomputers\/}, L.A. Leventhal, Ed. pp. 110--112.}

\createrefcode{comfortj82}{Comfort 1982} {Comfort, J.C. 1982. The design of a multi-microprocessor based simulation computer-I. In {\it Record of Proceedings of the 15th Annual Simulation Symposium\/} (Tampa, Florida, March 1982). pp. 45--53.}

\createrefcode{comfortjc83}{Comfort 1983} {Comfort, J.C. 1983. The design of a multi-microprocessor based simulation computer-II. In {\it Record of Proceedings of the 16th Annual Simulation Symposium\/} (Tampa, Florida, 1983). pp. 197--209.}

\createrefcode{comfortjca84}{Comfort et al. 1984} {Comfort, J.C., Y. Winquing, and Qiang Li 1984. The design of a multi-microprocessor based simulation computer-III. In {\it Record of Proceedings of the 17th Annual Simulation Symposium\/} (Tampa, Florida, March 1984). pp. 227--241.}

\createrefcode{comfortjcb84}{Comfort 1984} {Comfort, J.C. 1984. The simulation of a master-slave event-set processor. {\it Simulation 42\/}, 3 (March), 117--124.}

\createrefcode{concepcion85}{Concepcion 1985a} {Concepcion, A.I. 1985. Mapping distributed simulators onto the hierarchical multi-bus multiprocessor architecture. In {\it Proceedings of the Conference on Distributed Simulation 1985, The 1985 SCS Multiconference\/} (San Diego, California, January 1985). 15, 2, pp. 8--13.}

\createrefcode{concepcionb85}{Concepcion 1985b} {Concepcion, A.I. 1985. Implementation of the abstract simulator on the HEP computer. In {\it Proceedings of the 1985 Winter Simulation Conference\/} (San Francisco, California, December 11--13, 1985). pp. 428--434.}

\createrefcode{david83}{Davidson and Reynolds 1983} {Davidson, D.L. and P.F. Reynolds 1983. Performance analysis of a distributed simulation algorithm based on active logical processes. In {\it Proc.~of the 1983 Winter Simulation Conference\/} (December 1983). pp. 267--268.}

\createrefcode{davidson84}{Davidson 1984} {Davidson, D.L. 1984. A distributed simulation implementation. {\it M.S.~Thesis\/}, The University of Virginia, January 1984.}

\createrefcode{davisdiss88}{Davis 1988} {Davis, C.K. 1988. Automatic simulation model development for a multiprocessor architecture. {\it Dissertation\/}, Dept. of Computer Science, Texas A \& M University, August 1988.}

\createrefcode{friel84}{Friel and Sheppard 1984} {Friel, P. and S. Sheppard 1984. Implications of the Ada environment for simulation studies. In {\it Proceedings of the 1984 Winter Simulation Conference\/} (Dallas, Texas, December 1984). pp. 270--277.}

\createrefcode{fujimotods88}{Fujimoto 1988} {Fujimoto, R.M. 1988. Performance measurements of distributed simulation strategies. In {\it Proceedings of the SCS Multiconference on Distributed Simulation\/} (San Diego, California, February 1988). pp. 14--20.}

\createrefcode{fujimotodsb88}{Fujimoto et al. 1988} {Fujimoto, R.M., J.-J. Tsai, and G. Gopalakrishnan 1988. The roll back chip: hardware support for distributed simulation using time warp. In {\it Proceedings of the SCS Multiconference on Distributed Simulation\/} (San Diego, California, February 1988). pp. 81--86.}

%\createrefcode{georgiadis81}{Georgiadis et al. 1981} {Georgiadis, P.G., M.P. Papazouglou, and D.G. Maritsas 1981. Towards a parallel SIMULA machine. In {\it Proceedings of the 8th Annual Symposium on Computer Architecture\/} (May 1981). pp. 263--278.}
%
%\createrefcode{gpss69}{GPSS 1969} {IBM Application Program GH20-0304-4 1969. {\it GPSS/360 Introductory User's Manual\/}, November.}
%
%\createrefcode{hooper82}{Hooper and Reiley 1982} {Hooper, J.W. and K.D. Reiley 1982. An algorithmic analysis of simulation strategies. {\it International Journal of Computer and Information Sciences 11\/}, 2, 101--122.}
%
%\createrefcode{hooperj83}{Hooper 1983} {Hooper, J.W. 1983. The effects of strategy on simulation language characteristics. In {\it Proceedings of the 1983 Summer Computer Simulation Conference\/} (Vancouver, B.C., Canada, 1983). pp. 48--53.}

\createrefcode{gafnids88}{Gafni 1988} {Gafni, A. 1988. Rollback mechanisms for optimistic distributed simulation systems. In {\it Proceedings of the SCS Multiconference on Distributed Simulation\/} (San Diego, California, February 1988). pp. 61--67.}

\createrefcode{hugheswsc87}{Hughes et al. 1987} {Hughes, C., S.V. Sheppard, and U. Chandra 1987. Two implementations of a concurrent simulation environment. In {\it Proceedings of the 1987 Winter Simulation Conference\/} (Atlanta, Georgia, December 1987). pp. ???--???.}

\createrefcode{jefferson82}{Jefferson and Sowizral 1982} {Jefferson, D.R. and H. Sowizral 1982. Fast concurrent simulation using the time warp mechanism, Part I: Local control. {\it RAND Note No.~N--1906--AF\/}, RAND Corporation, Santa Monica, California, December 1982.}

\createrefcode{jeffersona85}{Jefferson and Sowizral 1985} {Jefferson, D.R. and H. Sowizral 1985. Fast concurrent simulation using the time warp mechanism. In {\it Proceedings of the Conference on Distributed Simulation 1985, The 1985 SCS Multiconference\/} (San Diego, California, January 1985). 15, 2, pp. 63--69.}

\createrefcode{jeffersonb85}{Jefferson et al. 1985} {Jefferson, D.R. et al. 1985. Implementation of time warp on the Caltech hypercube. In {\it Proceedings of the Conference on Distributed Simulation 1985, The 1985 SCS Multiconference\/} (San Diego, California, January 1985). 15, 2, pp. 70--75.}

\createrefcode{jeffersonc85}{Jefferson 1985} {Jefferson, D.R. 1985. Virtual time. {\it ACM Transactions on Programming Languages and Systems 7\/}, 3 (July), 404--425.}

%\createrefcode{jones80}{Jones and Schwarz 1980} {Jones, A.K. and P. Schwarz 1980. Experience using multiprocessor systems - a status report. {\it Computing Surveys 12\/}, 2 (June 1980), 121--165.}
%
%\createrefcode{kiviat71}{Kiviat 1971} {Kiviat, P.J. 1971. Simulation languages. In {\it Computer Simulation Experiments with Models of Economic Systems\/}, T.H. Naylor, Ed., John Wiley \& Sons, New York, 406--489.}
%
%\createrefcode{klahr82}{Klahr 1982} {Klahr, P. 1982. Overview of the ROSS simulation system. In {\it Proc.~of the 10th IMACS World Congress on System Simulation and Scientific Computation\/} (August 1982). pp. 67--69.}
%
%\createrefcode{krishnamurthi84}{Krishnamurthi and Young 1984} {Krishnamurthi, M. and R.E. Young 1984. A multitasking implementation of system simulation: the emulation of an asynchronous parallel processor for system simulation using a single processor. In {\it Proc.~of the 1984 Winter Simulation Conference\/} (Dallas, Texas, November 1984). pp. 261--271.}

\createrefcode{krishnamurthim85}{Krishnamurthi et al. 1985} {Krishnamurthi, M., U. Chandra, and S. Sheppard 1985. Two approaches to the implementation of a distributed simulation system. In {\it Proceedings of the 1985 Winter Simulation Conference\/} (San Francisco, California, December 11--13, 1985). pp. 435--443.}

\createrefcode{lamport78}{Lamport 1978} {Lamport, L. 1978. Time, clocks, and the ordering of events in a distributed system. {\it Communications of the ACM 21\/}, 7 (July), 558--565.}

\createrefcode{lomowds88}{Lomow et al. 1988} {Lomow, G. et al. 1988. A performance study of the time warp. In {\it Proceedings of the SCS Multiconference on Distributed Simulation\/} (San Diego, California, February 1988). pp. 50--55.}

%\createrefcode{mcdonald80}{McDonald et al. 1980} {McDonald, W.C., H.O. Welch, and J.D. Reynolds 1980. Distributed simulation host: a distributed test bed. In {\it Proc.~of COMPSAC 80\/} (October 1980). pp. 562--568.}
%
%\createrefcode{misra84}{Misra 1984} {Misra, J. 1984. Distributed simulation. {\it Tutorial presented at the Fourth International Conference on Distributed Computing Systems\/}, IEEE, San Francisco, California, May.}

\createrefcode{misraj86}{Misra 1986} {Misra, J. 1986. Distributed discrete-event simulation. {\it Computing Surveys 18\/}, 1, 39--65.}

\createrefcode{nicol84}{Nicol and Reynolds 1984} {Nicol, D.M. and P.F. Reynolds 1984. Problem oriented protocol design. In {\it Proc.~of the 1984 Winter Simulation Conference\/} (Dallas, Texas, November 1984). pp. 471--474.}

\createrefcode{nicold85}{Nicol and Reynolds 1985a} {Nicol, D.M. and P.F. Reynolds 1985. A statistical approach to dynamic partitioning. In {\it Proceedings of the Conference on Distributed Simulation 1985, The 1985 SCS Multiconference\/} (San Diego, California, January 1985). 15, 2, pp. 53--56.}

\createrefcode{nicoldm85}{Nicol and Reynolds 1985b} {Nicol, D.M. and P.F. Reynolds 1985. An operational repartitioning policy. In {\it Proceedings of the 1985 Winter Simulation Conference\/} (San Francisco, California, December 11--13, 1985). pp. 493--497.}

\createrefcode{nicolds88}{Nicol 1988} {Nicol, D.M. 1988. Mapping a battlefield simulation onto message passing parallel architectures. In {\it Proceedings of the SCS Multiconference on Distributed Simulation\/} (San Diego, California, February 1988). pp. 141--146.}

%\createrefcode{ogrady79}{O'Grady 1979} {O'Grady, E.P. 1979. Interprocessor communication in multiprocessor simulation systems. In {\it Proc.~of the Fall COMPCON 1979\/}. pp. 300--306.}
%
%\createrefcode{ogradye83}{O'Grady and Wang 1983} {O'Grady, E.P. and C.H. Wang 1983. Multi-bus based parallel processor for system simulation. In {\it Proc.~of the 1983 Summer Computer Simulation Conference\/} (Vancouver, B.C., Canada, July 1983). pp. 371--375.}

\createrefcode{ohalloron83}{O'Halloron 1983} {O'Halloron, D.R. 1983. Analysis of a model for distributed simulation. {\it M.S. Thesis\/}, The University of Virginia, Charlottesville, Virginia, January.}

%\createrefcode{papazaglou81}{Papazaglou et al. 1981} {Papazaglou, M.P., P.I. Georgiadis, and D.G. Maritsas 1981. Process synchronization in the parallel SIMULA machine. In {\it Proc.~of the International Conference on Parallel Processing\/} (August 1981). pp. 297--299.}
%
%\createrefcode{papazagloum84}{Papazaglou et al. 1984} {Papazaglou, M.P., P.I. Georgiadis, and D.G. Maritsas 1984. Architectural considerations of the parallel SIMULA machine. {\it The Computer Journal 27\/}, 3, 254--259.}

\createrefcode{peacock79}{Peacock et al. 1979} {Peacock, J.K., J.W. Wong, and E. Manning 1979. Distributed simulation using a network of micro-processors. {\it Computer Networks 3\/}, 1 (February), 44--56.}

\createrefcode{peacockj80}{Peacock et al. 1980} {Peacock, J.K., E. Manning, and J.W. Wong 1980. Synchronization of distributed simulation using broadcast algorithms. {\it Computer Networks 4\/}, 1 (February), 3--10.}

%\createrefcode{pritsker79}{Pritsker 1979} {Pritsker, A.B. 1979. {\it The GASP IV Simulation Language\/}. John Wiley \& Sons, New York, N.Y.}

\createrefcode{reedds88}{Reed and Malony 1988} {Reed, D.A. and A.D. Malony 1988. Parallel discrete event simulation: the Chandy-Misra approach. In {\it Proceedings of the SCS Multiconference on Distributed Simulation\/} (San Diego, California, February 1988). pp. 8--13.}

\createrefcode{reed88}{Reed et al. 1988} {Reed, D.A., A.D. Malony, and B.D. McCredie 1988. Parallel discrete event simulation using shared memory. {\it IEEE Transactions on Software Engineering 14\/}, 4 (April), 541--553.}

\createrefcode{reynolds82}{Reynolds 1982} {Reynolds, P.F. 1982. A shared resource algorithm for distributed simulation. In {\it Proc.~of the 9th Annual Symposium on Computer Architecture\/} (Austin, Texas, April 1982). pp. 259--266.}

\createrefcode{reynoldsa83}{Reynolds 1983a} {Reynolds, P.F. 1983. The active logical process approach to distributed simulation. DAMACS Technical Report \#83--12, Dept. of Applied Math. and Computer Science, The University of Virginia, Charlottesville, Virginia.}

\createrefcode{reynoldsb83}{Reynolds 1983b} {Reynolds, P.F. 1983. Active logical processes and distributed simulation: an analysis. In {\it Proc.~of the 1983 Winter Simulation Conference\/} (December 1983). pp. 263--264.}

\createrefcode{reynoldsjr87}{Reynolds and Kuhn 1987} {Reynolds, P.F. and C.S. Kuhn 1987. A performance study of three protocols for distributed simulation. In {\it Proc.~of the Eastern Simulation Conference\/} (April 1987). pp. 26--31.}

%\createrefcode{shannon75}{Shannon 1975} {Shannon, R.E. 1975. {\it Systems Simulation: The Art and Science\/}, Prentice-Hall, Inc., Englewood Cliffs, N.J.}

\createrefcode{sheppard82}{Sheppard et al. 1982} {Sheppard, S., D.T. Philips, and R.E. Young 1982. The design and implementation of a microprocessor-based distributed digital simulation system. {\it NSF Proposal ECS--8215550\/}, 1982, College Station, Texas.}

\createrefcode{sheppards84}{Sheppard et al. 1984} {Sheppard, S., P. Friel, and D. Reese 1984. Simulation in Ada: an implementation of two world views. In {\it Proceedings of the Conference on Simulation in Strongly Typed Languages\/} (February 1984). 13, 2, pp. 3--9.}

\createrefcode{sheppardc85}{Sheppard 1985} {Sheppard, S. 1985. The design and implementation of a distributed simulation system: a status report. Texas A \& M University Computer Science Technical Report 85--001, College Station, Texas.}

\createrefcode{shepparda85}{Sheppard et al. 1985a} {Sheppard, S., U. Chandra, and K. Murray 1985. Distributed simulation using Ada. In {\it Proceedings of the Conference on Distributed Simulation 1985, The 1985 SCS Multiconference\/} (San Diego, California, January 1985). 15, 2, pp. 27--31.}

\createrefcode{sheppardb85}{Sheppard et al. 1985b} {Sheppard, S., R.E. Young, U. Chandra, M. Krishnamurthi, and D. Wyatt 1985. Three mechanisms for distributing simulation. In {\it Proc.~of the 12th Conference of the NSF Production Research and Technology Program\/} (Madisonville, Wisconsin, May 14--17, 1985). pp. 67--70.}

\createrefcode{sheppardds88}{Sheppard et al. 1988} {Sheppard, S.V., C.K. Davis, and U. Chandra 1988. Parallel simulation environments for multiprocessor architectures. In {\it Proceedings of the SCS Multiconference on Distributed Simulation\/} (San Diego, California, February 1988). pp. 109--114.}

%\createrefcode{takenouchi80}{Takenouchi et al. 1980} {Takenouchi, H., M. Hatada, K. Hiyama, H. Inamori, and M. Toda 1980. Parallel processing simulator for network systems using multi-microcomputers. In {\it Proceedings of COMPCON Fall 1980\/}. pp. 55--62.}

\createrefcode{wyatt83}{Wyatt et al. 1983} {Wyatt, D.L., S. Sheppard, and R.E. Young 1983. An experiment in microprocessor-based distributed digital simulation. In {\it Proc.~of the 1983 Winter Simulation Conference\/} (December 1983). pp. 271--277.}

\createrefcode{wyattd84}{Wyatt and Sheppard 1984} {Wyatt, D.L. and S. Sheppard 1984. A language directed distributed discrete simulation system. In {\it Proc.~of the 1984 Winter Simulation Conference\/} (Dallas, Texas, November 1984). pp. 463--464.}

\createrefcode{wyattdl85}{Wyatt 1985} {Wyatt, D.L. 1985. Simulation programming on a distributed system: a preprocessor approach. In {\it Proceedings of the Conference on Distributed Simulation 1985, The 1985 SCS Multiconference\/} (San Diego, California, January 1985). 15, 2, pp. 32--36.}

\createrefcode{wyattdla86}{Wyatt 1986} {Wyatt, D.L. 1986. An experiment in the design of a microprocessor-based distributed discrete simulation language. {\it Ph.D. Dissertation\/}, Texas A \& M University, College Station, Texas, August 1986.}

\createrefcode{zeigler84}{Zeigler 1984} {Zeigler, B.P. 1984. System-theoretic representation of simulation models. {\it IIE Transactions 16\/}, 1 (March), 19--34.}

\createrefcode{zeiglerb85}{Zeigler 1985} {Zeigler, B.P. 1985. Discrete event formalism for model based distributed simulation. In {\it Proceedings of the Conference on Distributed Simulation 1985, The 1985 SCS Multiconference\/} (San Diego, California, January 1985). 15, 2, pp. 3--7.}

\vfill\eject
\bye
bjornl@tarpon.tds.kth.se (Björn Lisper) (04/18/89)
In this context I want to mention my colleague Rassul Ayani's PhD thesis "Parallel Simulation on Shared Memory Multiprocessors", TRITA-TCS-8903, Dept. of Telecommunication and Computer Systems, The Royal Institute of Technology, May 1989. It has just been printed and will be defended at the beginning of May. Rassul has developed a method for determining moving time windows such that, for each simulated object, the events in its time window can safely be executed in parallel with the events in the other objects' time windows.

Send requests for the thesis to:

  Ms. Eivor Atterby
  Dept. of Telecommunication and Computer Systems
  The Royal Institute of Technology
  S-100 44 Stockholm
  SWEDEN

I think we give away single copies for free, but I'm not 100% sure.

Bjorn Lisper
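[A minimal sketch of the moving-time-window idea described above. This is an illustration, not code from the thesis; in particular, the single global lookahead constant is an assumption, whereas Ayani derives a window per object.]

import heapq

def window_batch(object_queues, lookahead):
    # object_queues: one min-heap of (timestamp, event) per simulated object.
    # Events with timestamps in [gvt, gvt + lookahead) are taken to be
    # safe to execute concurrently across objects.
    nonempty = [q for q in object_queues if q]
    if not nonempty:
        return []
    gvt = min(q[0][0] for q in nonempty)        # global virtual time
    batch = []
    for q in nonempty:
        # Pop every event that falls inside this object's window.
        while q and q[0][0] < gvt + lookahead:
            batch.append(heapq.heappop(q))
    return batch                                # executable in parallel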
matloff%mole.Berkeley.EDU@ucbvax.Berkeley.EDU (Norman Matloff) (04/27/89)
In article <5117@hubcap.clemson.edu> usha%FSU.BITNET@CUNYVM.CUNY.EDU writes:

>The following is a list of references on distributed simulation. But
>this has not been updated for the past two years.

I'm glad you brought up the subject, because I would like to raise a question.

Here is the setting. It's discrete-event simulation, which many or most of your references are concerned with, on multiple-processor systems (tightly or loosely coupled). Let P be the number of processors and T be the amount of run time needed to get the desired statistical accuracy.

The usual approach in these papers is to have different processors handle different parts of the program. E.g., one processor might manage the event list, another might do the statistics collection, etc. Or, in process-oriented simulators like Simula or csim (or, for that matter, Modula-2 applied to simulation), each processor might handle a different process.

It has always seemed to me that these methods are inherently wasteful. Each processor is going to have a substantial amount of idle time. Moreover, it seems to me that the best method is also the simplest -- just have all P processors run the uniprocessor version of the program, but for T/P amount of run time, instead of T. [Of course, each processor needs to use a different random number seed.] This method gives 100% utilization for all processors and, except for start-up transients, should give the same statistical accuracy as the other methods -- in fact BETTER accuracy, since in the other methods the effective run time is actually less than T, due to idle periods.

Is anyone aware of any thorough work on this question, i.e. the question of whether the complex methods are better or worse than the simple one? I think there was a brief paper by Heidelberger on the effect of transients in this context a few years ago in one of the Winter Simulation Conferences, but it didn't go very far. Some of the complex methods give a really elegant theory (e.g. see the references to Chandy), but it still seems to me that the simple method will give better performance.

Norm
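[For concreteness, the "replicates" scheme proposed above might look like the following toy sketch. The M/M/1 delay estimator, all names, and the parameter values are illustrative assumptions, not anything from the posting.]

import random
from multiprocessing import Pool

def mm1_mean_delay(seed, n_customers, arrival_rate=0.9, service_rate=1.0):
    # One sequential replication: estimate mean time in system for M/M/1.
    rng = random.Random(seed)
    t_arrive = t_free = total_delay = 0.0
    for _ in range(n_customers):
        t_arrive += rng.expovariate(arrival_rate)   # next arrival
        start = max(t_arrive, t_free)               # wait for the server
        t_free = start + rng.expovariate(service_rate)
        total_delay += t_free - t_arrive            # time in system
    return total_delay / n_customers

def replicated_estimate(P, n_total):
    # P replications, each doing 1/P of the work, each with its own seed.
    with Pool(P) as pool:
        means = pool.starmap(mm1_mean_delay,
                             [(seed, n_total // P) for seed in range(1, P + 1)])
    return sum(means) / P          # grand mean over the P replications

if __name__ == "__main__":
    print(replicated_estimate(P=4, n_total=400000))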
wen-king@csvax.caltech.edu (Wen-King Su) (04/28/89)
In article <5276@hubcap.clemson.edu> matloff%mole.Berkeley.EDU@ucbvax.Berkeley.EDU (Norman Matloff) writes:

>It has always seemed to me that [distributed] methods are inherently wasteful.
>Each processor is going to have a substantial amount of idle time.
>
>Moreover, it seems to me that the best method is also the simplest --
>just have all P processors run the uniprocessor version of the program...
>
>This method gives 100% utilization for all processors...

Since you are not worried about the elapsed time of the simulation runs, which may itself be the most important consideration in some simulations, I will assume that throughput is the single most important consideration here.

Even with this assumption, however, it still depends on how you compute the cost of hardware. If the cost is proportional to the number of processors, running one sequential simulation on each processor clearly is optimal. Since 100% of the processors are in use at all times, as the number of processors doubles, the throughput doubles as well. Nothing can do better than that.

However, if the cost is proportional to the amount of hardware involved, distributed simulation can be more efficient. For the purpose of this discussion, let us first assume that:

 1) Hardware used == the amount of memory involved.
 2) Simulation requires the same amount of memory
    no matter how many processors are used.

Now, if we further assume that the distributed simulation shows no speedup, it will still be as efficient as the sequential simulation. If we assume instead that the speedup is linear, every time we double the processors and halve the memory per processor, we also halve the elapsed time. We have doubled the throughput using essentially the same amount of hardware. The truth is somewhere in between, but if the distributed simulation shows any speedup at all, it will be more efficient than the sequential simulation.

In reality, neither 1) nor 2) is true, but the truth lies not far away, and the distributed simulation has to show some degree of speedup just to break even. If it does better than break even, it will be more efficient than the sequential simulation.

One may argue that in real life the processor size can't change as we use more and more processors. In real life, however, neither can a simulation that tightly fits a 100-processor multicomputer be run on just one processor of the multicomputer.

/*------------------------------------------------------------------------*\
| Wen-King Su  wen-king@vlsi.caltech.edu  Caltech Corp of Cosmic Engineers |
\*------------------------------------------------------------------------*/
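[The break-even argument above can be put in one line of arithmetic. This is one formulation of assumptions (1) and (2), with efficiency measured as throughput per unit of hardware: a sequential run takes time $T$ on memory $m$; a distributed run with speedup $s$ still uses total memory $m$ but finishes in $T/s$:

\[
\hbox{sequential: }\ \frac{1/T}{m}=\frac{1}{mT},
\qquad
\hbox{distributed: }\ \frac{1/(T/s)}{m}=\frac{s}{mT},
\]

so under these assumptions any speedup $s>1$ beats replicated sequential runs per unit of hardware, which is the "better than break-even" condition in the post.]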
wagner@june.cs.washington.edu (Bullwinkle J. Moose) (04/28/89)
In article <5276@hubcap.clemson.edu>, matloff%mole.Berkeley.EDU@ucbvax.Berkeley.EDU (Norman Matloff) writes:

> In article <5117@hubcap.clemson.edu> usha%FSU.BITNET@CUNYVM.CUNY.EDU writes:
>
> Here is the setting. It's discrete-event simulation, which many or most
> of your references are concerned with, on multiple-processor systems
> (tightly or loosely coupled). Let P be the number of processors and T
> be the amount of run time needed to get the desired statistical accuracy.
> ...
> Moreover, it seems to me that the best method is also the simplest --
> just have all P processors run the uniprocessor version of the program,
> but for T/P amount of run time, instead of T. [Of course, each processor
> needs to use a different random number seed.]
>
> Is anyone aware of any thorough work on this question, i.e. the question
> of whether the complex methods are better or worse than the simple one?
> I think there was a brief paper by Heidelberger on the effect of
> transients in this context a few years ago in one of the Winter
> Simulation Conferences, but it didn't go very far.

Well, funny you should ask, because I have just finished writing up the chapter of my Ph.D. dissertation dealing with this question. In fact, Heidelberger is the only one I know of who has done any analysis of this problem. The complete references are:

@InProceedings{ Heidelberger86,
  Author =       "P. Heidelberger",
  Title =        "{Statistical Analysis of Parallel Simulations}",
  Booktitle =    "Proc. 1986 Winter Simulation Conference",
  Year =         1986,
  Pages =        "290-295",
  Organization = SCS,
  Abstract =     "Comparison of independent replications with a single
                  long parallel simulation."
}

@TechReport{ Heidelberger87,
  Author =      "P. Heidelberger",
  Title =       "{Discrete Event Simulations and Parallel Processing:
                  Statistical Properties}",
  Institution = "IBM Thomas J. Watson Research Center",
  Year =        1987,
  Number =      "RC 12733 (\#57302)",
  Address =     "Yorktown Heights, NY",
  Month =       May,
  Abstract =    "Proves that certain statistical estimators obtained by
                 running separate replications in parallel are biased,
                 and that some even converge to the wrong value!"
}

Heidelberger86 examined the tradeoff between intra-replication parallelism M and inter-replication parallelism K when the total number of processors is some constant (hence P = MK). In essence, the results of Heidelberger86 are that intra-replication parallelism should be favored over inter-replication parallelism under the following circumstances: for systems with a strong initial transient; for short runs; or if a large number of processors are available.

While I have very little to add to this in an analytical sense, I've spent quite some time evaluating the practical impact of his results. While it's true that running parallel independent replications has a lot of appeal, there are some mitigating considerations. The first and most obvious of these is that beyond a certain point, the marginal effect on confidence interval width of increasing the number of replications becomes insignificant. Thus, given enough processors, it still makes sense to parallelize each replication.

Second, because of the presence of an initialization bias in any stochastic simulation, it is important to run each replication long enough to ensure that the confidence intervals that one obtains for some estimator actually contain the true value of what is being estimated. (I.e., an extremely large number of very short replications will probably result in very small confidence intervals that do not contain the true value!)
What this means in practical terms is that if elapsed wall clock time is the overriding consideration, you may not be able to achieve accurate results unless you parallelize each replication.

Third, Heidelberger87 pointed out that as the number of replications run in parallel increases, statistical considerations require stopping rules that lead to poor processor utilization near the end of the simulation run. Grossly oversimplifying, it turns out that if the completion time of individual replications is a random variable, then in order to obtain unbiased estimators of model outputs it is necessary to wait for all replications to finish. (The randomness of completion times may be a significant factor when the simulation is estimating transient behavior of a model, or when the regenerative method for obtaining confidence intervals is being used within each replication.) In fact, under certain assumptions it turns out that as the number of processors P increases, the speedup obtained tends toward P/log P rather than P. (This is similar to a result by Kruskal and Weiss regarding the completion time of a group of self-scheduling processors taking tasks off of a central task queue.) Therefore, your assertion that

> This method gives 100% utilization for all processors, ...

is statistically untrue.

Lastly, I have taken the analysis of Heidelberger86 and tweaked it just a bit for purposes of illustration. Rather than fixing a running time and comparing the mean square error (mse) of the alternatives obtained by assigning various values to M and K, I have fixed the mse and time to completion of the sequential case (M=1, K=P) and calculated what speedup the parallel simulation would need to achieve (as a function of M) in order to better the accuracy of the purely sequential method. I call this the break-even speedup. It can be interpreted in one of two ways: if the parallel simulation is capable of exceeding the break-even speedup, then either (a) the parallel strategy can produce the same level of statistical confidence as the purely sequential strategy in less elapsed time, or (b) the parallel strategy can produce a higher level of statistical confidence than the purely sequential strategy in the same amount of elapsed time. The advantage of this analysis is that it does not require any assumptions about the speedup behavior of the parallel simulation.

The general behavior of the break-even speedup is that it is almost linear in M, but the slope of the line decreases rapidly as the total number of processors P increases. For really large P (e.g., 1024) the break-even speedup is truly paltry even for large M, so that any reasonable implementation of a parallel simulation ought to be able to produce a win. (Note: I'm not suggesting that it's realistic to expect a parallel simulator to show speedups on 1024 processors (in fact, that wouldn't give you anything on which to base confidence intervals); rather, given 1024 processors, one ought to run an M=32, K=32 parallel simulation, or, if your simulation starts to wheeze and die at 32 processors, maybe only M=16, K=64.)

Finally, a pragmatic consideration against running multiple independent replications in parallel is that some simulations have enormous memory requirements; thus, it actually may not be possible to run the replications simultaneously. In fact, Jefferson et al.
at JPL are running into situations in which they can't even calculate speedup, because they can't run 1-processor versions of the problems they are simulating (they are running on a distributed memory architecture).

Dave Wagner
University of Washington Comp Sci Department
wagner@cs.washington.edu
{cornell,harvard,ssc-vax,tektronix}!uw-beaver!wagner

P.S. My complete dissertation will be available from the University of Washington sometime this summer. The title is "Conservative Parallel Discrete-Event Simulation: Principles and Practice".
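[The P/log P straggler effect cited above is easy to see numerically. A toy model, not Heidelberger's: take the replication completion times to be i.i.d. Exp(1), so the run ends at the maximum of P exponentials, whose mean is the harmonic number H_P ~ ln P.]

import random

def straggler_speedup(P, trials=20000, rng=random.Random(1)):
    # P replications each do 1 unit of expected work, but the run ends
    # only when the slowest finishes: elapsed time = max of P Exp(1)
    # draws, whose mean grows like ln P.
    mean_max = sum(max(rng.expovariate(1.0) for _ in range(P))
                   for _ in range(trials)) / trials
    return P / mean_max     # speedup vs. doing the P units serially

for P in (4, 16, 64, 256):
    print(P, round(straggler_speedup(P), 1))

[Under this assumption the printed speedups track P/ln P rather than P: for P = 256 the mean completion time is near ln 256 + 0.577, about 6.1, so the speedup comes out around 42, not 256.]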
eugene@eos.arc.nasa.gov (Eugene Miya) (04/28/89)
The IBM TR mentioned in Dave Wagner's posting was published (shortened slightly) in the 1987 ACM SIGMETRICS (Banff) proceedings. It was one of the better papers at that conference.

Longish signature follows. "Type 'n' now."

Another gross generalization from

  --eugene miya, NASA Ames Research Center, eugene@aurora.arc.nasa.gov
    resident cynic at the Rock of Ages Home for Retired Hackers:
    "You trust the `reply' command with all those different mailers out there?"
    "If my mail does not reach you, please accept my apology."
    {ncar,decwrl,hplabs,uunet}!ames!eugene
    Live free or die.
matloff@iris.ucdavis.edu (Norm Matloff) (04/28/89)
In article <5311@hubcap.clemson.edu> wen-king@csvax.caltech.edu (Wen-King Su) writes:

>In article <5276@hubcap.clemson.edu> matloff%mole.Berkeley.EDU@ucbvax.Berkeley.EDU (Norman Matloff) writes:
>>It has always seemed to me that [distributed] methods are inherently wasteful.
>>Each processor is going to have a substantial amount of idle time.
>>
>>Moreover, it seems to me that the best method is also the simplest --
>>just have all P processors run the uniprocessor version of the program...
>>This method gives 100% utilization for all processors...
>
>Since you are not worried about the elapsed time of the simulation runs,
>which may itself be the most important consideration in some simulations,

I'm surprised to see you say this. Am I missing something, maybe a subtle distinction between elapsed time and some other measure of time? I *am* worried about the time. In fact, I even specified a time T for a uniprocessor version, which one hopes to reduce to T/P in a P-processor system.

What has to be held constant in any comparison is accuracy. I defined the accuracy level to be whatever comes from running the simulation on a uniprocessor for time T. The problem with both kinds of distributed simulations -- the task-partitioned and replicated types -- is that in the former there is less than 100% utilization, while in the latter there is initialization bias. So in both cases the desired accuracy needs more than T/P time, i.e. there is less than a factor-P speedup. The only question is which is better.

>distributed simulation can be more efficient. For the purpose of this
>discussion, let us first assume that:
> 1) Hardware used == the amount of memory involved.
> 2) Simulation requires the same amount of memory
>    no matter how many processors are used.

As far as I know, memory is NOT a problem. The major data structure is the event queue, and in typical applications this is not very long at all, maybe a dozen items.

Norm
matloff@iris.ucdavis.edu (Norm Matloff) (04/28/89)
In article <5312@hubcap.clemson.edu> wagner@june.cs.washington.edu (Bullwinkle J. Moose) writes:

>While it's true that running parallel independent replications has a
>lot of appeal, there are some mitigating considerations. The first and
>most obvious of these is that beyond a certain point, the marginal
>effect on confidence interval width of increasing the number of
>replications becomes insignificant.

This cannot be literally true. The error will be roughly inversely proportional to the square root of the number of replications, no matter how many replications there are. If you quadruple the number of replications, you should always get double the accuracy (except for initialization bias). Perhaps you meant that after a certain point, any further decrease in CI width is of marginal value to the USER?

>Second, because of the presence of an initialization bias in any
>stochastic simulation, it is important to run each replication long
>enough to ensure that the confidence intervals that one obtains for
>some estimator actually contain the true value of what is being
>estimated.

Yes, as I recall, that was the theme of Heidelberger's paper. But it seems to me that it's not so simple.

For example, in typical applications, a number of runs will be made of the same simulation program with different parameters, e.g. different arrival rates. After doing the first run, the user has a pretty good idea of the steady-state queue lengths, and can initialize the other runs with nonempty queues of those approximate sizes. This should reduce the initialization bias quite substantially.

Or, better yet, in systems which have regeneration points and don't have huge (in the notation from my posting yesterday, >> T/P) regeneration cycle times, one can simply start each replicate at the regeneration point -- thus no initialization bias at all.

Of course, you do run into the problem you mentioned, i.e. the problem of "waiting for the slowest one," which gave you your P/log P figure. However, analogous kinds of problems should plague the task-partitioning approach too; e.g., as P increases, the problem of idle time should get worse and worse.

The task-partitioning approach also has a scaling problem. For a fixed application, say a 10-server queue, there will be a very small limit on the usable value of P. After a certain point, you just can't find anything for the processors to do. On the other hand, the replicates approach has this too, i.e. for a fixed problem, a large value of P exacerbates either the initialization bias problem or the "wait for the slowest one" problem. So I don't think that either approach is a clear win. But the replicates approach certainly is much simpler.

>Finally, a pragmatic consideration against running multiple independent
>replications in parallel is that some simulations have enormous memory
>requirements,

In most simulations of queueing systems in practice, memory should not be a problem. The length of the event list should be on the order of magnitude of the number of servers, which is generally not large.

Norm
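[The square-root point above, in standard notation: with $n$ independent replications and sample standard deviation $s$, the confidence interval half-width is

\[ h \;=\; z_{1-\alpha/2}\,\frac{s}{\sqrt{n}}, \]

so quadrupling $n$ always halves $h$ (Matloff's point), while the improvement per additional replication, $|dh/dn| \propto n^{-3/2}$, shrinks toward zero (presumably the sense in which Wagner meant "insignificant"). The two claims are consistent.]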
chandra%loligo@gatech.edu (Usha Chandra) (05/01/89)
>I'm glad you brought up the subject, because I would like to raise a
>question.
>
>Here is the setting. It's discrete-event simulation, which many or most
>of your references are concerned with, on multiple-processor systems
>(tightly or loosely coupled). Let P be the number of processors and T
>be the amount of run time needed to get the desired statistical accuracy.
>
>The usual approach in these papers is to have different processors handle
>different parts of the program. E.g., one processor might manage the
>event list, another might do the statistics collection, etc. Or, in
>process-oriented simulators like Simula or csim (or, for that matter,
>Modula-2 applied to simulation), each processor might handle a different
>process.
>
>It has always seemed to me that these methods are inherently wasteful.
>Each processor is going to have a substantial amount of idle time.

Thanks to the advances in parallel hardware, this method has exhibited good performance. Please refer to the recent publications by Chandy and Sherman, Jefferson et al., Reed and Malony, and Fujimoto. Some of the papers that you may want to look at are:

1. K.M. Chandy, R. Sherman, The conditional event approach to distributed simulation, .....(I do not remember where it was published). But you may be able to contact R. Sherman at the following address:

   USC/Information Sciences Institute
   4676 Admiralty Way, Suite 1001
   Marina Del Rey, CA 90292

2. D.A. Reed, A.D. Malony, Parallel discrete event simulation:......, Proc. of the Conf. on Distributed Simulation, 19, 3, 1987.

3. R.M. Fujimoto, Performance measurements....., Proc. of the Conf. on Distributed Simulation, 19, 3, 1987.

You may be able to find the most recent papers in the proceedings of the Winter Simulation Conference, the Eastern Simulation Conference, the SCS Multiconference ???, and the Summer Computer Simulation Conference.

>Moreover, it seems to me that the best method is also the simplest --
>just have all P processors run the uniprocessor version of the program,
>but for T/P amount of run time, instead of T. [Of course, each processor
>needs to use a different random number seed.]

Some work has been done by John Comfort. He distributed support functions such as random number generation, queue handling, statistics collection, event processing, etc. over multiple processors and got some performance out of it, but not quite enough.

It is not that simple. A uniprocessor version will use some form of event list (a sequential list), and assigning the first n events to n processors may indeed lead to incorrect simulation. For example, consider an event list with time stamps t1, t2, t3, ..., where t1 <= t2 <= t3. Processing the event with time stamp t1 may cause an event to be inserted between t1 and t2.

Usha Chandra
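[Chandra's causality example in executable form. This is toy illustrative code; the event names and timestamps are made up.]

import heapq

events = [(1.0, "e1"), (2.0, "e2"), (3.0, "e3")]   # t1 <= t2 <= t3
heapq.heapify(events)

# Naive parallelization: dispatch the first two events to two processors.
dispatched = [heapq.heappop(events) for _ in range(2)]  # e1 and e2

# But executing e1 schedules a new event at t = 1.5, between t1 and t2 --
# and e2 (t = 2.0) is already running, so the new event would be handled
# out of timestamp order.  Preventing or repairing exactly this hazard
# is what protocols such as Chandy-Misra and Time Warp are for.
heapq.heappush(events, (1.5, "e1a"))
print("dispatched:", dispatched, "now pending:", events)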
matloff%mole.Berkeley.EDU@ucbvax.Berkeley.EDU (Norman Matloff) (05/02/89)
In article <5344@hubcap.clemson.edu> chandra%loligo@gatech.edu (Usha Chandra) writes:

Speaking of task-oriented simulators which put tasks, e.g. simulations of different servers, on different machines, I had said:

>>It has always seemed to me that these methods are inherently wasteful.
>>Each processor is going to have a substantial amount of idle time.

>Thanks to the advances in parallel hardware, this method has exhibited
>good performance. Please refer to the recent publications by Chandy and
>Sherman, Jefferson et al., Reed and Malony, and Fujimoto.

I've seen a number of these. Again, there are inherent problems, e.g. scaling. Say the application to be simulated has 20 servers. There is really no way that the task-oriented approach can be fruitful with much more than 20 processors.

>>Moreover, it seems to me that the best method is also the simplest --
>>just have all P processors run the uniprocessor version of the program,
>>but for T/P amount of run time, instead of T. [Of course, each processor
>>needs to use a different random number seed.]

>Some work has been done by John Comfort. He distributed support
>functions such as random number generation, queue handling,
>statistics collection, event processing, etc. over multiple processors
>and got some performance out of it, but not quite enough.

Yes, I also referred to this method in my original posting. I think it has even less potential than the task-oriented method.

>It is not that simple. A uniprocessor version will use some form of
>event list (a sequential list), and assigning the first n events to n
>processors may indeed lead to incorrect simulation. For example,
>consider an event list with time stamps t1, t2, t3, ..., where
>t1 <= t2 <= t3. Processing the event with time stamp t1 may cause an
>event to be inserted between t1 and t2.

Yes, I think that the recent work on methods for correct synchronization, etc., by Chandy and others is very elegant and impressive. But my point is still that for a stochastic system, these should have essentially the same performance as the replication method, and they are a lot more difficult.

Norm
wagner@june.cs.washington.edu (Bullwinkle J. Moose) (05/02/89)
In article <5319@hubcap.clemson.edu>, matloff@iris.ucdavis.edu (Norm Matloff) writes:
> In article <5312@hubcap.clemson.edu> wagner@june.cs.washington.edu (Bullwinkle J. Moose) (Hey! that's me!) writes:
> >Because of the presence of an initialization bias in any
> >stochastic simulation, it is important to run each replication long
> >enough to ensure that the confidence intervals that one obtains for
> >some estimator actually contain the true value of what is being
> >estimated.
>
> Yes, as I recall, that was the theme of Heidelberger's paper. But
> it seems to me that it's not so simple.
>
> For example, in typical applications, a number of runs will be made
> of the same simulation program with different parameters, e.g.
> different arrival rates. After doing the first run, the user has
> a pretty good idea of the steady-state queue lengths, and can
> initialize the other runs with nonempty queues of those approximate
> sizes. This should reduce the initialization bias quite substantially.

Although your argument seems quite reasonable, it is, alas, untrue. First of all, note that the steady-state of a stochastic process is represented not by a single value, but by a *distribution* of that value. Therefore, what you really need to do is estimate the distribution of the steady-state, and then start up a number of runs, with the initial conditions chosen at random from that distribution. (Note that if you knew the distribution of steady-state conditions exactly, then there would be no point to doing the simulation at all!)

As an example of what happens if steady-state is chosen to be the average value instead of a set of samples from the (estimated) distribution, I refer you to "Statistical Analysis of Simulation Output Data", by A.M. Law, Oper. Res. 31(6):983-1029 (June 1983). In the following excerpt from pages 1015-1017, my own comments are enclosed in [...]:

   "The suggested procedure [using pilot runs to estimate steady-state]
   would be easy to apply in the case of an M/M/1 queue, since the state
   of the system is a single integer-valued r.v., namely, the number of
   people in the system. In the case of a complex real-world simulation,
   however, this procedure may require estimating the joint distribution
   of a large number of r.v.'s, some of which may be continuous. As a
   simple example of this difficulty, consider a single-server queueing
   system without exponential interarrival times or service times. In
   this case, we would have to estimate the joint distribution of number
   in system, time to the next arrival, and the remaining service time
   of the customer in service (if any).

   [In the following paragraph, Di is defined to be the delay experienced
   by the ith customer to arrive at the server, and d is defined to be the
   steady-state expected average delay, i.e. lim(n-->oo) (D1+D2+...+Dn)/n]

   "Kelton and Law ["The Transient Behavior of the M/M/s Queue with
   Implications for Steady-State Simulation", TR 82-7, Dept. of MIS,
   U. of Arizona, 1982] computed E(Di) as a function of i and the number
   in system at time 0, say, l [lower-case l, not 1] for various M/M/s
   queues.... Let i(l,0.01) [be] the smallest point [value of i]
   beyond which all values of E(Di) fall within 1 percent of d,
   viewed as a function of the initial number in the system, l.
   Kelton and Law found for the M/M/1 queue with rho=0.9 that i(l,0.01)
   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   is minimized for l = 15. This result is interesting since
   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   the steady-state time-average number of customers in system is 9."
   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

In general, Law's article is an excellent guide to the pitfalls (his word) of using classical statistical reasoning in the analysis of simulation output data. Not to mention intuition!

> Or, better yet, in systems which have regeneration points and don't
> have huge (in the notation from my posting yesterday, >> T/P)
> regeneration cycle times, one can simply start each replicate at
> the regeneration point -- thus no initialization bias at all.

True, the method of regeneration theoretically does not suffer from the initialization bias problem. But "theoretically" is the key word here, because unless what you are simulating is truly a regenerative process, your regeneration points are only approximate. And using regeneration would definitely introduce the problem of stragglers (see my last posting), because it might lead to a wide variation in completion time of the replications unless the number of regeneration cycles within each replication is very large.

> Of course, you do run into the problem you mentioned, i.e. the
> problem of "waiting for the slowest one," which gave you your
> P/log P figure. However, analogous kinds of problem should plague
> the task-partitioning approach too, e.g. as P increases, the
> problem of idle time should get worse and worse.

But, since the number of replications being run when intra-replication parallelism is used is less than P, the magnitude of the effect is smaller. I didn't claim that the "tail effect" problem was a major one, only that it "nudges the scale" a bit in the direction of intra-replication parallelism.

> The task-partitioning approach also has a scaling problem. For
> a fixed application, say a 10-server queue, there will be a very
> small limit on the usable value of P. After a certain point,
> you just can't find anything for the processors to do.

Definitely true, which is why the best approach for a large number of processors is probably a combination of inter- and intra-replication parallelism.

> So I don't think that either approach is a clear win. But the
> replicates approach certainly is much simpler.
>
> >Finally, a pragmatic consideration against running multiple independent
> >replications in parallel is that some simulations have enormous memory
> >requirements,
>
> In most simulations of queueing systems in practice, memory should not
> be a problem. The length of the event list should be on the order of
> magnitude of the number of servers, which is generally not large.

Depends on what is being simulated. I have friends at Bellcore who were simulating a metropolitan trunk network in which the steady-state size of the event list was about 15 THOUSAND events.

"All generalizations are false, including this one."

Dave Wagner
University of Washington Comp Sci Department
wagner@cs.washington.edu
{cornell,harvard,ssc-vax,tektronix}!uw-beaver!wagner
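[Wagner's prescription -- estimate the steady-state *distribution* from a pilot run, then draw each replication's initial condition from it -- might be sketched as follows. This is toy code; the single integer state variable is a large simplification of the joint distribution Law describes.]

import random

def draw_initial_states(pilot_trace, k, warmup_fraction=0.2,
                        rng=random.Random(2)):
    # pilot_trace: state samples (e.g. queue lengths) from a pilot run.
    # Discard the warm-up prefix, then draw k initial conditions from
    # the remaining empirical distribution, one per replication.
    settled = pilot_trace[int(len(pilot_trace) * warmup_fraction):]
    return rng.choices(settled, k=k)

# e.g.: feed each replication one of these as its starting queue length
print(draw_initial_states([3, 9, 12, 8, 10, 9, 11, 7, 9, 14] * 100, k=8))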
matloff%crow.Berkeley.EDU@ucbvax.Berkeley.EDU (Norman Matloff) (05/03/89)
In article <5367@hubcap.clemson.edu> wagner@june.cs.washington.edu (Bullwinkle J. Moose) writes:
>In article <5319@hubcap.clemson.edu>, matloff@iris.ucdavis.edu (Norm Matloff) writes:

>Although your argument seems quite reasonable, it is, alas, untrue.
>First of all, note that the steady-state of a stochastic process is
>represented not by a single value, but by a *distribution* of that value.

Actually, Law's statement that you quoted is self-contradictory, it seems to me. He is apparently willing to use the mean in one-dimensional problems but not in multidimensional problems, which doesn't make sense.

>Therefore, what you really need to do is estimate the distribution of the
>steady-state, and then start up a number of runs, with the initial
>conditions chosen at random from that distribution.

True, this would be optimal. But does it make that much difference? Unfortunately, I don't have time to go read the paper right now, but I would guess that it really doesn't make that much difference, and that using the mean is nearly optimal in terms of *result*. In other words, even though 9 and 15 are far apart in the excerpt quoted below, I would guess that E(D9) and E(D15) are NOT far apart.

> Let i(l,0.01) [be] the smallest point [value of i]
> beyond which all values of E(Di) fall within 1 percent of d,
> viewed as a function of the initial number in the system, l.
> Kelton and Law found for the M/M/1 queue with rho=0.9 that i(l,0.01)
> is minimized for l = 15. This result is interesting since
> the steady-state time-average number of customers in system is 9.

>> Or, better yet, in systems which have regeneration points and don't
>> have huge (in the notation from my posting yesterday, >> T/P)
>> regeneration cycle times, one can simply start each replicate at
>> the regeneration point -- thus no initialization bias at all.

>True, the method of regeneration theoretically does not suffer from the
>initialization bias problem. But "theoretically" is the key word here,
>because unless what you are simulating is truly a regenerative process,
>your regeneration points are only approximate.

Again, I think approximate is good enough in many cases. Also, I would mention that I've seen a number of specific applications in which it was claimed that there was no regeneration point, but that if things like symmetries are taken into account, it turns out that there are good regeneration points which one can use.

>> Of course, you do run into the problem you mentioned, i.e. the
>> problem of "waiting for the slowest one," which gave you your
>> P/log P figure. However, analogous kinds of problem should plague
>> the task-partitioning approach too, e.g. as P increases, the
>> problem of idle time should get worse and worse.

>But, since the number of replications being run when intra-replication
>parallelism is used is less than P, the magnitude of the effect is smaller.

No, what I am saying is that even with no replication at all, there will be **analogous** straggler effects, due to interprocess dependencies.

>> The task-partitioning approach also has a scaling problem. For
>> a fixed application, say a 10-server queue, there will be a very
>> small limit on the usable value of P. After a certain point,
>> you just can't find anything for the processors to do.

>Definitely true, which is why the best approach for a large number of
>processors is probably a combination of inter- and intra-replication
>parallelism.

That could well be true.

>> In most simulations of queueing systems in practice, memory should not
>> be a problem. The length of the event list should be on the order of
>> magnitude of the number of servers, which is generally not large.

>Depends on what is being simulated. I have friends at Bellcore who were
>simulating a metropolitan trunk network in which the steady-state size of
>the event list was about 15 THOUSAND events.

Yes. John Comfort once mentioned in a conference that he had seen one case in which there were typically 100,000 events in the list. But that's why I said "most."

Norm
matloff%crow.Berkeley.EDU@ucbvax.Berkeley.EDU (Norman Matloff) (05/03/89)
In article <5367@hubcap.clemson.edu> wagner@june.cs.washington.edu (Bullwinkle J. Moose) writes:
>In article <5319@hubcap.clemson.edu>, matloff@iris.ucdavis.edu (Norm Matloff) writes:

[I replied to Dave's article this morning, but apparently it never got to the net, so I'm replying again. The version here has been updated.]

*First of all, note that the steady-state of a stochastic process is
*represented not by a single value, but by a *distribution* of that value.

True, but if our aim is to reduce initialization bias, a single value should work reasonably well. See below.

* [In the following paragraph, Di is defined to be the delay experienced
* by the ith customer to arrive at the server, and d is defined to be the
* steady-state expected average delay, i.e. lim(n-->oo) (D1+D2+...+Dn)/n]
*
* "Kelton and Law ["The Transient Behavior of the M/M/s Queue with
* Implications for Steady-State Simulation", TR 82-7, Dept. of MIS,
* U. of Arizona, 1982] computed E(Di) as a function of i and the number
* in system at time 0, say, l [lower-case l, not 1] for various M/M/s
* queues....
* Let i(l,0.01) [be] the smallest point [value of i]
* beyond which all values of E(Di) fall within 1 percent of d,
* viewed as a function of the initial number in the system, l.
* Kelton and Law found for the M/M/1 queue with rho=0.9 that i(l,0.01)
* is minimized for l = 15. This result is interesting since
* the steady-state time-average number of customers in system is 9."

Yes, 9 is fairly different from 15. However, it doesn't make much difference in initialization bias. In Law's graph, the l = 10 and l = 15 lines (there is none for l = 9) give about the same-sized initialization bias. In other words, just a "ballpark" value is probably good enough to reduce the bias to an acceptable scale.

>> Or, better yet, in systems which have regeneration points and don't
>> have huge (in the notation from my posting yesterday, >> T/P)
>> regeneration cycle times, one can simply start each replicate at
>> the regeneration point -- thus no initialization bias at all.

*True, the method of regeneration theoretically does not suffer from the
*initialization bias problem. But "theoretically" is the key word here,
*because unless what you are simulating is truly a regenerative process,
*your regeneration points are only approximate.

Again, I think an approximate regeneration point is good enough for practical purposes.

>> Of course, you do run into the problem you mentioned, i.e. the
>> problem of "waiting for the slowest one," which gave you your
>> P/log P figure. However, analogous kinds of problem should plague
>> the task-partitioning approach too, e.g. as P increases, the
>> problem of idle time should get worse and worse.

*But, since the number of replications being run when intra-replication
*parallelism is used is less than P, the magnitude of the effect is smaller.

I think we're talking about different things here. What I am talking about occurs even if there is no replication. What I am saying is that due to interprocess dependencies, there will be process waiting involved, and that this should in many applications have a log P growth rate, just as in the replication case, even though for different reasons.

>> The task-partitioning approach also has a scaling problem. For
>> a fixed application, say a 10-server queue, there will be a very
>> small limit on the usable value of P. After a certain point,
>> you just can't find anything for the processors to do.

*Definitely true, which is why the best approach for a large number of
*processors is probably a combination of inter- and intra-replication
*parallelism.

This may well turn out to be the case. But I still feel that this has not been looked at sufficiently. Your work sounds excellent, but you yourself indicated that it's the only work done since Heidelberger's, which itself was just a preliminary investigation.

>> In most simulations of queueing systems in practice, memory should not
>> be a problem. The length of the event list should be on the order of
>> magnitude of the number of servers, which is generally not large.

*Depends on what is being simulated. I have friends at Bellcore who were
*simulating a metropolitan trunk network in which the steady-state size of
*the event list was about 15 THOUSAND events.

I heard John Comfort give a talk once in which he mentioned hearing of some application in which the typical size of the event list was 100,000 events. That's why I qualified my statement above by using the word "most." :-)

Norm
fishwick@fish.cis.ufl.edu (Paul Fishwick) (05/03/89)
[ Prof. Fishwick is the moderator of the comp.simulation group. For those
  who receive comp.parallel by direct mail, you can submit to him directly.
  --steve ]

This is a small plea on behalf of our readers in "comp.simulation". Will individuals posting information concerning distributed simulation please post articles ALSO to comp.simulation? I think you will find that this will make the discussion much more lively. Thanks for your consideration.

-paul

+------------------------------------------------------------------------+
| Prof. Paul A. Fishwick.... INTERNET: fishwick@bikini.cis.ufl.edu        |
| Dept. of Computer Science. UUCP: gatech!uflorida!fishwick               |
| Univ. of Florida.......... PHONE: (904)-335-8036                        |
| Bldg. CSE, Room 301....... FAX is available                             |
| Gainesville, FL 32611.....                                              |
+------------------------------------------------------------------------+