renga@beaver.cs.washington.edu (Renganathan Sundararajan) (11/07/88)
*************************************** Long ***************************************

Threads, Lightweight Processes.
===============================
(A Summary)

This is a summary of the e-mail messages that I received on this subject.
In the following summary, I do not attempt to attribute the contributions
individually.

My thanks to:
    peter%sugar@uunet.UU.NET (Peter da Silva)
    marzullo%cs.cornell.edu (Keith Marzullo)
    sjs%ctt.bellcore.com (Stan Switzer)
    mudd-j%hickory.cis.ohio-state.edu (John R. Mudd)
    ariel%lznh@att.att.com (Ariel Aloni)
    tim%crackle@amd.com (Tim Olson)
    gray%Pescadero.stanford.edu (Cary Gray)
    retrac%rice.edu (John Carter)
    peter%ficc@uunet.uu.net (Peter da Silva)
    campbell%redsox.bsw.com (Larry Campbell)
    mason%pescadero.Stanford.EDU (Tony Mason)
    irvin%cwi.nl%mcvax%mcvax%uunet.UU.NET (Irvin Shizgal)
    robbin@MoliEnergy.BC.CA (Robbin W. Johnson)
    jesup%cbmvax.cbm.commodore.com (Randell Jesup)
    johnl@ima.isc.com (John Levine)
    fouts@lemming.nas.nasa.gov (Marty Fouts)
    schwan%wayward@gatech.edu (Karsten Schwan)
=======================================================================

The words 'process' and 'task' have been used to mean different things in
different operating systems.  Adding to the confusion is the use of these
terms in languages such as Ada and Modula-2, which allow concurrent
execution of 'processes', 'tasks', etc. through explicit or implicit
creation.

Historically, a process was thought of as the activation of a sequential
program.  Before virtual memory and caches came into being, a process had
just an address space (which included code, heap, and stack space) and a
set of registers.  Hence, context switching involved saving and restoring
registers (including base address registers), condition flags, and OS
table state, if any.  With the introduction of virtual memory and caches,
it became necessary to save the MMU state as well, and to flush the
cache, mark its entries invalid, or otherwise deal with them.
Because switching between such processes involves so much work, they are
called 'Heavy Weight Processes' (HWPs).

If a programming language allows the creation and concurrent execution
of 'processes', 'tasks', or whatever, is it necessary to have an OS
process represent each of these?  Obviously not.  These typically share
the code space and the data space, and may have private heap and stack
spaces.  In other words, we have multiple threads of control executing
within a single address space.  These resemble procedure activations
(each activation has its own stack frame, and heap space if necessary)
and are quite unlike the HWPs described in the previous paragraph.  We
now have 'LWPs' or 'Threads'.

Judging from the responses I got, threads and light-weight processes
denote the same thing.  They refer to a single address space with
multiple execution contexts.  A LWP or thread consists of the PC,
registers, and flags: everything that needs to be saved to service an
interrupt.  Threads/LWPs share a virtual address context with other
threads, or, in the case of some LWP schemes, they execute in the kernel
and so just use the kernel's context.  Some people use 'LWPs' to refer
to things that run only in the kernel and 'threads' to refer to things
that run in user mode, but the terminology is far from standardized.

Threads/LWPs serve a number of purposes:

- Cheap creation, deletion, and context switching.
- Faster inter-process communication, since the address space is shared:
  fast, efficient communication among a set of cooperating processes
  where the processes aren't necessarily 'parallel' in nature.
- Capturing I/O/computation concurrency that would otherwise require a
  context switch, often when servicing incoming network requests.

Even the term 'thread' has been used in two different ways:

- a thread of control within a *** single *** address space, and
- a ** distributed ** thread of control executing in ** different **
  address spaces.
The common feature between these two usages is the existence of a stack.
In the former case, the stack exists in a single address space and RPCs
are implemented in terms of messages.  In the latter case, the stack may
span address spaces (and machines) and the *thread* executes the RPCs.

REFERENCES:
===========

Rashid, R.F., ``Threads of a New System,'' Unix Review, Volume 4,
Number 8, August 1986, pp. 37-49.

Cheriton, D.R. and W. Zwaenepoel, ``Distributed Process Groups in the V
Kernel,'' ACM Transactions on Computer Systems, Volume 3, Number 2,
May 1985, pp. 77-107.

Abu-Sufah, W., H.E. Husmann, and D.J. Kuck, ``On Input/Output Speedup in
Tightly Coupled Multiprocessors,'' IEEE Transactions on Computers,
Volume C-35, June 1986, pp. 520-30.

Baron, R.V., D. Black, W. Bolosky, J. Chew, D.B. Golub, R. Rashid,
A. Tevanian, Jr., and M.W. Young, ``MACH Kernel Interface Manual,''
Carnegie-Mellon University Department of Computer Science, October 1986.

Black, A.P., ``Supporting Distributed Applications: Experience with
Eden,'' Proceedings of the Tenth ACM Symposium on Operating System
Principles, December 1985.

Brooks, E.D. III, ``A Multitasking Kernel for the C and FORTRAN
Programming Languages,'' Lawrence Livermore Laboratory, University of
California, September 1984.

Clark, D.D., ``The Structuring of Systems Using Upcalls,'' Proceedings
of the Tenth ACM Symposium on Operating System Principles, December
1985, pp. 171-80.

Jones, M.B., R.F. Rashid, and M.R. Thompson, ``Matchmaker: An Interface
Specification Language for Distributed Processing,'' Proceedings of the
12th Annual Symposium on Principles of Programming Languages, January
1985, pp. 225-35.

Kameda, H., ``Effects of Job Loading Policies for Multiprogramming
Systems in Processing a Job Stream,'' ACM Transactions on Computer
Systems, Volume 4, February 1986, pp. 71-106.

Kepecs, J. and M. Solomon, ``A Simplified Operating System for
Distributed Applications,'' 3rd PODC Conference Proceedings, 1984.

Rovner, P., R. Levin, and J. Wick, ``On Extending Modula-2 for Building
Large, Integrated Systems,'' DEC Systems Research Center, January 1985,
pp. 16-20.

Schwan, K., T. Bihari, B.W. Weide, and G. Taulbee, ``High-Performance
Operating System Primitives for Robotics and Real-Time Control
Systems,'' The Ohio State University Department of Computer and
Information Science, July 1986, report number OSU-CISRC-86TR1KS (to
appear in ACM Transactions on Computer Systems, 1987).

Anderson, T.E., E.D. Lazowska, and H.M. Levy, ``The Performance
Implications of Thread Management Alternatives for Shared-Memory
Multiprocessors,'' Department of Computer Science FR-35, University of
Washington, Seattle, WA 98195, Tech Report 88-09-04.

(Note: Some of the above do not directly relate to the topic under
discussion.)

====================== END of SUMMARY =================================

The following is an extract of comments/opinions from the messages
posted on os.research and the mail I got.  Although I edited the
comments/opinions, I have tried to preserve the context as much as
possible.  There may be a few typos.

On the usefulness of Threads.
=============================

>irvin@cwi.nl (Irvin Shizgal) writes:

There are several reasons that light-weight processes are useful
("thread" here refers to a light-weight process).

(1) Light-weight processes share an address space, and thus switching
context between them is simpler and faster (the address map doesn't
change; generally a light-weight context consists of only the program
counter and registers).  If non-blocking system calls are available,
they can even be implemented at user level.

(2) Since light-weight processes share an address space, their effective
communication bandwidth is high.  The heavy-weight process thus
encapsulates a unit which is assigned to a single (possibly
multi-processor) machine.
On a multi-processor, one can really run threads concurrently.

(3) It is sometimes advantageous to design a program by splitting it up
into concurrent tasks.  In many such cases, code (and data structures)
can be simplified by letting the operating system do the multiplexing.
If multi-tasking is to be used as a fundamental programming abstraction,
there must be an efficient implementation.

From: peter%sugar@uunet.UU.NET (Peter da Silva)

One thing that occurs to me is that threads basically refine the
question of what the basic unit of resource allocation is, by allowing
multiple execution streams to share a block of other resources.  Some of
these have been available from the beginning.  Open files, for example,
though the prevalence of the standard I/O library has limited the
usefulness of this feature... you may recall that in the version 6 shell
this was how control structures were implemented.

From: mcvax!cwi.nl!jack@uunet.UU.NET (Jack Jansen)

LWPs are a very nice mechanism to program with, much nicer than
non-blocking I/O with completion interrupts or polling, etc.  Just look
at your average Unix utility that uses SIGURG or select() and imagine
what it would look like when written as a set of cooperating LWPs using
blocking primitives, and you'll get the point.

On the origin of Threads, LWPs.
===============================

>irvin@cwi.nl (Irvin Shizgal) writes:

I don't know how far back the notion of light-weight processes goes.  I
do remember hearing about it back in the mid-70's in distributed/portable
OS research at the University of Waterloo.

From: perry@apollo.COM (Jim Perry)

"Threads" or multitasking is a very old technology, a natural outgrowth
of the fact that I/O is too slow to be done synchronously.  In more
recent operating systems (well, in Unix, which seems to be what everyone
means by OS), I/O scheduling is done among processes: when a process
initiates an I/O operation, it is suspended until that operation
completes.
Other systems provide mechanisms that allow the process to continue past
the I/O call, with some means of announcing when pending operations
complete.  A program using such a facility is multi-threaded in the
simplest sense, and a more elegant model is easily constructed from such
a system.

The DTSS system (the Dartmouth Time-Sharing System) relies heavily on
this model.  The command processor is a single program which holds open
all user terminals, with a task (thread, LWP) to handle each.  A few
years ago the OS, the PL/I compiler, and the runtime system were
modified to provide an excellent multitasking model that's cleaner and
more usable than anything I've seen since (to be fair, if you get to
control the OS, the language, and the runtime system, and are
unconcerned about portability, you have a lot more leeway than, e.g.,
trying to make something that does all this in portable C).

From: marzullo%cs.cornell.edu (Keith Marzullo)

For a historical perspective, look at the paper "On the Duality of
Operating System Structure" by Roger Needham and Hugh Lauer.  It is in
the ACM Operating System Review (the SIG newsletter) in the mid to late
seventies.  This was an extremely important paper for the time, as it
brought to a head the debate on light- vs. heavy-weight processes.
Several now reasonably famous people got their start disagreeing with
this paper.

Needham and Lauer's argument is superficially correct, but not correct
at a deeper level.  The best argument I've heard about it (from Loretta
Reid, now at DEC) is that the client-server model is a pervasive one,
and N&L showed how to recognize it and transform a HWP implementation
into a LWP one and vice versa.

From: jrg@Apple.COM (John R. Galloway)

Data General has had multiple tasks (threads) running around in the same
address space at least since the mid-70's.  I think such support existed
in the early versions of RDOS (just foreground and background processes)
as well, and that would be in the early 70's.
On who schedules the LWP for execution: the OS or the parent HWP?
=================================================================

From: dpm@cs.cmu.edu (David Maynard)

Lightweights are often "scheduled" by code that is part of their parent,
allowing the parent to schedule them intelligently based on known
interaction patterns.  The OS scheduler only deals with real processes.

PROBLEM: The classic example is page fault handling.  Suppose one of the
lightweight processes faults.  As far as the OS is concerned, the entire
process has faulted, so none of the lightweights can proceed until the
page is brought in.

On the problems of using LWPs
=============================

From: dpm@cs.cmu.edu (David Maynard)

By making the LWPs share an address space, you give up a level of fault
containment and take on the responsibility of coordinating access to
shared resources.  This isn't necessarily bad, but you have to keep it
in mind.  See above for page fault handling.

>sjs@ctt.bellcore.com (Stan Switzer) writes:

What happens when one lightweight process "blocks" for I/O?  Well, one
approach is to write I/O surrogate routines which invoke non-blocking
cognates of "read" and "write", mark the LWP as waiting for I/O, and
call the LWP dispatcher.  Note that this is still non-preemptive, at
least with the understanding that all I/O operations potentially release
control.  (This is essentially what the SunOS 4.0 LWP scheme does.)

But suppose that a LWP page faults.  This too is effectively a blocking
I/O (and indeed, in MACH, if I understand correctly, users can provide
their own handling for page faults), but things get complicated here.
If we don't block the whole process, then we effectively have a
preemptive scheduler, because we can lose control at any instruction or
data fetch.  At this point we might as well let the operating system in
on the game and let it schedule our LWPs.
Although the terminology is by no means standard, when the OS knows
about your LWPs, they are often called "threads."  But now your tasks
have to worry about synchronization amongst themselves using monitors,
semaphores, and suchlike, so in some sense our gains have been
ill-gotten: we lose some of the convenience we sought to gain (but we
gain concurrency, because multiple processors can execute the various
threads).

My feeling is that the OS itself should be nothing more than a processor
and memory multiplexor, leaving everything else up to user-level code.
(It needs to provide some form of IPC too.)  MACH, the V system, and
AMOEBA (I understand AMOEBA best; it's quite elegant) are all based on
this basic concept.  In this context, "threads" can also be thought of
as unbundling processor multiplexing from memory multiplexing.

In the final analysis, there is some tradeoff among modularity (and its
close relatives, reusability and pluggability), convenience, efficiency,
safety (especially regarding shared memory usage), concurrency
(especially in a multi-processor environment), and elegance.

On the hardware support for threads, LWPs
=========================================

From: riedl@purdue.edu (John T. Riedl)

Most architectures already provide instructions for efficiently saving
processor state to improve procedure call performance.  Some machines
(like Suns) provide hardware support for address space context
switching, in the form of multiple hardware address space contexts.
This is the reason for the performance 'knee' on Suns when more than a
certain number of contexts are in the run queue (7 on a 3/50; 15 on a
Sun 4, I think).  In one way threads help avoid this knee, since a
group of related asynchronous activities can be organized as a single
address space.

On systems that support Threads/LWPs, and on hardware support:
==============================================================

Both MACH and V support threads.
The XTMOS (the XTM OS of Cogent Research, Salem (I think), Oregon)
supports threads along with the LINDA memory model on their "desktop
supercomputer."  The XTM machine provides micro-coded support for the
creation and scheduling of threads.

From: irvin@cwi.nl (Irvin Shizgal) [AMOEBA project, CWI, Amsterdam]

Other operating systems using threads are Amoeba, V (David Cheriton at
Stanford), and even OS/2 (the latest Microsoft/IBM release).

From: johnl@ima.isc.com (John Levine)

There have been many implementations of threads, and they have been
reinvented in many places.  IBM's OS/360 MVT, the ancestor of MVS, had
them in the late 1960's.  OS/2 has them.

From: schwan%wayward@gatech.edu (Karsten Schwan)

There are now thread packages on the Encore Multimax (Brown built one)
and on the BBN Butterfly running Chrysalis (we built it at Ohio State),
and we are working to add some kernel thread support to the Mach
version of the BBN Butterfly, which initially does not support threads.
In parallel to this work, real-time operating system folks have
recognized the importance of low-overhead executable units for some
time; look at some real-time OS papers in ACM TOCS or in IEEE TC or
TSE.

Cheers, Karsten Schwan, Assoc. Prof., School of ICS, Georgia Tech.

From: marzullo@cs.cornell.edu (Keith Marzullo)

DEC SRC has a system called Topaz, on the Firefly multiprocessor.
Almost every research group using Unix has added a version of threads
to their system.

From: fouts@lemming.nas.nasa.gov (Marty Fouts)

I offer, only half humorously, TSO running under OS/360 as the first
example of "multiple THREADS of control executing within a single
address space."  Consider that TSO ran as a partition and invoked its
own round-robin scheduler to schedule its quantum among its various
terminal users, all of which were (sort of) within the same address
space.  If TSO doesn't qualify, IBM implementations of APL workspaces
might...
From: modcomp!nigel@uunet.UU.NET (Nigel Gamble)

The INMOS transputer is a microprocessor which supports the creation
and scheduling of processes in hardware.  See Byte, November 1988.

From: mason@pescadero.Stanford.EDU (Tony Mason)

I've not seen a machine that gives you explicit LWP support, but quick
register saves, register windows, and other such mechanisms can be used
to make LWPs quite fast.

On the coexistence of HWPs and LWPs
===================================

Although many LWPs existing in the context of one HWP is what one would
expect, it is not uncommon for LWPs and HWPs to coexist without any
relationship between them.

>John Carter (retrac@rice.edu) writes:

In V there is the notion of a "team", which is a HWP (each team has a
different virtual address space), as well as a "process", which is a
LWP.  Teams are made up of one or more processes.

>karl@triceratops.cis.ohio-state.edu (Karl Kleinpaste) writes:

LWPs can exist independent of any other processes.  Mine did; they were
an extra process-context level of interrupt processing.  LWPs tend to be
used for very specialized functions (mine were deeply concerned with tty
I/O), whereas typical processes tend to be intended for a whole bunch of
things, which results in heavier usage of comparatively expensive
resources like memory or peripheral I/O.  My LWPs were spawned at boot
time by the kernel, just after the creation of init, and they then sat
waiting for tty events.  They were not attached to any other process in
any way, and they did not interfere with normal HWPs in any way.

>Peter da Silva (peter%ficc@uunet.uu.net) writes:

AmigaOS is an example of an environment with no HWPs.  Mach is an
example of an environment where LWPs are within HWPs.  I believe Minix
is an example of an environment where the kernel contains LWPs but all
user tasks are HWPs.  Certainly TUNIS is.

>Larry Campbell (campbell@bsw.com) writes:

I believe that OS/2 and Mach both support threads as well as traditional
processes.
You can implement non-preemptive threads yourself pretty easily under
most operating systems if you don't mind writing a bit of assembler.

Renganathan
renga@cs.uoregon.edu