renga@beaver.cs.washington.edu (Renganathan Sundararajan) (11/07/88)
***************************************
Long
***************************************
Threads, Lightweight Processes.
==============================
(A Summary)
This is a summary of the e-mail messages that I received on this subject.
In the following summary, I do not attempt to attribute the contributions
individually. My thanks to:
peter%sugar@uunet.UU.NET (Peter da Silva)
marzullo%cs.cornell.edu (Keith Marzullo)
sjs%ctt.bellcore.com (Stan Switzer)
mudd-j%hickory.cis.ohio-state.edu(John R. Mudd)
ariel%lznh@att.att.com (ariel aloni)
tim%crackle@amd.com (Tim Olson)
gray%Pescadero.stanford.edu (Cary Gray)
retrac%rice.edu (John Carter)
peter%ficc@uunet.uu.net (Peter da Silva)
campbell%redsox.bsw.com (Larry Campbell)
mason%pescadero.Stanford.EDU (Tony Mason)
irvin%cwi.nl%mcvax%mcvax%uunet.UU.NET (Irvin Shizgal)
robbin@MoliEnergy.BC.CA (Robbin.W.Johnson)
jesup%cbmvax.cbm.commodore.com (Randell Jesup)
johnl@ima.isc.com (John Levine)
fouts@lemming.nas.nasa.gov (Marty fouts)
schwan%wayward@gatech.edu (Karsten Schwan)
=======================================================================
The words 'Process' and 'Task' have been used to mean different things
in different operating systems. Adding to the confusion is the use of
these terms in languages such as Ada and Modula-2, which allow concurrent
execution of 'processes', 'tasks', etc., through explicit or implicit
creation.
Historically, a process was thought of as the activation of a sequential
program. Before virtual memory and caches came into being, a process
just had an address space (which included code, heap and stack space)
and a set of registers. Context switching therefore involved saving
and restoring registers (including base address registers), condition
flags and any OS table state. With the introduction of virtual memory
and caches, it also became necessary to save the MMU state and to flush
the cache, mark its entries invalid or otherwise deal with them.
Because switching between such processes involves so much work, they
are called 'Heavy Weight Processes' (HWPs).
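As a rough illustration, the per-process state a kernel must save and
restore on a HWP switch might be pictured as something like the C
structure below. This is purely a sketch; the field names are
illustrative and real machines differ widely.

    /* A hypothetical, much-simplified picture of the state saved and
     * restored on a heavy weight process switch.  Field names are
     * illustrative only.                                                 */
    struct hwp_context {
        unsigned long gp_regs[16];  /* general-purpose registers          */
        unsigned long pc;           /* program counter                    */
        unsigned long base_reg;     /* base/relocation register (pre-VM)  */
        unsigned long cond_flags;   /* condition codes / status flags     */
        unsigned long mmu_root;     /* page-table base (with virtual mem) */
        int           cache_dirty;  /* cache/TLB must be flushed or its
                                       entries marked invalid on a switch */
        /* ... plus per-process OS table state (open files, signals, ...) */
    };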
If a programming language allows the creation and concurrent execution
of 'processes', 'tasks' or whatever, is it necessary to have an OS process
represent each of these? Obviously not. These typically share the code
space and the data space, and may have private heap and stack spaces.
In other words, we have multiple threads of control executing within a
single address space. They resemble procedure activations (each activation
has its own stack frame and, if necessary, its own heap space) and are
quite unlike the HWPs described in the previous paragraph. We now have
'LWPs' (Light Weight Processes) or 'Threads'.
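As a concrete sketch of the idea (anachronistic, since it uses the much
later POSIX thread interface purely for illustration), the program below
creates several threads of control in one address space: all of them see
the same global data, while each has its own private stack.

    /* Several threads of control in a single address space: all of them
     * see the same global data, but each has its own private stack.
     * Compile with:  cc -pthread example.c                               */
    #include <pthread.h>
    #include <stdio.h>

    static const char *shared_msg = "visible to every thread";   /* shared data space */

    static void *worker(void *arg)
    {
        int id = *(int *)arg;       /* private: lives on this thread's own stack */
        printf("thread %d sees: %s\n", id, shared_msg);
        return NULL;
    }

    int main(void)
    {
        pthread_t t[4];
        int id[4];
        for (int i = 0; i < 4; i++) {
            id[i] = i + 1;
            pthread_create(&t[i], NULL, worker, &id[i]);
        }
        for (int i = 0; i < 4; i++)
            pthread_join(t[i], NULL);
        return 0;
    }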
Judging from the responses I got, threads and light-weight processes
denote the same thing. They refer to a single address space with
multiple execution contexts. An LWP or a thread consists of the PC, registers
and flags: everything that needs to be saved to service an interrupt.
Threads/LWPs share a virtual address context with other
threads, or in the case of some LWP schemes, they execute in the
kernel so they just use the kernel's context. Some people use LWPs to
refer to things that just run in the kernel and threads to refer to things
that run in user mode, but the terminology is far from standardized.
Threads/LWPs serve a number of purposes:
- Cheap creation, deletion, and context switching.
- Fast, efficient communication among a set of cooperating processes,
since the address space is shared; the processes need not be
`parallel' in nature.
- Capturing the I/O/computation concurrency that would otherwise require
a context switch, often when servicing incoming network requests.
Even the term thread has been used in two different ways:
- a thread of control within a *single* address space, and
- a *distributed* thread of control executing in *different*
address spaces.
The common feature between these two usages is the existence
of a stack. In the former case, the stack exists in a single
address space and RPCs are implemented in terms of messages.
In the latter case, the stack may span address spaces (and machines)
and the *thread* itself executes the RPCs.
REFERENCES:
==========
Rashid, R.F., ``Threads of a New System,'' Unix Review, August 1986,
Volume 4, Number 8, pp. 37-49.
Cheriton, D.R. and W. Zwaenepoel, ``Distributed Process Groups in the V
Kernel,'' ACM Transactions on Computer Systems, May 1985, Volume 3,
Number 2, pp. 77-107.
Abu-Sufah, W., H.E. Husmann, and D.J. Kuck, ``On Input/Output Speedup
in Tightly Coupled Multiprocessors,'' IEEE Transactions on Computers,
June 1986, Volume C-35, pp. 520-30.
Baron, R.V., D. Black, W. Bolosky, J. Chew, D.B. Golub, R. Rashid, A.
Tevanian, Jr., and M.W. Young, ``MACH Kernel Interface Manual,''
Carnegie-Mellon University Department of Computer Science, October
1986.
Black, A.P., ``Supporting Distributed Applications: Experience with
Eden,'' Proceedings of the Tenth ACM Symposium on Operating System
Principles, December 1985.
Brooks, E.D. III, ``A Multitasking Kernel for the C and FORTRAN
Programming Languages,'' Lawrence Livermore Laboratory, University of
California, September 1984.
Clark, D.D. ``The Structuring of Systems Using UpCalls,'' Proceedings
of the Tenth ACM Symposium on Operating System Principles, December
1985, pp. 171-80.
Jones, M.B., R.F. Rashid, and M.R. Thompson, ``Matchmaker: An
Interface Specification Language for Distributed Processing,''
Proceedings of the 12th Annual Symposium on Principles of Programming
Languages, January 1985, pp. 225-35.
Kameda, H., ``Effects of Job Loading Policies for Multiprogramming
Systems in Processing a Job Stream,'' ACM Transactions on Computer
Systems, Volume 4, February 1986, pp. 71-106.
Kepecs, J. and M. Solomon, ``A Simplified Operating System for
Distributed Applications,'' 3rd PODC Conference Proceedings, 1984.
Rovner, P., R. Levin, and J. Wick, ``On Extending Modula-2 For
Building Large, Integrated Systems,'' DEC Systems Research Center,
January 1985, pp. 16-20.
Schwan, K., T. Bihari, B.W. Weide, and G. Taulbee, ``High-Performance
Operating System Primitives for Robotics and Real-Time Control
Systems,'' The Ohio State University Department of Computer and
Information Science, July 1986, report number OSU-CISRC-86TR1KS (to
appear in ACM Transactions on Computer Systems, 1987).
Anderson, T.E., E.D. Lazowska, and H.M. Levy, ``The Performance
Implications of Thread Management Alternatives for Shared-Memory
Multiprocessors,'' Department of Computer Science FR-35, University of
Washington, Seattle, WA 98195, Technical Report 88-09-04.
(Note: Some of the above do not directly relate to the topic under
discussion)
=============== END of SUMMARY ====================================
The following is a selection of comments/opinions extracted from the
messages posted on os.research and from the mail I received. Although I
edited the comments/opinions, I have tried to preserve the context as
much as possible. There may be a few typos.
On the usefulness of Threads.
=============================
>irvin@cwi.nl (Irvin Shizgal) writes:
There are several reasons that light-weight processes are useful
("thread" refers to a light-weight process).
(1) Light-weight processes share an address space, and thus switching
context between them is simpler and faster (the address map doesn't
change). Generally a light-weight context consists of only the program
counter and registers. If non-blocking system calls are available,
light-weight processes can even be implemented at user level (a sketch
of such user-level switching follows this list).
(2) Since light-weight processes share an address space, their
effective communications bandwidth is high. The heavy-weight
process thus encapsulates a unit which is assigned to a single
(possibly multi-processor) machine. On a multi-processor one can
really run threads concurrently.
(3) It is sometimes advantageous to design a program by splitting
it up into concurrent tasks. In many such cases code (and data
structures) can be simplified by letting the operating system do
the multiplexing. If multi-tasking is to be used as a fundamental
programming abstraction, there must be an efficient implementation.
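To make point (1) concrete, here is a minimal sketch of switching between
two user-level contexts with no kernel involvement at all. It uses the
(much later) POSIX getcontext/makecontext/swapcontext routines purely to
keep the sketch short; a real LWP package would add a dispatcher, I/O
surrogates and so on.

    /* Two user-level execution contexts switched entirely in user mode;
     * no kernel thread support is assumed.                               */
    #include <stdio.h>
    #include <ucontext.h>

    static ucontext_t main_ctx, lwp_ctx;
    static char lwp_stack[64 * 1024];       /* the lightweight context's private stack */

    static void lwp_body(void)
    {
        printf("lwp: step 1\n");
        swapcontext(&lwp_ctx, &main_ctx);   /* "yield": save PC/registers, resume main */
        printf("lwp: step 2\n");
        /* returning resumes uc_link, i.e. the main context */
    }

    int main(void)
    {
        getcontext(&lwp_ctx);
        lwp_ctx.uc_stack.ss_sp   = lwp_stack;
        lwp_ctx.uc_stack.ss_size = sizeof lwp_stack;
        lwp_ctx.uc_link          = &main_ctx;
        makecontext(&lwp_ctx, lwp_body, 0);

        swapcontext(&main_ctx, &lwp_ctx);   /* run the LWP until it yields   */
        printf("main: lwp yielded\n");
        swapcontext(&main_ctx, &lwp_ctx);   /* resume it until it finishes   */
        printf("main: lwp finished\n");
        return 0;
    }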
>From: peter%sugar@uunet.UU.NET (Peter da Silva)
One thing that occurs to me is that threads basically refine the question
of what is the basic unit of resource allocation by allowing multiple
execution streams to share a block of other resources. Some of these have
been available from the beginning. Open files, for example, though the
prevalence of the standard I/O library has limited the usefulness of this
feature... you may recall that in the version 6 shell this was how control
structures were implemented.
>From: mcvax!cwi.nl!jack@uunet.UU.NET (Jack Jansen)
LWPs are a very nice mechanism to program with, much nicer than
non-blocking I/O with completion interrupts or polling, etc.
Just look at your average unix utility that uses SIGURG or select()
and imagine what it would look like when written as a set of
cooperating LWPs using blocking primitives and you'll get the
point.....
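For illustration, the sketch below takes that per-descriptor approach:
each input source gets its own thread that simply blocks in read(),
instead of one loop multiplexing everything through select(). POSIX
threads are used only as a modern stand-in, and the two file names are
placeholders.

    /* One blocking reader per input source instead of one select() loop.
     * Compile with:  cc -pthread readers.c                               */
    #include <pthread.h>
    #include <stdio.h>
    #include <unistd.h>
    #include <fcntl.h>

    static void *reader(void *arg)
    {
        const char *path = arg;
        char buf[256];
        ssize_t n;
        int fd = open(path, O_RDONLY);
        if (fd < 0)
            return NULL;
        while ((n = read(fd, buf, sizeof buf)) > 0)    /* just block; no select() */
            printf("%s: got %zd bytes\n", path, n);
        close(fd);
        return NULL;
    }

    int main(void)
    {
        const char *srcs[] = { "/tmp/source-a", "/tmp/source-b" };   /* placeholders */
        pthread_t t[2];
        for (int i = 0; i < 2; i++)
            pthread_create(&t[i], NULL, reader, (void *)srcs[i]);
        for (int i = 0; i < 2; i++)
            pthread_join(t[i], NULL);
        return 0;
    }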
On the origin of Threads, LWPs.
===============================
>irvin@cwi.nl (Irvin Shizgal) writes:
I don't know how far back the notion of light-weight processes
goes. I do remember hearing about it back in the mid-70's in
distributed/portable OS research at the University of Waterloo.
>From: perry@apollo.COM (Jim Perry)
"Threads" or multitasking is a very old technology, a natural
outgrowth of the fact that I/O is too slow to be done synchronously.
In more recent operating systems (well, in Unix, which seems to be
what everyone means by OS) I/O scheduling is done among processes,
i.e. when a process initiates an I/O operation it is suspended until
that operation completes. Other systems allow various mechanisms to
allow the process to continue past the I/O call, with some means of
announcing when pending operations complete. A program using such a
facility is multi-threaded in the simplest sense, and a more elegant
model is easily constructed from such a system.
The DTSS system (Dartmouth TimeSharing System) relies heavily on this
model. The command processor is a single program, which holds open all
user terminals with a task (thread, LWP) to handle each.
A few years ago the OS, PL/I compiler and runtime
system were modified to provide an excellent multitasking model
that's cleaner and more usable than anything I've seen since (to
be fair, if you get to control the OS, the language, and the runtime
system, and are unconcerned about portability, you have a lot more
leeway than e.g. trying to make something that does all this in
portable C).
From: marzullo%cs.cornell.edu (Keith Marzullo)
For a historical perspective, look at the paper "On the Duality of
Operating System Structures" by Roger Needham and Hugh Lauer. It is
in the ACM Operating Systems Review (the SIG newsletter) in the mid
to late seventies. This was an extremely important paper for the time,
as it brought to a head the debate on light vs. heavy weight processes.
Several now reasonably famous people got their start disagreeing with
this paper. Needham and Lauer's argument is superficially correct, but
not correct at a deeper level. The best argument I've heard about it
(Loretta Reid, now at DEC) is that the client-server model is a pervasive
one, and N&L showed how to recognize it and transform a HWP implementation
to a LWP and vice versa.
>From: jrg@Apple.COM (John R. Galloway)
Data General has had multiple tasks (threads) running around in the same
address space at least since the mid 70's. I think such existed in the
early versions of RDOS (just foreground and background processes) as well,
and that would be in the early 70's.
On Who schedules the LWP for execution: the OS or the parent HWP?
=================================================================
>From: dpm@cs.cmu.edu (David Maynard)
Lightweights are often "scheduled" by code that is part of their
parent--allowing the parent to schedule them intelligently based on known
interaction patterns. The OS scheduler only deals with real processes.
PROBLEM: The classic example is page fault handling. Suppose one of the
lightweight processes faults. As far as the OS is concerned, the entire
process has faulted so that none of the lightweights can proceed until the
page is brought in.
On the problems of using LWPs
=============================
>From: dpm@cs.cmu.edu (David Maynard)
By making the LWPs share an address space, you give up a level of fault
containment and take on the responsibility of coordinating access to shared
resources. This isn't necessarily bad, but you have to keep it in mind.
See above for page fault handling.
>sjs@ctt.bellcore.com (Stan Switzer)
What happens when one lightweight process "blocks" for I/O? Well, one
approach is to write I/O surrogate routines which invoke non-blocking
cognates of "read" and "write", mark the LWP as waiting for I/O,
and call the LWP dispatcher. Note that this is still non-preemptive,
at least with the understanding that all I/O operations potentially
release control. (This is essentially what the SunOS 4.0 LWP scheme does.)
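A sketch of such a surrogate routine is given below. The lwp_* primitives
are hypothetical stand-ins for whatever dispatcher interface such a
non-preemptive user-level LWP package would export (they are assumptions,
not real library calls); only the "read" wrapper is shown.

    /* A surrogate "read" for a non-preemptive user-level LWP package.    */
    #include <unistd.h>
    #include <errno.h>

    extern void lwp_mark_waiting_for_fd(int fd);   /* hypothetical: mark current LWP blocked */
    extern void lwp_dispatch(void);                /* hypothetical: run some other ready LWP */

    /* Blocks only the calling LWP, never the whole heavy weight process.
     * The package is assumed to have put fd into non-blocking mode.      */
    ssize_t lwp_read(int fd, void *buf, size_t len)
    {
        for (;;) {
            ssize_t n = read(fd, buf, len);        /* non-blocking cognate of read()   */
            if (n >= 0 || (errno != EAGAIN && errno != EWOULDBLOCK))
                return n;                          /* data, end of file, or real error */
            lwp_mark_waiting_for_fd(fd);           /* remember what we are waiting on  */
            lwp_dispatch();                        /* yield; we resume here when the
                                                      dispatcher reschedules this LWP  */
        }
    }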
But suppose that a LWP page faults. This too is effectively a
blocking I/O (and indeed, in MACH, if I understand correctly, users
can provide their own handling for page faults), but things get
complicated here. If we don't block the whole process, then we
effectively have a preemptive scheduler because we can lose control at
any instruction or data fetch. At this point we might as well let the
operating system in on the game and let it schedule our LWPs.
Although the terminology is by no means standard, when the OS knows
about your LWPs, they are often called "threads." But now your tasks
have to worry about synchronization amongst themselves using monitors,
semaphores, and suchlike, so in some sense, our gains have been
ill-gotten: we lose some of the convenience we sought to gain (but we
gain concurrency because multiple processors can execute the various
threads).
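For example, the sketch below (again using modern POSIX threads purely as
an illustration) shows the kind of locking that becomes necessary once two
kernel-scheduled threads update shared data; without the mutex the final
total would be unpredictable on a multiprocessor.

    /* Two kernel-scheduled threads updating shared data under a mutex.
     * Compile with:  cc -pthread counter.c                               */
    #include <pthread.h>
    #include <stdio.h>

    static long total = 0;
    static pthread_mutex_t total_lock = PTHREAD_MUTEX_INITIALIZER;

    static void *adder(void *arg)
    {
        (void)arg;
        for (int i = 0; i < 100000; i++) {
            pthread_mutex_lock(&total_lock);    /* enter critical section */
            total++;
            pthread_mutex_unlock(&total_lock);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t a, b;
        pthread_create(&a, NULL, adder, NULL);
        pthread_create(&b, NULL, adder, NULL);
        pthread_join(a, NULL);
        pthread_join(b, NULL);
        printf("total = %ld\n", total);         /* always 200000 with the lock held */
        return 0;
    }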
My feeling is that the OS itself should be nothing more than a
processor and memory multiplexor, leaving everything else up to
user-level code. (It needs to provide some form of IPC too). MACH,
the V system, and AMOEBA (I understand AMOEBA best -- it's quite
elegant) are all based on this basic concept. In this context,
"threads" can also be thought of as unbundling processor multiplexing
from memory multiplexing.
In the final analysis, there is some tradeoff among modularity (and
its close relatives reusability and pluggability), convenience,
efficiency, safety (esp. regarding shared memory usage), concurrency
(esp. in a multi-processor environment), and elegance.
On the Hardware support for threads, LWPs
=========================================
>From: riedl@purdue.edu (John T Riedl)
Most architectures already provide instructions for efficiently saving
processor state to improve procedure call performance. Some machines
(like Suns) provide hardware support for address space context switching,
in the form of multiple hardware address space contexts. This is the
reason for the performance 'knee' on Suns when more than a certain number
of contexts are in the run queue (7 on a 3/50; 15 on a Sun 4, I think).
In one way threads help avoid this knee, since a group of related
asynchronous activities can be organized as a single address space.
On systems that support Threads/LWPs, and on hardware support:
==============================================================
Both MACH and V support threads. XTMOS (the XTM OS of Cogent Research,
Salem (I think), Oregon) supports threads along with the LINDA memory model
on their "desktop supercomputer". The XTM machine provides microcoded
support for the creation and scheduling of threads.
From: irvin@cwi.nl (Irvin Shizgal) [AMOEBA project, CWI, Amsterdam]
Other operating systems using threads are Amoeba, "V" (David Cheriton at
Stanford), and even OS/2 (the latest Microsoft/IBM release).
From: johnl@ima.isc.com:
There have been many implementations of threads and they have been
reinvented many places. IBM's OS/360 MVT, the ancestor of MVS, had
them in the late 1960's. OS/2 has them.
From: schwan%wayward@gatech.edu (Karsten Schwan)
There are now thread packages on the Encore Multimax (Brown built one)
and the BBN Butterfly running Chrysalis (we built it at Ohio State), and
we are working to add some kernel thread support to the Mach
version of the BBN Butterfly, which initially does not support
threads. In parallel to this work, real-time operating system folks have
realized the importance of low-overhead executable units for
some time...look at some real-time OS papers in ACM TOCS or
in IEEE TC or TSE. Cheers, Karsten Schwan, Assc. Prof.,
School of ICS, Georgia Tech.
From: marzullo@cs.cornell.edu (Keith Marzullo)
DEC SRC has a system called Topaz, on the Firefly multiprocessor.
Almost every research group using Unix has added a version of threads
to their system.
From: fouts@lemming.nas.nasa.gov (Marty Fouts)
I offer, only half humorously, TSO running under OS/360, as the first
example of "multiple THREADS of control executing within a single
address space"
Consider that TSO ran as a partition and invoked its own round robin
scheduler to schedule its quantum among its various terminal users,
all of which were (sort of) within the same address space. If TSO
doesn't qualify, IBM implementations of APL workspaces might. . .
From: modcomp!nigel@uunet.UU.NET (Nigel Gamble)
The INMOS transputer is a microprocessor which supports creation and
scheduling of processes in hardware. See Byte, November 1988.
From: mason@pescadero.Stanford.EDU
I've not seen a machine that gives you explicit LWP support, but
quick register saves, register windows, and other such mechanisms
can be used to make LWPs quite fast.
On the coexistence of HWPs and LWPs
===================================
Although many LWPs existing within the context of one HWP is the expected
arrangement, it is not uncommon for LWPs and HWPs to coexist without any
relationship between them.
>John Carter (retrac@rice.edu) writes:
In V there is the notion of a "team", which is a HWP (each
team has a different virtual address space) as well as a "process",
which is an LWP. Teams are made up of one or more processes.
>karl@triceratops.cis.ohio-state.edu (Karl Kleinpaste) writes:
LWPs can exist independently of any other processes. Mine did - they
were an extra process-context level of interrupt processing.
LWPs tend to be used for very
specialized functions (mine were deeply concerned with tty I/O),
whereas typical processes tend to be intended for a whole bunch of
things, which results in heavier usage of comparatively expensive
resources like memory or peripheral I/O.
My LWPs were spawned at boot time by the kernel just after creation of
init, and they then sat waiting for tty events. They were not
attached to any other process in any way. They did not interfere with
normal HWPs in any way.
>Peter da Silva (peter%ficc@uunet.uu.net) writes:
AmigaOS is an example of an environment with no HWPs. Mach is
an example of an environment where LWPs are within HWPs. I believe Minix
is an example of an environment where the kernel contains LWPs but all
user tasks are HWPs. Certainly TUNIS is.
>Larry Campbell (campbell@bsw.com) writes:
I believe that OS/2 and Mach both support threads as well as traditional
processes. You can implement non-preemptive threads yourself pretty
easily under most operating systems if you don't mind writing a bit
of assembler.
Renganathan
renga@cs.uoregon.edu