[comp.os.research] THREADS, Light weight processes

renga@beaver.cs.washington.edu (Renganathan Sundararajan) (10/27/88)

  I came across the notion of  multiple THREADS of control executing 
  within a single address space, when I was reading about MACH, the
  network operating system developed at CMU. I have also heard about
  Light Weight Processs (LWP) but in what context I don't remember. 

  Questions:

    What exactly is a light weight process? 

    Where did the notion of a LWP originate and in what context? 

    Is it the same as a thread?

    Is the sole purpose of a LWP to help switch contexts fast? 

    If a LWP does not own/use any registers and so saving a context
     involves only saving the PC and condition flags, what is great
     about it? (if a process never uses any registers just so 
     that context-switches can be done fast, is there any advantage
     at all? Am I missing something here?)

    Do LWPs and HWPs (H = Heavy - whatever that may mean) co-exist in an OS? 

    Is the notion of a thread unique to MACH or do other os like V use it?

    Are there any machines out there that provide some kind of hardware
    support to LWP switching?

    Any clarifications/references would be appreciated.

  Renganathan
  email: renga@cs.uoregon.edu

  PS: Please email me your response. If there is enough interest, 
  I will summarise.

mudd-j@bear.cis.ohio-state.edu (John R. Mudd) (10/28/88)

In article <5279@saturn.ucsc.edu> rutgers!mist.math.uoregon.edu!renga@beaver.cs.washington.edu (Renganathan Sundararajan) writes:

>    What exactly is a light weight process? 
>    Is it the same as a thread?

A lightweight process, sometimes called task (Encore's UMAX) or thread (Mach),
is a smaller thread of execution that doesn't have all the capabilities of a 
UNIX process, and as such it is cheaper to create and has less overhead 
associated to it.  As UNIX processes are scheduled on a machine's processor(s),
so are tasks scheduled on a program's available UNIX processes.  A parallel 
program creates a number of UNIX processes on which the program's tasks can 
be created.

>    Where did the notion of a LWP originate and in what context? 

To do efficient (in terms of the OS) parallel processing, it isn't a wise
idea to create a lot of UNIX processes.  UNIX processes tend to have a lot
of 'stuff' in them that isn't necessary for parallel processing.

>    Is the sole purpose of a LWP to help switch contexts fast? 

Not the sole purpose, just one of the convenient purposes.  If you intend
to create a lot of tasks, you want to be able to schedule them on a process
as fast as possible.

>    If a LWP does not own/use any registers and so saving a context
>     involves only saving the PC and condition flags, what is great
>     about it? 

Speed.  See above.  

>    Do LWPs and HWPs (H = Heavy - whatever that may mean) co-exist in an OS? 

LWPs exist in a HWP, which are then scheduled by the OS.  See above.

>  Is the notion of a thread unique to MACH or do other os like V use it?

Nope.  I don't know about V, but Encore's UMAX uses the concept of 'tasks.'
>From what I've read on Sequent's DYNIX, they seem to simply run straight UNIX
processes.  I wouldn't want to say for sure, and I don't know about any other
systems.  Anybody else care to add to this?

>    Are there any machines out there that provide some kind of hardware
>    support to LWP switching?

Unknown.

>    Any clarifications/references would be appreciated.
>
>  Renganathan

I made a stab at answering your questions; if anybody has anything 
additional on this topic, I'd like to see it too.  (That means POST, not
email)
                                              John
-------------------------------------------------------------------------------
 John R. Mudd                     Department of Computer & Information Science
 The Ohio State University, 2036 Neil Avenue, Columbus, Ohio, USA   43210-1277
 email: mudd-j@cis.ohio-state.edu

riedl@purdue.edu (John T Riedl) (10/28/88)

'Thread' is another (shorter!) name for a lightweight process.
Threads improve efficiency on many architectures by recognizing the
distinction between address space operations and thread-of-control
operations.  The efficiency of three basic operations is improved:
creation, deletion, and context switch.

Address space creation/deletion is expensive, since a significant
amount of state is preserved for each address space.  Secondary
storage access may be required on swapping or paging systems.  Thread
creation and deletion is cheap.  A new entry is added to the thread
table in the address space in which the thread will run.  Basically,
the state information in this entry is a copy of the contents of the
processor registers for the thread.  Removing such an entry is even
simpler.

Address space context switch may be expensive, especially on virtual
memory machines.  Processor caches must be flushed, a process may have
to be swapped from secondary storage, etc.  Thread context switch
within an address space is trivial.  The processor simply saves a copy
of its registers in the thread table, and loads a new copy.  In many
ways thread context switch is more like a procedure call than an
ordinary context switch.  Note that to take advantage of these more
efficient context switches, systems that use threads must have a
two-level scheduling algorithm that minimizes switches between address
spaces.

In article  <5287@saturn.ucsc.edu> mudd-j@bear.cis.ohio-state.edu (John R. Mudd) writes:
>
>In article <5279@saturn.ucsc.edu> rutgers!mist.math.uoregon.edu!renga@beaver.cs.washington.edu (Renganathan Sundararajan) writes:
>
>>    If a LWP does not own/use any registers and so saving a context
>>     involves only saving the PC and condition flags, what is great
>>     about it? 
>
>Speed.  See above.  

I think this is a misunderstanding.  Threads must save the PC,
condition flags, *and general registers*.  The speed comes from
avoiding avoiding the checkpointing of address space data (page
tables, etc.).

>>  Is the notion of a thread unique to MACH or do other os like V use it?

The V System from Stanford uses threads.  I think their nomenclature
is 'task' for address space and 'process' for thread.

>>    Are there any machines out there that provide some kind of hardware
>>    support to LWP switching?

What sort of hardware support is needed?  Most architectures already
provide instructions for efficiently saving processor state to improve
procedure call performance.  Some machines (like Suns) provide
hardware support for address space context switching, in the form of
multiple hardware address space contexts.  This is the reason for the
performance 'knee' on Suns when more than a certain number of contexts
are in the run queue (7 on a 3/50; 15 on a Sun 4, I think).  In one
way threads help avoid this knee, since a group of related
asynchronous activities can be organized as a single address space.

Here are references to Mach and V, both of which have threads:

@Article{mach-overview,
	Author = "Richard F. Rashid",
	Title  = "Threads of a New System",
	Journal= "Unix Review",
	Number = 8,
	Volume = 4,
	Month  = aug,
	Year   = 1986,
	Pages  = "37--49"
}

@Article{V-process-groups,
	Author = "David. R. Cheriton and W. Zwaenepoel",
	Title  = "Distributed Process Groups in the {V} Kernel",
	Journal= TOCS,
	Volume = 3,
	Number = 2,
	Month  = may, 
	Year   = 1985,
	Pages  = "77--107"
}

John
-- 
John Riedl
{ucbvax,decvax,hplabs}!purdue!riedl  -or-  riedl@cs.purdue.edu

karl@triceratops.cis.ohio-state.edu (Karl Kleinpaste) (10/29/88)

mudd-j@bear.cis.ohio-state.edu (John R. Mudd) writes:
   > What exactly is a light weight process?
   > Is it the same as a thread?

   A lightweight process, sometimes called task (Encore's UMAX) or
   a thread (Mach), is a smaller thread of execution that doesn't have
   all the capabilities of a UNIX process, and as such it is cheaper
   to create and has less overhead associated to it.  As UNIX
   processes are scheduled on a machine's processor(s), so are tasks
   scheduled on a program's available UNIX processes.  A parallel
   program creates a number of UNIX processes on which the program's
   tasks can be created.

Don't get caught in the trap of considering LWPs only in the context
of UNIX.  There are a number of implementations of LWP concepts in
other OSes, some of them not at all UNIX-like.

   > Where did the notion of a LWP originate and in what context?

   To do efficient (in terms of the OS) parallel processing, it isn't
   a wise idea to create a lot of UNIX processes.  UNIX processes tend
   to have a lot of 'stuff' in them that isn't necessary for parallel
   processing.

They're also useful for other things, e.g., fast, efficient
communication among a set of cooperating processes where the processes
aren't necessarily `parallel' in nature.

   > Is the sole purpose of a LWP to help switch contexts fast?

There's lots of potential uses for them.  LWPs generally don't carry
the heavy baggage around with them that a normal ("heavy") process
does.  For example, it may not claim a distinct data space of its own,
it may borrow a register set from a common parent (typically a HWP
parent), it may have a reduced concept of a process control block
requiring less intervention and mangement by the kernel on its behalf,
that sort of thing.  That's why they're "light."

Insofar as their use for speedy, low-overhead operations goes, a lot
of certain kernel's internals are not well-tuned for LWP usage.  I had
problems with the foolishness of UNIX' wakeup() wanting to search the
whole proc[] table - that cost me CPU time in a big way.

   > If a LWP does not own/use any registers and so saving a context
   > involves only saving the PC and condition flags, what is great
   > about it?

I would say it's the combination of minimal resource usage, in
relation to how it affects speed, as John suggested.  My grad work was
done in LWPs which were compiled as part of a SysV kernel, hence
claimed no data space of their own, and could be switched into context
instantly without ever having to worry about such delaying problems as
swapping and the like.

   > Do LWPs and HWPs (H = Heavy - whatever that may mean) co-exist in an OS?

   LWPs exist in a HWP, which are then scheduled by the OS.  See above.

Oops, don't overstate the case, John.  LWPs can exist independent of
any other processes.  Mine did - they were an extra process-context
level of interrupt processing.  LWPs tend to be used for very
specialized functions (mine were deeply concerned with tty I/O),
whereas typical processes tend to be intended for a whole bunch of
things, which results in heavier usage of comparatively expensive
resources like memory or peripheral I/O.

My LWPs were spawned at boot time by the kernel just after creation of
init, and they then sat waiting for tty events.  They were not
attached to any other process in any way.  They did not interfere with
normal HWPs in any way.

   > Is the notion of a thread unique to MACH or do other os like V use it?

Not at all unique to MACH.  Take, for example, Kepecs & Solomon's
message passing kernel (see ref below).

   > Are there any machines out there that provide some kind of hardware
   > support to LWP switching?

I couldn't say.  I'm not aware of any.

   > Any clarifications/references would be appreciated.

Some of these are closely relevant, some are not.  This is from the
bibliography to my MS thesis, not quite two years old.

Abu-Sufah, W., H.E. Husmann, and D.J. Kuck, ``On Input/Output Speedup
in Tightly Coupled Multiprocessors,'' IEEE Transactions on Computers,
June 1986, Volume C-35, pp. 520-30.

Baron, R.V., D. Black, W. Bolosky, J. Chew, D.B. Golub, R. Rashid, A.
Tevanian, Jr., and M.W. Young, ``MACH Kernel Interface Manual,''
Carnegie-Mellon University Department of Computer Science, October
1986.

Black, A.P., ``Supporting Distributed Applications: Experience with
Eden,'' Proceedings of the Tenth ACM Symposium on Operating System
Principles, December 1985.

Brooks, E.D. III, ``A Multitasking Kernel for the C and FORTRAN
Programming Languages,'' Lawrence Livermore Laboratory, University of
California, September 1984.

Clark, D.D.  ``The Structuring of Systems Using UpCalls,'' Proceedings
of the Tenth ACM Symposium on Operating System Principles, December
1985, pp. 171-80.

Jones, M.B., R.F. Rashid, and M.R. Thompson, ``Matchmaker: An
Interface Specification Language for Distributed Processing,''
Proceedings of the 12th Annual Symposium on Principles of Programming
Languages, January 1985, pp. 225-35.

Kameda, H., ``Effects of Job Loading Policies for Multiprogramming
Systems in Processing a Job Stream,'' ACM Transactions on Computer
Systems, Volume 4, February 1986, pp. 71-106.

Kepecs, J. and M. Solomon, ``A Simplified Operating System for
Distributed Applications,'' 3rd PODC Conference Proceedings, 1984.

Rovner, P., R. Levin, and J. Wick, ``On Extending Modula-2 For
Building Large, Integrated Systems,'' DEC Systems Research Center,
January 1985 pp. 16-20.

Schwan, K., T. Bihari, B.W. Weide, and G. Taulbee, ``High-Performance
Operating System Primitives for Robotics and Real-Time Control
Systems,'' The Ohio State University Department of Computer and
Information Science, July 1986, report number OSU-CISRC-86TR1KS (to
appear in ACM Transactions on Computer Systems, 1987).

Have a blast,
--Karl

steve@basser.oz (Stephen Russell) (10/31/88)

In article <5279@saturn.ucsc.edu> rutgers!mist.math.uoregon.edu!renga@beaver.cs.washington.edu (Renganathan Sundararajan) writes:
>
>  I came across the notion of  multiple THREADS of control executing 
>  within a single address space [...]
>  Questions:
>
>    What exactly is a light weight process? 

In general, threads == LWPs.

>    Where did the notion of a LWP originate and in what context? 

Earliest ref I have is Cheriton's Thoth, which had multiple processes
executing in same address space (a team). There were probably earlier examples.

>    Is the sole purpose of a LWP to help switch contexts fast? 
>    If a LWP does not own/use any registers and so saving a context
>     involves only saving the PC and condition flags, what is great
>     about it? (if a process never uses any registers just so 
>     that context-switches can be done fast, is there any advantage
>     at all? Am I missing something here?)

Context switches involve more than just registers. Consider cost of switching
address spaces, which may involve changing many segment/page address registers.
Switching between LWP's in the same space saves this cost. On systems which use
virtual caching (eg, 68020), it is usually necessary to flush the I & D caches
after a full context switch. Again, this is unnecessary if switching between
LWP's in the same space.

There are other advantages, though. Firstly, LWP's usually share data, unlike
most UNIX processes. The cost of creating a LWP is much lower, because no
memory allocation need be done - this can be made particularly cheap if the
creator provides the storage for the child processes stack. So, a LWP can
be created as fast as you can grab a free entry from the proc table, and
set up the PC and SP for the new proc.

Also, LWPs are a useful way of structuring the OS kernel. For example,
the Thoth kernel consisted of a group of LWPs, each managing some
resource. For further details, see Cheriton's book "The Thoth System:
Multi-process Structuring and Portability", Elsevier, 1982.

>    Do LWPs and HWPs (H = Heavy - whatever that may mean) co-exist in an OS?

Yep. LWP's execute within the address space of a HWP (a Thoth "team"). A HWP
is considered a single unit from the point of view of swapping/paging, memory
allocation, etc, and may also be the place that info about open files, etc,
is kept (assuming LWPs share files). You create new HWPs whenever you need
a new address space; eg, to execute a new program.

>    Is the notion of a thread unique to MACH or do other os like V use it?

Nope. Thoth and its derivatives (Port, Hermes, V) have them. In fact, most
modern OS's do (ie, those that aren't just UNIX clones :-).

>    Are there any machines out there that provide some kind of hardware
>    support to LWP switching?

Not really needed, as LWP switching involves _just_ switching the registers.
We can even make this cheaper by refusing to preserve registers across system
calls, so that all we need to save is the PC/SP (and possibly a FP). Of course,
if a process is suspended as a result of an interrupt, we must save all its
registers, and restore them later during the interrupt return sequence. However,
voluntary suspensions (eg I/O requests) cost very little.

LWPs are a nice way of structuring systems, not just for parallel programming,
but OS's in general. See Cheriton's stuff - it is very good.

nigel@uunet.UU.NET (Nigel Gamble) (10/31/88)

in article <5279@saturn.ucsc.edu>, rutgers!mist.math.uoregon.edu!renga@beaver.cs.washington.edu (Renganathan Sundararajan) says:
>     Are there any machines out there that provide some kind of hardware
>     support to LWP switching?

The INMOS transputer is a microprocessor which supports creation and
scheduling of processes in hardware.  See Byte, November 1988.

-- 
Nigel Gamble            "Everything should be made as simple as possible,
MODCOMP/AEG              but not simpler."  Albert Einstein.
uunet!modcomp!nigel

pardo@june.cs.washington.edu (11/01/88)

In the meanwhile, here's yet another reference:

	The Performance Implications of Thread Management
	Alternatives for Shared-Memory Multiprocessors
	by Thomas E. Anderson, Edward D. Lazowska, and
	Henry M. Levy.; Department of Computer Science
	FR-35, University of Washington, Seattle, WA 98195
	Tech Reprot 88-09-04

The title should tell all.  The paper contains both analytic models
and experimental results.

jack@uunet.UU.NET (Jack Jansen) (11/03/88)

So far nobody seems to have mentioned that LWPs are a very nice
mechanism to program with, much nicer than non-blocking I/O with
completion interrupts or polling, etc.

Just look at your average unix utility that uses SIGURG or select()
and imagine what it would look like when written as a set of
cooperating LWPs using blocking primitives and you'll get the
point.....

--
Fight war, not wars			| Jack Jansen, jack@cwi.nl
Destroy power, not people! -- Crass	| (or mcvax!jack)

libes@cme.nbs.gov (Don Libes) (11/07/88)

In article <5354@saturn.ucsc.edu> mcvax!cwi.nl!jack@uunet.UU.NET (Jack Jansen) writes:
>So far nobody seems to have mentioned that LWPs are a very nice
>mechanism to program with, much nicer than non-blocking I/O with
>completion interrupts or polling, etc.

I agree completely.  I recently wrote an article discussing exactly
this.  It showed the difficulty of writing a conceptually simple
program, and how it resolved cleanly into 3 threads, just a couple of
lines a piece.  It also discussed threads with respect to OS/2 and UNIX.

	"Unraveling Threads", Micro/Systems Journal, p. 12, Nov. 1988.

Another reference you might consider (also by me) is:

	"Multiple Programs in One UNIX Process", ;login, July/August 1987.

This paper described an easy way to do LWPs in UNIX.  The idea was
that (the conceptually elegant) XINU was a LWP scheduler and that
porting it to UNIX was easy.  It required no modifications to the
kernel.  Only 6 lines of assembler to save and restore registers.

Don Libes          libes@cme.nbs.gov      ...!uunet!cme-durer!libes

edler@cmcl2.NYU.EDU (Jan Edler) (11/15/88)

This topic is a few weeks old, but a mail/news problem here delayed my
followup.  I agree in general with most that has been said on this topic,
mostly that there are many different things going around under the same
or similar names, such as "threads" and "lightweight processes".  Which one
you want depends on your needs, goals, and environment.  There are a couple
of other comments I would like to make as well.

First, I want to respond to one opinion expressed by mcvax!jack (Jack Jansen)
in article <5354@saturn.ucsc.edu>:
> So far nobody seems to have mentioned that LWPs are a very nice
> mechanism to program with, much nicer than non-blocking I/O with
> completion interrupts or polling, etc.

> Just look at your average unix utility that uses SIGURG or select()
> and imagine what it would look like when written as a set of
> cooperating LWPs using blocking primitives and you'll get the
> point.....

This was also agreed to by libes@cme.nbs.gov (Don Libes) in article
<5394@saturn.ucsc.edu>:
> I agree completely.  I recently wrote an article discussing exactly
> this.  It showed the difficulty of writing a conceptually simple
> program, and how it resolved cleanly into 3 threads, just a couple of
> lines a piece.  It also discussed threads with respect to OS/2 and UNIX.

Anyway, I have to say I disagree.  There are two issues here, syntactic
and conceptual.

Things like constructing control blocks, checking for error conditions,
etc., can be a real hassle in asynchronous I/O systems, but LWP systems
can have just as much crap to deal with (e.g. stack space allocation,
etc.).  It just depends on how low-level the programmers' interface is,
and how fancy the facility is.  There is no reason not to provide an
easy-to-use asynchronous I/O interface.

Conceptually, both asynchronous I/O systems and LWP systems are sources
of complexity.  Both require you to deal with language/compiler issues
having to do with asynchronous access to shared variables (at least
something like volatile in ansi c, and possibly much more.  Or a dumb
compiler).  Race conditions can arise in either kind of system, and can
be equally hard to debug.  But a LWP mechanism is fundamentally more
powerful; basically I'm talking about the difference between interrupts
on the one hand, and MIMD shared memory multiprocessing on the other.
Clearly interrupts are not as powerful as multiprocessing, but their
inherent structure and relative simplicity can make them somewhat more
appropriate when that full power isn't needed.  Of course, there may be
other reasons why you want multiprocessing anyway; that's fine too.

The other point I would like to address is whether kernel support is
needed for LWPs.  There are a couple of reasons usually given for why
kernel support is needed: you don't want a HWP suspended when a LWP
(running "on" that HWP) does I/O, or incurrs a page fault.  Both these
problems can be addressed with interrupts (signals), as an alternative
to kernel-supported LWPs.  The use of asynchronous I/O in the
implementation of the user-level LWP system is fairly straightforward:
since the I/O is asynchronous, the HWP doesn't block, but can switch to
a new LWP.  Page faults can be handled similar to the way real
operating systems handle them: generate an interrupt to a piece of
memory-resident code that deals with the situation.  In this case and
all that is necessary is to switch to a new LWP.  When the page fault
has been satisfied, another interrupt (analogous to the I/O completion
interrupt generated by the hardware when the page-in is done) can cause
the blocked LWP to be made ready again.  Of course, there may be other
reasons for choosing kernel-supported LWPs, but I don't think I/O and
page fault handling are good arguments in favor of doing it that way.

Jan Edler
NYU Ultracomputer Project
edler@nyu.edu
cmcl2!edler

raveling@vaxb.isi.edu (Paul Raveling) (11/19/88)

In article <5467@saturn.ucsc.edu> edler@cmcl2.NYU.EDU (Jan Edler) writes:
>
>First, I want to respond to one opinion expressed by mcvax!jack (Jack Jansen)
>in article <5354@saturn.ucsc.edu>:
>> So far nobody seems to have mentioned that LWPs are a very nice
>> mechanism to program with, much nicer than non-blocking I/O with
>> completion interrupts or polling, etc.
>
>This was also agreed to by libes@cme.nbs.gov (Don Libes) in article
><5394@saturn.ucsc.edu>:
>> I agree completely.  I recently wrote an article discussing exactly
>> this.  It showed the difficulty of writing a conceptually simple
>> program, and how it resolved cleanly into 3 threads, just a couple of
>> lines a piece.  It also discussed threads with respect to OS/2 and UNIX.
>
>Anyway, I have to say I disagree.  There are two issues here, syntactic
>and conceptual.

	On conceptual grounds I tend to agree rather than disagree.
	A virtue of processes, no matter what their weight, is
	encapsulation of logic.

	This should sound a bit like object-oriented programming.
	It turns out all that "process oriented programming" we
	did in the '70's had a lot in common with OO techniques.

>Things like constructing control blocks, checking for error conditions,
>etc., can be a real hassle in asynchronous I/O systems, ...

	I/O systems are easiest with a process-oriented approach.
	A process per device in the kernel often allows using the
	process state instead of state data saved in control
	blocks.  This also tends to radically decrease the amount of
	"critical region" code that needs exclusive access to
	shared control blocks, device interface registers, etc.

	Good interprocess communication simplifies the interface
	between client processes and i/o [device driver] processes.
	In a system such as EPOS the kernel's mechanism for queueing
	messages as event data eliminated the need for separate
	i/o scheduling queues.
	
	As the application sees it, asynchronous and synchronous i/o
	are nearly identical.  For asynchronous i/o, it supplies an
	event identier code for the kernel to use in the message
	reporting i/o completion; when the application eventually
	waits for or checks for that event, it gets the same event
	data as for a synchronous request (info such as completion
	status and number of bytes transferred).


>...  There is no reason not to provide an
>easy-to-use asynchronous I/O interface.

	You bet!

>Conceptually, both asynchronous I/O systems and LWP systems are sources
>of complexity.  Both require you to deal with language/compiler issues
>having to do with asynchronous access to shared variables ...

	I think access to shared variables is the only thing that
	may need to involve the compiler.  If the hardware guarantees
	as little as uninterrupted increment-and-test and decrement-
	and-test operations, even that can become almost trivial.

	Questions of blocking get to be potentially painful with
	LWP's.  A fairly clean answer is to use "HWP's", but be sure
	it's easy for them to communicate via messages and to share
	portions of an address space.

	Of course VERY fine-grained LWP's or short threads would
	need more language help, but practically all of the situations
	I've encountered that beg for multiple processes or threads
	have been more appropriate for "heavies".  The main virtue
	of LWP's on UNIX systems is to allow fast interprocess
	communication, fast context switching, and sharing an address
	space.  It's not that tough to provide the same facilities
	to "HWP"'s, but it takes a different kernel.


---------------------
Paul Raveling
Raveling@vaxb.isi.edu