[comp.os.mach] Threads, Definition of

dmason@msg.uwaterloo.ca (Dave Mason) (02/07/91)

In article <4964@umbc3.UMBC.EDU> cs4213@umbc5.umbc.edu (cs4213) writes:

>   Can anyone offer a succinct definition of threads aka lightweight
> processes???  Citations will be met with gratitude; terse, lucid explanations
> with fawning adoration.

As I'm supposed to be working on a paper about this at this very
moment, rather than reading news (-:

Lightweight processes are processes that share an address space.

The implications, ramifications and implementations of this vary
wildly.  Implementations of lwp's on Unix systems vary from Sun lwp's
and the uSystem where several lwp's share a Unix process and the
operating system knows nothing of their existence, through Mach
threads which are almost as lightweight as they can get with the
operating system knowing about them, up to systems where lwp's are
effectively full Unix processes which have mapped a common area of
memory to work with.
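The definition above can be illustrated in C: after fork(), parent and
child no longer share an address space, so a store by the child is
invisible to the parent, whereas threads/LWPs in one address space would
all see it.  (A sketch only; the helper name is invented.)

```c
/* Sketch: why "share an address space" is the defining property.
   After fork(), parent and child have separate (copy-on-write)
   address spaces, so the child's store to a global is invisible
   to the parent.  Threads in one address space would all see it. */
#include <assert.h>
#include <sys/wait.h>
#include <unistd.h>

static int global = 0;

int fork_isolation_demo(void)
{
    pid_t pid = fork();
    if (pid == 0) {        /* child: writes its own private copy */
        global = 99;
        _exit(0);
    }
    waitpid(pid, NULL, 0);
    return global;         /* still 0: the address spaces diverged */
}
```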

	../Dave

cs4213@umbc5.umbc.edu (cs4213) (02/07/91)

	Can anyone offer a succinct definition of threads aka lightweight
processes???  Citations will be met with gratitude; terse, lucid explanations
with fawning adoration.

berggren@eecs.cs.pdx.edu (Eric Berggren) (02/08/91)

dmason@msg.uwaterloo.ca (Dave Mason) writes:

>In article <4964@umbc3.UMBC.EDU> cs4213@umbc5.umbc.edu (cs4213) writes:

>>   Can anyone offer a succinct definition of threads aka lightweight
>> processes???  Citations will be met with gratitude; terse, lucid explanations
>> with fawning adoration.

>As I'm supposed to be working on a paper about this at this very
>moment, rather than reading news (-:

>Lightweight processes are processes that share an address space.

>The implications, ramifications and implementations of this vary
>wildly.  Implementations of lwp's on Unix systems vary from Sun lwp's
>and the uSystem where several lwp's share a Unix process and the
>operating system knows nothing of their existence, through Mach
>threads which are almost as lightweight as they can get with the
>operating system knowing about them, up to systems where lwp's are
>effectively full Unix processes which have mapped a common area of
>memory to work with.

  How, exactly, does this differ from shared memory processes? 
Thanx.

-e.b.

==============================================================================
  Eric Berggren             |   "Round and round the while() loop goes;
  Computer Science/Eng.     |         Whether it stops," Turing says, 
  berggren@eecs.cs.pdx.edu  |         "nobody knows."

velasco@beowulf.ucsd.edu (Gabriel Velasco) (02/09/91)

berggren@eecs.cs.pdx.edu (Eric Berggren) writes:

>dmason@msg.uwaterloo.ca (Dave Mason) writes:

>>As I'm supposed to be working on a paper about this at this very
>>moment, rather than reading news (-:

>>Lightweight processes are processes that share an address space.

>>The implications, ramifications and implementations of this vary
>>wildly.  Implementations of lwp's on Unix systems vary from Sun lwp's
>>and the uSystem where several lwp's share a Unix process and the
>>operating system knows nothing of their existence, through Mach
>>threads which are almost as lightweight as they can get with the
>>operating system knowing about them, up to systems where lwp's are
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>effectively full Unix processes which have mapped a common area of
>>memory to work with.

>  How, exactly, does this differ from shared memory processes? 
>Thanx.

The operating system knows about threads.  This is not necessarily the
case with light-weight processes.  With the first kind of light-weight
processes that Dave mentions above, all of the light-weight processes
have to share the time allotted to the one Unix process within which
they were created.  The Unix process takes care of any scheduling that
needs to be done by dividing up the time that is allotted to it.  The
operating system really needs to know about individual threads of
execution to properly schedule them on a multiprocessor system and to
avoid things like all of the other light-weight processes within a
Unix-like process blocking when one of them does.

-- 
                              ________________________________________________
 <>___,     /             /  | ... and he called out and said, "Gabriel, give |
 /___/ __  / _  __  ' _  /   | this man an understanding of the vision."      |
/\__/\(_/\/__)\/ (_/_(/_/|_  |_______________________________________Dan_8:16_|

mjl@lccma.bos.locus.com (Michael Leibensperger) (02/12/91)

In article <1476@pdxgate.UUCP> berggren@eecs.cs.pdx.edu (Eric Berggren)
asks how lightweight process implementations differ from regular
processes using shared memory.

As dmason@msg.uwaterloo.ca (Dave Mason) correctly
points out, just what an LWP is depends on who you ask.  The simple
answer to "what is the difference between threads and regular procs
using shared memory?" is that a thread or LWP shares *all* text and
data except for the stack, while regular shared memory processes only
share one or more named regions of memory.  Moreover, LWPs share their
memory *automatically* once the Nth (N>1) LWP is created, while shared
memory procs must perform intricate rituals using shmat(), mmap(), or
whatever before the named regions can be shared.
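The "intricate rituals" can be sketched in C: an explicit MAP_SHARED
mapping established before fork() is the step that threads never need,
because they share everything already.  (A sketch using the modern
MAP_ANONYMOUS flag; the helper name is invented.)

```c
/* Sketch: a "regular" process pair must explicitly map a shared
   region before they can share data; threads get this for free. */
#define _DEFAULT_SOURCE
#include <assert.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

int shared_counter_demo(void)
{
    /* The explicit ritual: create an anonymous shared mapping. */
    int *counter = mmap(NULL, sizeof *counter,
                        PROT_READ | PROT_WRITE,
                        MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (counter == MAP_FAILED)
        return -1;
    *counter = 0;

    pid_t pid = fork();
    if (pid == 0) {              /* child: store through the mapping */
        *counter = 42;
        _exit(0);
    }
    waitpid(pid, NULL, 0);       /* parent: sees the child's store */
    int seen = *counter;
    munmap(counter, sizeof *counter);
    return seen;                 /* 42 only because MAP_SHARED was used */
}
```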

Some thread/LWP implementations have no built-in OS support and rely on
runtime libraries to manage switching stack segments and register sets,
essentially context switching the process's single thread among various
activities.  The advantages of having the OS know about threads (as in
Mach) are: a) the OS can maintain a single set of page tables for all
threads, reducing the context switch overhead when switching between two
threads of the same task, and b) on an MP system the separate threads
can execute concurrently.

I await my due share of fawning adoration....  ;-)

	mjl
--
Michael J. Leibensperger <mjl@locus.com>       "None are so deeply enslaved
Locus Computing Corp./Boston			as those who falsely believe
25 Burlington Mall Road				they are free."
Burlington MA 01803, (617)229-4980 x169			-- J. W. von Goethe

dmason@watmsg.uwaterloo.ca (Dave Mason) (02/12/91)

In article <21892@oolong.la.locus.com> mjl@lccma.bos.locus.com (Michael Leibensperger) writes:

> The advantages of having the OS know about threads (as in
> Mach) are: a) the OS can maintain a single set of page tables for all
> threads, reducing the context switch overhead when switching between two
> threads of the same task, and b) on an MP system the separate threads
> can execute concurrently.

On the other hand, the disadvantage of existing threads systems vs.
user-level lightweight processes is that some (most, in some systems)
inter-thread interactions require a system call with an overhead cost
10-1000 times greater than that of the actual required instructions.

This can be very bad if the LWPs have only a few dozen
instructions to execute between such interactions.

> I await my due share of fawning adoration....  ;-)

OK: You gave the case for the pro-threads camp very well :-)

	../Dave

jgmorris@CS.CMU.EDU (Greg Morrisett) (02/15/91)

Michael Leibensperger points out that there are two advantages to letting
an OS know about your threads instead of doing it all in the runtime:

  a) the OS can maintain a single set of page tables for all threads,
  reducing context switch overhead when switching between two threads
  of the same task, and b) on an MP system the separate threads can
  execute concurrently.

If you're handling context switching in the runtime, then the OS still
has only one set of page tables for your threads.  This advantage does
hold over heavy-weight processes that share memory.

There is an additional advantage to letting your OS know about your
threads:  If one thread blocks for I/O, your whole task doesn't have
to be preempted.  Some threads packages that are entirely in the
runtime get around explicit I/O preemption by doing a non-blocking call 
(e.g. select) before doing the blocking call.  But this sort of trick
isn't possible on a page fault.  Note that this advantage applies
to uni-processors as well as MPs.
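The select()-before-read() trick looks roughly like this in C (a sketch,
not any particular package's code; the function name and the -2
"would block" sentinel are invented, and a real thread library would
switch to another runnable thread at that point):

```c
/* Sketch of the user-level trick: poll with select() (zero timeout)
   and only call read() when it cannot block; otherwise report
   "would block" so the thread library can run another thread. */
#include <assert.h>
#include <sys/select.h>
#include <unistd.h>

/* Returns bytes read, or -2 meaning "would block: reschedule me". */
ssize_t read_or_yield(int fd, void *buf, size_t n)
{
    fd_set rfds;
    struct timeval zero = {0, 0};   /* poll, never sleep */
    FD_ZERO(&rfds);
    FD_SET(fd, &rfds);
    if (select(fd + 1, &rfds, NULL, NULL, &zero) <= 0)
        return -2;                  /* not ready: run another thread */
    return read(fd, buf, n);        /* will not block now */
}
```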

-Greg Morrisett
 jgmorris@cs.cmu.edu

tim@ohday.sybase.com (Tim Wood) (02/16/91)

In article <JGMORRIS.91Feb14131059@VACHE.VENARI.CS.CMU.EDU> jgmorris@CS.CMU.EDU (Greg Morrisett) writes:
>
>There is an additional advantage to letting your OS know about your
>threads:  If one thread blocks for I/O, your whole task doesn't have
>to be preempted.  Some threads packages that are entirely in the
>runtime get around explicit I/O preemption by doing a non-blocking call 
>(e.g. select) before doing the blocking call.  

You can also use an async I/O facility if it's available.  Most production
operating systems have one (and UNIX systems seem just now to be getting it).
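For concreteness, such an async read facility has the shape of what was
later standardized as POSIX AIO (aio_read and friends).  This is purely
illustrative of the idea, not what any system of the time shipped; the
helper name is invented.

```c
/* Sketch: an asynchronous read.  The request is submitted and the
   caller keeps running; completion is polled with aio_error().
   A thread scheduler would run other threads instead of spinning. */
#include <aio.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

ssize_t async_read_demo(const char *path, char *buf, size_t n)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct aiocb cb;
    memset(&cb, 0, sizeof cb);
    cb.aio_fildes = fd;
    cb.aio_buf    = buf;
    cb.aio_nbytes = n;
    cb.aio_offset = 0;

    if (aio_read(&cb) != 0) {        /* submit; does not block */
        close(fd);
        return -1;
    }
    while (aio_error(&cb) == EINPROGRESS)
        ;                            /* other work would go here */
    ssize_t got = aio_return(&cb);   /* bytes read, as from read() */
    close(fd);
    return got;
}
```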

>But this sort of trick
>isn't possible on a page fault.  Note that this advantage applies
>to uni-processors as well as MPs.

Async I/O doesn't help you for a page fault either.  And it's harder to
do SMP with user-mode threads (basically you need LWP on top of HWP).

But async I/O plus user-mode threads plus enough memory to hold your
whole program should be the fastest combination, since the program
(bunch of threads) should never block except voluntarily and context 
switches don't require a system call.  This is (very tersely) how 
Sybase's SMP DBMS is built.
-TW
---
Sybase, Inc. / 6475 Christie Ave. / Emeryville, CA / 94608	  415-596-3500
WORK:tim@sybase.com     {pacbell,pyramid,sun,{uunet,ucbvax}!mtxinu}!sybase!tim
PLAY:axolotl!tim@toad.com		       {sun,uunet}!hoptoad!axolotl!tim
Dis claim er dat claim, what's da difference?  I'm da one doin da talkin' hea.

sritchie@cs.ubc.ca (Stuart Ritchie) (02/20/91)

In article <JGMORRIS.91Feb14131059@VACHE.VENARI.CS.CMU.EDU> jgmorris@CS.CMU.EDU (Greg Morrisett) writes:
>There is an additional advantage to letting your OS know about your
>threads:  If one thread blocks for I/O, your whole task doesn't have
>to be preempted.  Some threads packages that are entirely in the
>runtime get around explicit I/O preemption by doing a non-blocking call 
>(e.g. select) before doing the blocking call.  But this sort of trick
>isn't possible on a page fault.  Note that this advantage applies
>to uni-processors as well as MPs.
>
>-Greg Morrisett
> jgmorris@cs.cmu.edu

This advantage is real; however, most I/O calls that I want to do
within a thread will block anyway.  File system calls are an obvious
example.  Under NeXT Mach, I can't do a read() without the whole
task blocking.

Seeing as Mach currently relies on much Unix code for I/O (the file
system, for example), this blocking problem probably won't go away
until the fs code is redesigned specifically for Mach and threads.
I'm guessing at this point...

I saw a comment regarding the efficiency of implementing threads
inside of user space compared to threads supported by the kernel.
The person claimed 10-1000 times greater cost to schedule kernel
supported threads.  I would believe 10 times greater cost, but 1000?
Could someone explain why the overhead is so great?  System call
overhead is one thing, but I don't see how that can account for
everything.  Isn't the microkernel approach supposed to reduce
system call overhead?

I would appreciate any comments regarding performance and other
issues of kernel vs. user-space supported threads.  Specifically,
I plan on implementing something on the NeXT which makes heavy
use of LWP's, namely protocols (X.25, TCP/IP and the OSI stack).
Thus context switches between LWP's should be as efficient as
possible.

....Stuart
sritchie@cs.ubc.ca

Rick.Rashid@cs.cmu.edu (02/20/91)

A read system call blocks only the thread that makes it.

Current versions of CThreads allow a many-to-one
mapping between cthreads (which are the constructs Mach
user programs use) and kernel threads (which are the
kernel's notion of a computational state).  For example:
the Mach 3.0 Unix Server has potentially hundreds of
cthreads allocated to manage incoming "Unix" requests
and device messages but only around a dozen actual
kernel supplied threads are used (this number can be
varied depending on the maximum degree of parallelism
desired).  The advantage of this implementation is that
cthreads are often used to encapsulate state rather than
provide for parallelism.  Conditions and mutexes can often
be implemented using a coroutine style transfer of control
between them rather than a full reschedule.   The advantages
of kernel supplied threads (e.g., parallelism, concurrent I/O)
are available along with the advantages of procedure-call
cost thread control transfers for many operations.
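The coroutine-style transfer of control described above can be sketched
with the POSIX ucontext primitives (a stand-in for the actual CThreads
internals, which are not shown here); the hand-off between user-level
contexts happens without a kernel reschedule.  The function names are
invented.

```c
/* Sketch: coroutine-style transfer between two user-level contexts.
   Control moves at roughly procedure-call cost, with no kernel
   rescheduling involved in the switch itself. */
#include <assert.h>
#include <ucontext.h>

static ucontext_t main_ctx, co_ctx;
static int steps;

static void coroutine(void)
{
    steps++;                            /* first entry */
    swapcontext(&co_ctx, &main_ctx);    /* yield back to main */
    steps++;                            /* resumed later */
    /* falling off the end returns via uc_link */
}

int run_coroutine_demo(void)
{
    static char stack[64 * 1024];
    getcontext(&co_ctx);
    co_ctx.uc_stack.ss_sp   = stack;
    co_ctx.uc_stack.ss_size = sizeof stack;
    co_ctx.uc_link          = &main_ctx;
    makecontext(&co_ctx, coroutine, 0);

    steps = 0;
    swapcontext(&main_ctx, &co_ctx);    /* transfer in */
    swapcontext(&main_ctx, &co_ctx);    /* resume after its yield */
    return steps;
}
```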

af@spice.cs.cmu.edu (Alessandro Forin) (02/21/91)

How come no one thinks that maybe, just maybe, parallelism is
most appropriately discussed with a multiprocessor machine in mind?
Fake parallelism does not speed up your programs, you know...

And as far as Mach is concerned, this whole debate is irrelevant:
the latest CThreads implementation lets you use any number of kernel
threads to map any number of cthreads.  Pick your prey.

As for read(), Encore and OSF/1 (and probably others) have
a fully parallelized file system and networking code in their 2.5-based
kernels.  This is Unix, not Mach.

sandro-

briscoe-duke@cs.yale.edu (Duke Briscoe) (02/21/91)

In article <12032@pt.cs.cmu.edu>, af@spice.cs.cmu.edu (Alessandro Forin) writes:
|> 
|> How come no one thinks that maybe, just maybe, parallelism is
|> most appropriately discussed with a multiprocessor machine in mind?
|> Fake parallelism does not speed up your programs, you know...

Actually, I thought threads packages can speed up your programs on a
uniprocessor since processor utilization can be improved by masking
the latency of things such as page faults and I/O.  If you can write a
program with multiple threads, even on a single processor the cpu
doesn't have to be idle if one particular thread blocks.  This assumes
that the scheduler knows about threads, as I think Mach does.

Duke

kevinh@cmi.com (Kevin Hegg) (02/21/91)

> This advantage is real; however, most I/O calls that I want to do
> within a thread will block anyway.  File system calls are an obvious
> example.  Under NeXT Mach, I can't do a read() without the whole
> task blocking.
>
> Seeing as Mach currently relies on much Unix code for I/O (the file
> system, for example), this blocking problem probably won't go away
> until the fs code is redesigned specifically for Mach and threads.
> I'm guessing at this point...

At the Jan '91 Usenix conference a paper on the SunOS Multi-threaded 
Architecture was presented. Using a combination of threads and
light-weight processes, Sun provides a mechanism so that your entire task
(process) doesn't block if a single thread blocks.

Kevin Hegg, EDS Corp - Center for Machine Intelligence
2001 Commonwealth Blvd., Ann Arbor, Michigan 48105
Phone:   (313) 995-0900  Internet: kevinh@cmi.com    Applelink: D5990

Rick.Rashid@cs.cmu.edu (02/22/91)

Mach has provided this functionality since 1986.  A paper was published
in the Summer 1986 USENIX describing Mach and its use of threads and in
the Summer 1987 USENIX describing both the implementation of threads
and how they interact with a variety of Unix features.

-Rick