renga@beaver.cs.washington.edu (Renganathan Sundararajan) (10/27/88)
I came across the notion of multiple THREADS of control executing within a single address space when I was reading about MACH, the network operating system developed at CMU. I have also heard about Light Weight Processes (LWPs), but in what context I don't remember.

Questions:

What exactly is a light weight process? Where did the notion of a LWP originate, and in what context? Is it the same as a thread? Is the sole purpose of a LWP to help switch contexts fast? If a LWP does not own/use any registers, so that saving a context involves only saving the PC and condition flags, what is great about it? (If a process never uses any registers just so that context switches can be done fast, is there any advantage at all? Am I missing something here?) Do LWPs and HWPs (H = Heavy - whatever that may mean) co-exist in an OS? Is the notion of a thread unique to MACH, or do other OSes like V use it? Are there any machines out there that provide some kind of hardware support for LWP switching?

Any clarifications/references would be appreciated.

Renganathan
email: renga@cs.uoregon.edu

PS: Please email me your response. If there is enough interest, I will summarise.
mudd-j@bear.cis.ohio-state.edu (John R. Mudd) (10/28/88)
In article <5279@saturn.ucsc.edu> rutgers!mist.math.uoregon.edu!renga@beaver.cs.washington.edu (Renganathan Sundararajan) writes:

> What exactly is a light weight process?
> Is it the same as a thread?

A lightweight process, sometimes called a task (Encore's UMAX) or a thread (Mach), is a smaller thread of execution that doesn't have all the capabilities of a UNIX process, and as such it is cheaper to create and has less overhead associated with it. As UNIX processes are scheduled on a machine's processor(s), so are tasks scheduled on a program's available UNIX processes. A parallel program creates a number of UNIX processes on which the program's tasks can be created.

> Where did the notion of a LWP originate and in what context?

To do efficient (in terms of the OS) parallel processing, it isn't a wise idea to create a lot of UNIX processes. UNIX processes tend to have a lot of 'stuff' in them that isn't necessary for parallel processing.

> Is the sole purpose of a LWP to help switch contexts fast?

Not the sole purpose, just one of the convenient purposes. If you intend to create a lot of tasks, you want to be able to schedule them on a process as fast as possible.

> If a LWP does not own/use any registers and so saving a context
> involves only saving the PC and condition flags, what is great
> about it?

Speed. See above.

> Do LWPs and HWPs (H = Heavy - whatever that may mean) co-exist in an OS?

LWPs exist in a HWP, which are then scheduled by the OS. See above.

> Is the notion of a thread unique to MACH or do other os like V use it?

Nope. I don't know about V, but Encore's UMAX uses the concept of 'tasks.' From what I've read on Sequent's DYNIX, they seem to simply run straight UNIX processes. I wouldn't want to say for sure, and I don't know about any other systems. Anybody else care to add to this?

> Are there any machines out there that provide some kind of hardware
> support to LWP switching?

Unknown.

> Any clarifications/references would be appreciated.
I made a stab at answering your questions; if anybody has anything additional on this topic, I'd like to see it too. (That means POST, not email.)

John
-------------------------------------------------------------------------------
John R. Mudd
Department of Computer & Information Science
The Ohio State University, 2036 Neil Avenue, Columbus, Ohio, USA 43210-1277
email: mudd-j@cis.ohio-state.edu
riedl@purdue.edu (John T Riedl) (10/28/88)
'Thread' is another (shorter!) name for a lightweight process. Threads improve efficiency on many architectures by recognizing the distinction between address space operations and thread-of-control operations. The efficiency of three basic operations is improved: creation, deletion, and context switch.

Address space creation/deletion is expensive, since a significant amount of state is preserved for each address space. Secondary storage access may be required on swapping or paging systems. Thread creation and deletion is cheap. A new entry is added to the thread table in the address space in which the thread will run. Basically, the state information in this entry is a copy of the contents of the processor registers for the thread. Removing such an entry is even simpler.

Address space context switch may be expensive, especially on virtual memory machines. Processor caches must be flushed, a process may have to be swapped in from secondary storage, etc. Thread context switch within an address space is trivial. The processor simply saves a copy of its registers in the thread table, and loads a new copy. In many ways thread context switch is more like a procedure call than an ordinary context switch. Note that to take advantage of these more efficient context switches, systems that use threads must have a two-level scheduling algorithm that minimizes switches between address spaces.

In article <5287@saturn.ucsc.edu> mudd-j@bear.cis.ohio-state.edu (John R. Mudd) writes:

>In article <5279@saturn.ucsc.edu> rutgers!mist.math.uoregon.edu!renga@beaver.cs.washington.edu (Renganathan Sundararajan) writes:
>
>> If a LWP does not own/use any registers and so saving a context
>> involves only saving the PC and condition flags, what is great
>> about it?
>
>Speed. See above.

I think this is a misunderstanding. Threads must save the PC, condition flags, *and general registers*. The speed comes from avoiding the checkpointing of address space data (page tables, etc.).
>> Is the notion of a thread unique to MACH or do other os like V use it?

The V System from Stanford uses threads. I think their nomenclature is 'task' for address space and 'process' for thread.

>> Are there any machines out there that provide some kind of hardware
>> support to LWP switching?

What sort of hardware support is needed? Most architectures already provide instructions for efficiently saving processor state to improve procedure call performance. Some machines (like Suns) provide hardware support for address space context switching, in the form of multiple hardware address space contexts. This is the reason for the performance 'knee' on Suns when more than a certain number of contexts are in the run queue (7 on a 3/50; 15 on a Sun 4, I think). In one way threads help avoid this knee, since a group of related asynchronous activities can be organized as a single address space.

Here are references to Mach and V, both of which have threads:

@Article{mach-overview,
  Author  = "Richard F. Rashid",
  Title   = "Threads of a New System",
  Journal = "Unix Review",
  Number  = 8,
  Volume  = 4,
  Month   = aug,
  Year    = 1986,
  Pages   = "37--49"
}

@Article{V-process-groups,
  Author  = "David R. Cheriton and W. Zwaenepoel",
  Title   = "Distributed Process Groups in the {V} Kernel",
  Journal = TOCS,
  Volume  = 3,
  Number  = 2,
  Month   = may,
  Year    = 1985,
  Pages   = "77--107"
}

John
--
John Riedl
{ucbvax,decvax,hplabs}!purdue!riedl  -or-  riedl@cs.purdue.edu
karl@triceratops.cis.ohio-state.edu (Karl Kleinpaste) (10/29/88)
mudd-j@bear.cis.ohio-state.edu (John R. Mudd) writes:

>> What exactly is a light weight process?
>> Is it the same as a thread?
>
> A lightweight process, sometimes called a task (Encore's UMAX) or a
> thread (Mach), is a smaller thread of execution that doesn't have all
> the capabilities of a UNIX process, and as such it is cheaper to create
> and has less overhead associated with it. As UNIX processes are
> scheduled on a machine's processor(s), so are tasks scheduled on a
> program's available UNIX processes. A parallel program creates a number
> of UNIX processes on which the program's tasks can be created.

Don't get caught in the trap of considering LWPs only in the context of UNIX. There are a number of implementations of LWP concepts in other OSes, some of them not at all UNIX-like.

>> Where did the notion of a LWP originate and in what context?
>
> To do efficient (in terms of the OS) parallel processing, it isn't a
> wise idea to create a lot of UNIX processes. UNIX processes tend to
> have a lot of 'stuff' in them that isn't necessary for parallel
> processing.

They're also useful for other things, e.g., fast, efficient communication among a set of cooperating processes where the processes aren't necessarily `parallel' in nature.

>> Is the sole purpose of a LWP to help switch contexts fast?

There are lots of potential uses for them. LWPs generally don't carry the heavy baggage around with them that a normal ("heavy") process does. For example, an LWP may not claim a distinct data space of its own, it may borrow a register set from a common parent (typically a HWP parent), it may have a reduced concept of a process control block requiring less intervention and management by the kernel on its behalf, that sort of thing. That's why they're "light." Insofar as their use for speedy, low-overhead operations goes, a lot of certain kernels' internals are not well tuned for LWP usage.
I had problems with the foolishness of UNIX's wakeup() wanting to search the whole proc[] table - that cost me CPU time in a big way.

>> If a LWP does not own/use any registers and so saving a context
>> involves only saving the PC and condition flags, what is great
>> about it?

I would say it's the combination of minimal resource usage, in relation to how it affects speed, as John suggested. My grad work was done in LWPs which were compiled as part of a SysV kernel, hence claimed no data space of their own, and could be switched into context instantly without ever having to worry about such delaying problems as swapping and the like.

>> Do LWPs and HWPs (H = Heavy - whatever that may mean) co-exist in an OS?
>
> LWPs exist in a HWP, which are then scheduled by the OS. See above.

Oops, don't overstate the case, John. LWPs can exist independent of any other processes. Mine did - they were an extra process-context level of interrupt processing. LWPs tend to be used for very specialized functions (mine were deeply concerned with tty I/O), whereas typical processes tend to be intended for a whole bunch of things, which results in heavier usage of comparatively expensive resources like memory or peripheral I/O. My LWPs were spawned at boot time by the kernel just after creation of init, and they then sat waiting for tty events. They were not attached to any other process in any way. They did not interfere with normal HWPs in any way.

>> Is the notion of a thread unique to MACH or do other os like V use it?

Not at all unique to MACH. Take, for example, Kepecs & Solomon's message-passing kernel (see ref below).

>> Are there any machines out there that provide some kind of hardware
>> support to LWP switching?

I couldn't say. I'm not aware of any.

>> Any clarifications/references would be appreciated.

Some of these are closely relevant, some are not. This is from the bibliography to my MS thesis, not quite two years old.

Abu-Sufah, W., H.E. Husmann, and D.J. Kuck, ``On Input/Output Speedup in Tightly Coupled Multiprocessors,'' IEEE Transactions on Computers, June 1986, Volume C-35, pp. 520-30.

Baron, R.V., D. Black, W. Bolosky, J. Chew, D.B. Golub, R. Rashid, A. Tevanian, Jr., and M.W. Young, ``MACH Kernel Interface Manual,'' Carnegie-Mellon University Department of Computer Science, October 1986.

Black, A.P., ``Supporting Distributed Applications: Experience with Eden,'' Proceedings of the Tenth ACM Symposium on Operating System Principles, December 1985.

Brooks, E.D. III, ``A Multitasking Kernel for the C and FORTRAN Programming Languages,'' Lawrence Livermore Laboratory, University of California, September 1984.

Clark, D.D., ``The Structuring of Systems Using Upcalls,'' Proceedings of the Tenth ACM Symposium on Operating System Principles, December 1985, pp. 171-80.

Jones, M.B., R.F. Rashid, and M.R. Thompson, ``Matchmaker: An Interface Specification Language for Distributed Processing,'' Proceedings of the 12th Annual Symposium on Principles of Programming Languages, January 1985, pp. 225-35.

Kameda, H., ``Effects of Job Loading Policies for Multiprogramming Systems in Processing a Job Stream,'' ACM Transactions on Computer Systems, Volume 4, February 1986, pp. 71-106.

Kepecs, J. and M. Solomon, ``A Simplified Operating System for Distributed Applications,'' 3rd PODC Conference Proceedings, 1984.

Rovner, P., R. Levin, and J. Wick, ``On Extending Modula-2 For Building Large, Integrated Systems,'' DEC Systems Research Center, January 1985, pp. 16-20.

Schwan, K., T. Bihari, B.W. Weide, and G. Taulbee, ``High-Performance Operating System Primitives for Robotics and Real-Time Control Systems,'' The Ohio State University Department of Computer and Information Science, July 1986, report number OSU-CISRC-86TR1KS (to appear in ACM Transactions on Computer Systems, 1987).

Have a blast,
--Karl
steve@basser.oz (Stephen Russell) (10/31/88)
In article <5279@saturn.ucsc.edu> rutgers!mist.math.uoregon.edu!renga@beaver.cs.washington.edu (Renganathan Sundararajan) writes:

> I came across the notion of multiple THREADS of control executing
> within a single address space [...]
>
> What exactly is a light weight process?

In general, threads == LWPs.

> Where did the notion of a LWP originate and in what context?

The earliest ref I have is Cheriton's Thoth, which had multiple processes executing in the same address space (a "team"). There were probably earlier examples.

> Is the sole purpose of a LWP to help switch contexts fast?
> If a LWP does not own/use any registers and so saving a context
> involves only saving the PC and condition flags, what is great
> about it? (if a process never uses any registers just so
> that context-switches can be done fast, is there any advantage
> at all? Am I missing something here?)

Context switches involve more than just registers. Consider the cost of switching address spaces, which may involve changing many segment/page address registers. Switching between LWPs in the same space saves this cost. On systems which use virtual caching (eg, 68020), it is usually necessary to flush the I & D caches after a full context switch. Again, this is unnecessary if switching between LWPs in the same space.

There are other advantages, though. Firstly, LWPs usually share data, unlike most UNIX processes. The cost of creating an LWP is much lower, because no memory allocation need be done - this can be made particularly cheap if the creator provides the storage for the child process's stack. So, an LWP can be created as fast as you can grab a free entry from the proc table and set up the PC and SP for the new proc.

Also, LWPs are a useful way of structuring the OS kernel. For example, the Thoth kernel consisted of a group of LWPs, each managing some resource. For further details, see Cheriton's book "The Thoth System: Multi-process Structuring and Portability", Elsevier, 1982.
> Do LWPs and HWPs (H = Heavy - whatever that may mean) co-exist in an OS?

Yep. LWPs execute within the address space of a HWP (a Thoth "team"). A HWP is considered a single unit from the point of view of swapping/paging, memory allocation, etc, and may also be the place that info about open files, etc, is kept (assuming LWPs share files). You create new HWPs whenever you need a new address space; eg, to execute a new program.

> Is the notion of a thread unique to MACH or do other os like V use it?

Nope. Thoth and its derivatives (Port, Hermes, V) have them. In fact, most modern OSes do (ie, those that aren't just UNIX clones :-).

> Are there any machines out there that provide some kind of hardware
> support to LWP switching?

Not really needed, as LWP switching involves _just_ switching the registers. We can even make this cheaper by refusing to preserve registers across system calls, so that all we need to save is the PC/SP (and possibly a FP). Of course, if a process is suspended as a result of an interrupt, we must save all its registers, and restore them later during the interrupt return sequence. However, voluntary suspensions (eg, I/O requests) cost very little.

LWPs are a nice way of structuring systems, not just for parallel programming, but OSes in general. See Cheriton's stuff - it is very good.
nigel@uunet.UU.NET (Nigel Gamble) (10/31/88)
In article <5279@saturn.ucsc.edu>, rutgers!mist.math.uoregon.edu!renga@beaver.cs.washington.edu (Renganathan Sundararajan) says:

> Are there any machines out there that provide some kind of hardware
> support to LWP switching?

The INMOS transputer is a microprocessor which supports creation and scheduling of processes in hardware. See Byte, November 1988.
--
Nigel Gamble      "Everything should be made as simple as possible,
MODCOMP/AEG        but not simpler."  Albert Einstein.
uunet!modcomp!nigel
pardo@june.cs.washington.edu (11/01/88)
In the meanwhile, here's yet another reference:

"The Performance Implications of Thread Management Alternatives for Shared-Memory Multiprocessors," by Thomas E. Anderson, Edward D. Lazowska, and Henry M. Levy; Department of Computer Science FR-35, University of Washington, Seattle, WA 98195. Tech Report 88-09-04.

The title should tell all. The paper contains both analytic models and experimental results.
jack@uunet.UU.NET (Jack Jansen) (11/03/88)
So far nobody seems to have mentioned that LWPs are a very nice mechanism to program with, much nicer than non-blocking I/O with completion interrupts or polling, etc. Just look at your average unix utility that uses SIGURG or select() and imagine what it would look like when written as a set of cooperating LWPs using blocking primitives and you'll get the point.....
--
Fight war, not wars                  | Jack Jansen, jack@cwi.nl
Destroy power, not people! -- Crass  | (or mcvax!jack)
libes@cme.nbs.gov (Don Libes) (11/07/88)
In article <5354@saturn.ucsc.edu> mcvax!cwi.nl!jack@uunet.UU.NET (Jack Jansen) writes:

>So far nobody seems to have mentioned that LWPs are a very nice
>mechanism to program with, much nicer than non-blocking I/O with
>completion interrupts or polling, etc.

I agree completely. I recently wrote an article discussing exactly this. It showed the difficulty of writing a conceptually simple program, and how it resolved cleanly into 3 threads, just a couple of lines apiece. It also discussed threads with respect to OS/2 and UNIX.

"Unraveling Threads", Micro/Systems Journal, p. 12, Nov. 1988.

Another reference you might consider (also by me) is:

"Multiple Programs in One UNIX Process", ;login:, July/August 1987.

This paper described an easy way to do LWPs in UNIX. The idea was that (the conceptually elegant) XINU was an LWP scheduler, and that porting it to UNIX was easy. It required no modifications to the kernel - only 6 lines of assembler to save and restore registers.

Don Libes
libes@cme.nbs.gov  ...!uunet!cme-durer!libes
edler@cmcl2.NYU.EDU (Jan Edler) (11/15/88)
This topic is a few weeks old, but a mail/news problem here delayed my followup. I agree in general with most of what has been said on this topic, mainly that there are many different things going around under the same or similar names, such as "threads" and "lightweight processes". Which one you want depends on your needs, goals, and environment. There are a couple of other comments I would like to make as well.

First, I want to respond to one opinion expressed by mcvax!jack (Jack Jansen) in article <5354@saturn.ucsc.edu>:

> So far nobody seems to have mentioned that LWPs are a very nice
> mechanism to program with, much nicer than non-blocking I/O with
> completion interrupts or polling, etc.
> Just look at your average unix utility that uses SIGURG or select()
> and imagine what it would look like when written as a set of
> cooperating LWPs using blocking primitives and you'll get the
> point.....

This was also agreed to by libes@cme.nbs.gov (Don Libes) in article <5394@saturn.ucsc.edu>:

> I agree completely. I recently wrote an article discussing exactly
> this. It showed the difficulty of writing a conceptually simple
> program, and how it resolved cleanly into 3 threads, just a couple of
> lines a piece. It also discussed threads with respect to OS/2 and UNIX.

Anyway, I have to say I disagree. There are two issues here, syntactic and conceptual. Things like constructing control blocks, checking for error conditions, etc., can be a real hassle in asynchronous I/O systems, but LWP systems can have just as much crap to deal with (e.g. stack space allocation, etc.). It just depends on how low-level the programmers' interface is, and how fancy the facility is. There is no reason not to provide an easy-to-use asynchronous I/O interface. Conceptually, both asynchronous I/O systems and LWP systems are sources of complexity.
Both require you to deal with language/compiler issues having to do with asynchronous access to shared variables (at least something like volatile in ANSI C, and possibly much more - or a dumb compiler). Race conditions can arise in either kind of system, and can be equally hard to debug. But an LWP mechanism is fundamentally more powerful; basically I'm talking about the difference between interrupts on the one hand, and MIMD shared-memory multiprocessing on the other. Clearly interrupts are not as powerful as multiprocessing, but their inherent structure and relative simplicity can make them somewhat more appropriate when that full power isn't needed. Of course, there may be other reasons why you want multiprocessing anyway; that's fine too.

The other point I would like to address is whether kernel support is needed for LWPs. There are a couple of reasons usually given for why kernel support is needed: you don't want a HWP suspended when a LWP (running "on" that HWP) does I/O, or incurs a page fault. Both these problems can be addressed with interrupts (signals), as an alternative to kernel-supported LWPs. The use of asynchronous I/O in the implementation of a user-level LWP system is fairly straightforward: since the I/O is asynchronous, the HWP doesn't block, but can switch to a new LWP. Page faults can be handled similarly to the way real operating systems handle them: generate an interrupt to a piece of memory-resident code that deals with the situation. In this case, all that is necessary is to switch to a new LWP. When the page fault has been satisfied, another interrupt (analogous to the I/O completion interrupt generated by the hardware when the page-in is done) can cause the blocked LWP to be made ready again.

Of course, there may be other reasons for choosing kernel-supported LWPs, but I don't think I/O and page fault handling are good arguments in favor of doing it that way.

Jan Edler
NYU Ultracomputer Project
edler@nyu.edu
cmcl2!edler
raveling@vaxb.isi.edu (Paul Raveling) (11/19/88)
In article <5467@saturn.ucsc.edu> edler@cmcl2.NYU.EDU (Jan Edler) writes:

>First, I want to respond to one opinion expressed by mcvax!jack (Jack Jansen)
>in article <5354@saturn.ucsc.edu>:
>> So far nobody seems to have mentioned that LWPs are a very nice
>> mechanism to program with, much nicer than non-blocking I/O with
>> completion interrupts or polling, etc.
>
>This was also agreed to by libes@cme.nbs.gov (Don Libes) in article
><5394@saturn.ucsc.edu>:
>> I agree completely. I recently wrote an article discussing exactly
>> this. It showed the difficulty of writing a conceptually simple
>> program, and how it resolved cleanly into 3 threads, just a couple of
>> lines a piece. It also discussed threads with respect to OS/2 and UNIX.
>
>Anyway, I have to say I disagree. There are two issues here, syntactic
>and conceptual.

On conceptual grounds I tend to agree rather than disagree. A virtue of processes, no matter what their weight, is encapsulation of logic. This should sound a bit like object-oriented programming: it turns out all that "process-oriented programming" we did in the '70s had a lot in common with OO techniques.

>Things like constructing control blocks, checking for error conditions,
>etc., can be a real hassle in asynchronous I/O systems, ...

I/O systems are easiest with a process-oriented approach. A process per device in the kernel often allows using the process state instead of state data saved in control blocks. This also tends to radically decrease the amount of "critical region" code that needs exclusive access to shared control blocks, device interface registers, etc. Good interprocess communication simplifies the interface between client processes and i/o [device driver] processes. In a system such as EPOS, the kernel's mechanism for queueing messages as event data eliminated the need for separate i/o scheduling queues. As the application sees it, asynchronous and synchronous i/o are nearly identical.
For asynchronous i/o, the application supplies an event identifier code for the kernel to use in the message reporting i/o completion; when the application eventually waits for or checks for that event, it gets the same event data as for a synchronous request (info such as completion status and number of bytes transferred).

>... There is no reason not to provide an
>easy-to-use asynchronous I/O interface.

You bet!

>Conceptually, both asynchronous I/O systems and LWP systems are sources
>of complexity. Both require you to deal with language/compiler issues
>having to do with asynchronous access to shared variables ...

I think access to shared variables is the only thing that may need to involve the compiler. If the hardware guarantees as little as uninterrupted increment-and-test and decrement-and-test operations, even that can become almost trivial.

Questions of blocking get to be potentially painful with LWPs. A fairly clean answer is to use "HWPs", but be sure it's easy for them to communicate via messages and to share portions of an address space. Of course VERY fine-grained LWPs or short threads would need more language help, but practically all of the situations I've encountered that beg for multiple processes or threads have been more appropriate for "heavies".

The main virtue of LWPs on UNIX systems is to allow fast interprocess communication, fast context switching, and sharing of an address space. It's not that tough to provide the same facilities to "HWPs", but it takes a different kernel.
---------------------
Paul Raveling
Raveling@vaxb.isi.edu