prl@iis.UUCP (Peter Lamb) (04/02/89)
I have a number of problems with the task class as distributed with AT&T 1.2.1 cfront. Some of these are just plain bugs, for which there are relatively simple fixes, others are real problems with the basic idea of the implementation. I have been thinking about writing this article for a while, but have been waiting for the release of cfront 2.0, which I had hoped may have addressed some of the problems. However, I have just been told by AT&T Unix Europe that cfront 2.0 will be not be out until 3Q89. These problems became apparent while I was trying to understand the task library sufficiently to port it to our Sequent Symmetry (i386 CPU) and to our Alliant FX/80, and having had to fix some of the bugs mentioned below on these machines and on our Suns and VAXen. BUGS. It is to be expected that code like this might be buggy. Implementing the low level of concurrency is tricky. It was the equivalent code in V6 Unix which contained the immortal /* You are not expected to understand this */ comment. The bugs that I know of are: 1) In lib/task/task.c, the macros SETTRAP() and CHECKTRAP() are not defined as part of the machine-dependent setup, although they depend on the machine's stack growth direction, and they are set up for the unconventional stack-grows-to-increasing-addresses direction of the AT&T 3BX. This is easily fixed. 2) The routines task::save() and task::restore() do not save/restore the register state (apart from SP, FP and AP). This means that if you use register variables in tasks (or functions called from tasks) you may get surprising results. A half-fix for this is easy, but a real fix is difficult. 3) The routine _sswap(), used for switching stacks in SHARED mode uses its arguments on the (old) stack, after the new stack has been copied in. There are a few words of slop, so that this code sometimes works, but if the new stack is much bigger than the old stack, the values will have been wiped out (in the VAX and Sun versions, at least). 4) The code for _sswap() in sun_swap.s copies the stack in the wrong direction!! (ie, from the stack base towards increasing addresses for the stack size, instead of from the stack base towards decreasing addresses for the stack size). 5) The code for the SHARED stack mode copies too much data. It copies the whole stack frame of the establisher of the task in and out on each task swap. This doesn't matter normally provided that you never reference data in the establisher's frame while running another task. This is forbidden by the paper describing the task system, but there is a way in which it can happen for a fairly reasonable program: main() { my_task a(......); my_task b(......); thistask->resultis(0); } when a task runs, an old copy of the task is put back onto the area where the `correct' task should be; this causes havoc when the task switches to a new task, since all the data associated with the implementation of tasks a and b will be wrong! The bugs 1, 3 and 4 are easily fixed. A fix exists for 5, but I haven't tried to do it, since simple workarounds exist; eg main() { static my_task a(....); static my_task b(....); } I have a partial fix for 2, which brings us to the real problem. THE PROBLEM There doesn't seem to be a reasonable way to implement a reasonably robust version of the task library in its current form given: 1) The current method of constructing tasks (the task code is the constructor of a class derived from class task). 2) That most implementations of C++ (either through their C compiler or directly in the case of g++) use callee-saves for saving register state. This is independent of problems which may occur if DEDICATED mode tasks overrun their stacks. I will attempt to substantiate this claim... HOW TASKS GET CREATED The task creation code does the following (this is slightly simplified, for the real story, see the code!): 1. Establishes the task environment; creates the task stack and copies in the arguments to the task (and incidently, the data part of the creator's frame), patches the static chain in the stack if the stack is DEDICATED and saves the information needed to (re)start the task. Currently this information is the FP, AP and `this' of the task. 2. The task must now return to the creator of the task without executing the body of the user's constructor. A task constructor gets turned into code which does the following: mytask__ctor() { task__ctor(); /* body of my task constructor code */ } task__ctor() `returns' in two contexts: the `normal' return to the caller of the constructor, and in the context of the task. The `normal' return is in no way normal! The `normal' return must *not* execute any code in the body of my constructor (this is often an infinite loop, anyway!). What is done is that the lowlevel code of task.c adjusts the frame pointer to point to mytask__ctor()'s frame and then does a return. This is the nub of the problem. This means that any registers saved by the startup code of task__ctor() or any of the subroutines along the calling path from task__ctor() to the routine which does this manipulation (typically _swap(), an assembly-language routine) won't get restored, and if the caller of the constructor has any active register variables, they may get wiped out. The problem lies in the fact that the caller of a routine depends on the callee-saves protocol being obeyed by all routines along the way. This protocol is broken by the task library code. Even refraining from using register variables may not help. Modern C (and C++) compilers may do their own register allocation. AT&T cfront should put `this' into a register if it is referenced more than twice (it doesn't, though. I have sent a fix for this to the AT&T C++ people. More about fixes later). g++ will automatically place things in registers even if -O is not used (this is a feature, not a bug). Note: there _is_ a way to save and restore registers correctly once the task has been established. It is a bit of a hack, but it makes the task library somewhat less rickety. However, the problem of maintaining the registers correctly at task creation time is a nasty one, especially if 100% code compatibility is needed. POSSIBLE FIXES The easiest one is to change the way tasks are created. This can be done by adding an explicit operator to the task to detach it. Near-compatibility can then be achieved by a macro: class task { protected: int detach(); // Returns 1 in the task, 0 in the creator public: task(); }; #define TASK if(detach()) class mytask : task { public: mytask(); }; mytask::mytask() { TASK { // task body goes here } } The crucial difference here is that detach() returns normally to mytask::mytask(), which means that the stack is all unrolled correctly. (Unfortunately I don't have an implementation of this). Another approach would be to move the task code out of the constructor: class task : { public: virtual void taskmain(); task(); }; class mytask : public task { void taskmain(); }; void taskmain() { // Task body } This has the disadvantage that the task body can only be passed arguments as data in the object. It has the advantage of not needing to copy a piece of the caller's stack into the stack frame to establish the task. I would be interested in hearing from anyone who has successfully used the task library from cfront 1.2 for any non-trivial programs, and what problems they have had and/or fixed. FIXES & AT&T In September/October last year, I contacted AT&T Unix Europe to ask who I might send some compiler bug reports and some fixes to. I was given an email address and I contacted the person, who was happy to pass on the bug reports and fixes to the development people. NB: Cfront is *not* supported, so this was simply AT&T goodwill, which was much appreciated. In the same note, I (foolishly ?) asked if there would be any objection to my posting the context diffs for these fixes on the net. the response was: 'Frankly, I do not know what the license stipulates. I will check. In the meantime, please do not. Because cfront is not a public domain product, I don't think that exposure of the code like this is the proper thing to do. I will look into this and get back to you to determine what you can do.' I have not heard anything since, though I enquired after it once. So I am unfortunately in the position of having been told by an AT&T employee that sending such fixes may be in breach of our license, and that I should not post any such fixes. Incidently, I have not sent my changes to the task library to AT&T, since I don't think that they really fix the problem. -- Peter Lamb uucp: uunet!mcvax!ethz!prl eunet: prl@ethz.uucp Tel: +411 256 5241 Integrated Systems Laboratory ETH-Zentrum, 8092 Zurich
grunwald@flute.cs.uiuc.edu (04/03/89)
I'd looked at the AT&T Tasking library & came to basically the same conclusions that you did. It's a mess, and it's more of a mess than you might suspect. Consider porting to a compiler that doesn't require frame pointers for procedure (Greenhills on the 32K, Gnu CC on everything). All of a sudden, you find yourself copying *huge* stacks for what might be a couple of arguments. Also, adding the task to the scheduling list in the constructor for the task class is a pain. In multiprocessor environments, it leads to a parallel-unsafe implementation: one processor can be executing subclass constructors while another pulls the task off the queue to execute. Bogus. I wrote another tasking library that has a class Thread. Constructors to a thread are simple constructors, nothing more. You pass arguments, save what you need, compute what you need, etc. A thread must be explicitly enqueued to a queue (typically, one maintained as ``ThisCpu'', the abstraction for the current processor). All threads have a virtual void function called ``main''. When a thread is invoked for the first time, execution ``magically'' starts by calling the ``main'' function. When you exit ``main'', your thread kills itself. The context information for a thread is stored in another class, called a HardwareContext, allowing some seperation of the thread information and contexts. Things other than threads (e.g., the actual unix processes managing the threads) have contexts. This has worked fairly well. It runs on Sun-3's & the Encore multimax. I've written a library of C++ objects that implement either a SingleCpu or a MultiCpu scheduler. There are subclasses of the Cpu objects that that implement global-time process-oriented simulation. There are SpinLocks, SpinEvents, SpinBarriers, Semaphores, Events, Barriers, and Facilities. The entire package gets a fair bit of use here, because it's used for simulation. Someone is also using it as a simple multi-tasking library for the Encore multimax for a parallel C++-scheme interpreter. There are some complications that eventually need resolving, probably when G++ gets M.I. working. Somethings would be easier if delegation-via-pointer was available. Eventually, the whole kit-and-kaboodle will be available, probably through Gnu Libg++. -- Dirk Grunwald Univ. of Illinois grunwald@flute.cs.uiuc.edu
shapiro@blueberry.inria.fr (Marc Shapiro) (04/24/89)
In article <810@eiger.iis.UUCP> prl@iis.UUCP (Peter Lamb) writes: > >I have a number of problems with the task class as distributed >with AT&T 1.2.1 cfront. We still use the 1.1 version but apparently there are no significant differences. >2) The routines task::save() and task::restore() do not save/restore > the register state (apart from SP, FP and AP). This means that if > you use register variables in tasks (or functions called from tasks) > you may get surprising results. > A half-fix for this is easy, but a real fix is difficult. > A good, portable (but potentially expensive) fix for this exists: use setjmp/longjmp in task::task and task::schedule. You do a longjmp to where you already are just so it saves the registers. There is another bug you didn't list: if a task contructor creates more tasks, everything breaks. (I forget the reason for this.) >I would be interested in hearing from anyone who has successfully >used the task library from cfront 1.2 for any non-trivial programs, >and what problems they have had and/or fixed. We use it for a fairly big OS kernel. We had to fix a few things along the way (especially register save/restore of course) but I can't give you a whole lot of information as the people who did this have left the group. Marc Shapiro Marc Shapiro INRIA, B.P. 105, 78153 Le Chesnay Cedex, France. Tel.: +33 (1) 39-63-53-25 e-mail: <shapiro@sor.inria.fr>, or: <...!uunet!inria!shapiro>, or (from some non-standard US sites): <inria!shapiro@uunet.uu.net>