dennis@dennis.colorado.edu (05/04/89)
I seem to have encountered some obscure race conditions in one of my programs.

The first problem arose when I invoked the redisplay option in the frame menu for a special class of window (the controlwindow that I posted some time ago). Doing so caused the window to hang and stop accepting commands (keyboard or mouse). My rather complicated window class appeared to be in some sort of infinite loop. To complicate things further, this happened only on Color Suns; black and white Suns worked ok. After some effort (NeWS debugging facilities leave much to be desired), I traced the problem to a 'pause' command in my PaintFrameControls procedure. If I removed the pause, the window worked ok; if I put it back, it failed.

As near as I can tell, what happens is that the refresh command invokes paint to repaint the window. Soon afterwards, the Menu code invokes paint on its client window to repair the damage caused by the menu overwriting part of the window. Both of these run as processes because the litewindow class forks the paint processes. Thus, in theory, we have two processes simultaneously trying to repaint the window. My hypothesis is that if the pause is not included, the two paint processes run one after the other. When the pause is included, it allows them to interleave, and this causes some sort of error whose exact nature I cannot yet determine. To test this hypothesis, I set ForkPaintClient? to false. When I did this, the window again worked correctly.

I do not know exactly why the code works on a B&W Sun but not on a Color Sun. Again, I hypothesize that repainting on a Color Sun takes longer, which might affect the timing of the race condition. Alternatively, there is some bug in the way that NeWS handles color that is causing the problem.

So, be careful about sticking pauses into your code. You might want to run with ForkPaintClient? false in all your windows when on a Color Sun (but see below).
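The interleaving hypothesis for the first problem can be illustrated with a small model. This is only a sketch in Python, not NeWS code: it assumes that the forked paint processes give up the processor only at an explicit 'pause', and it models each process as a generator whose `yield` stands in for that pause. The names `paint`, `run`, and `simulate` are made up for the illustration.

```python
from collections import deque


def paint(name, log, pause):
    """Model one forked paint process; the yield stands in for a NeWS 'pause'."""
    log.append(f"{name}: begin")
    if pause:
        yield  # give every other ready process a turn
    log.append(f"{name}: end")


def run(processes):
    """Round-robin scheduler: run each process until it yields or finishes."""
    ready = deque(processes)
    while ready:
        proc = ready.popleft()
        try:
            next(proc)          # resume the process up to its next yield
            ready.append(proc)  # it paused, so queue it to run again later
        except StopIteration:
            pass                # the process ran to completion


def simulate(pause):
    """Run the refresh paint and the menu-repair paint; return the event order."""
    log = []
    run([paint("refresh", log, pause), paint("menu", log, pause)])
    return log


print(simulate(pause=False))
print(simulate(pause=True))
```

Without the pause, each paint in this model runs from begin to end before the other starts. With the pause, the second paint begins while the first is only half done, which is the kind of overlap the hypothesis blames for the hang.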
The second problem occurred when I invoked zap from the frame menu. What appears to be happening is that the destroy code wipes out major portions of the window, and then the menu code subsequently tries to repaint the window. Again, this is a race condition. My solution was to add a 'destroyed' field to my window and to test it in PaintClient and PaintFrame: if the window is destroyed, the paint routines do nothing. This is admittedly a kludge, but I suspect that the whole window/menu system needs rethinking to rid itself of a whole class of race conditions like these.

Note that another way to fix this is to make ForkPaintClient? true and put a pause in the /destroy code. This should allow the menu process to complete and the window to quiet down before the rest of the /destroy code is executed. I did not try this solution to see if it worked. Notice that this solution is in direct opposition to the solution proposed for problem one above.

-Dennis Heimbigner
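The 'destroyed' field workaround for the second problem amounts to a guard at the top of each paint routine. The following is a minimal sketch of that idea in Python rather than NeWS PostScript; the `Window` class, its `canvas` attribute, and the method names are hypothetical stand-ins, not the actual window class from the post.

```python
class Window:
    """Toy window with a 'destroyed' flag checked by the paint routine."""

    def __init__(self):
        self.destroyed = False
        self.canvas = "canvas"  # stand-in for the resources that destroy tears down
        self.paint_count = 0

    def destroy(self):
        # Set the flag before tearing anything down, so that a paint request
        # arriving afterwards sees it and backs off.
        self.destroyed = True
        self.canvas = None

    def paint_client(self):
        # Guard: do nothing if the window has already been destroyed.
        if self.destroyed:
            return False
        self.paint_count += 1  # a real routine would draw on the canvas here
        return True


win = Window()
print(win.paint_client())  # window is alive, so this paints
win.destroy()
print(win.paint_client())  # late repaint from the menu code is ignored
```

The guard only covers a repaint that starts after destroy has run; it says nothing about a paint that is already in progress when the window is destroyed.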