[comp.windows.x] Bug in server with many screens

peterson@SW.MCC.COM (James Peterson) (11/09/88)

VERSION:
    R3

CLIENT MACHINE and OPERATING SYSTEM:
    Sun 3/260 running SunOS 3.5

DISPLAY:
    Sun CG2

AREA:
    server

SYNOPSIS:
    Server code core dumps when more than 5 screens are used.

DESCRIPTION:
    The server code is ill-designed to handle multiple screens.  These problems
exist at many levels, both in the Sun-specific and dix code.

The first problem is with include/misc.h.  This file defines a symbolic
constant MAXSCREENS with a value 3; this constant controls the maximum
number of screens that the server can handle.  It is used to control the
size of arrays in dixstruct.h, dixfontstr.h, cursorstr.h, window.c, globals.c,
and extension.c plus being used in the ddx code for cfb, mfb, mi, ndx, plx, 
and apollo, and should be used instead of NUMSCREENS for the sunFbs array
in sun/sunInit.c.  No code checks to see if the number of screens exceeds
MAXSCREENS -- AddScreen in server/dix/main.c would the be appropriate place
for this check.

Looking at AddScreen, however, we find that the actual array of screen
descriptions is more flexible.  The screenInfo record has two variables:
screenInfo.numScreens and screenInfo.arraySize.  The arraySize variable
indicates the size of the actual array for holding screen descriptions;
numScreens says how many we actually have.  If we invoke AddScreen and
find that we have no more free space in the screen descrption array, the
following code is executed:

    if (screenInfo.numScreens == screenInfo.arraySize)
    {
	screenInfo.arraySize += 5;
	screenInfo.screen = (ScreenPtr)xrealloc(
	    screenInfo.screen, 
	    screenInfo.arraySize * sizeof(ScreenRec));
    }

This code allocates room for 5 more screens, and then re-allocates memory
for the screen array and copies the previous screen array into the new
memory -- fairly standard for Unix C code.

This is, however, anomolous since the number of screens cannot exceed
MAXSCREENS anyway, so simply preallocating screenInfo.screen and 
screenInfo.arraySize to MAXSCREENS eliminates the need for this code.
screenInfo.arraySize is currently initialized to 0, so in the current code
this dynamic array allocation is normally executed once at the very beginning.
In a system with MAXSCREENS reset to a higher value (like 10), and more
than 5 screens, it will be executed again later.  The expectation would then
be that space would be reallocated, values copied and the server would be
able to support the additional screens.

However, AddScreen later executes the following to complete the initialization
of the new screen:

    if ((*pfnInit)(i, &screenInfo.screen[i], argc, argv))
    {
	screenInfo.numScreens++;
        CreateGCperDepthArray(i);
	CreateDefaultStipple(i);
    }

For a Sun, the pfnInit will be sunCG2CInit or sunCG2MInit or ... but all
these are similar.  In particular these routines call SunScreenInit which
calls CreateScratchGC which stores the address of the screen descriptor
(the &screenInfo.screen[i] parameter above called pScreen in the routines)
into a GC, or by calling mfbScreenInit or cfbScreenInit, into the pScreen
field of a drawable for the basic pPixmap.  You can trace the code, but the
basic idea is that the &screenInfo.screen[i] parameter under the name pScreen
gets stored into various structures.

This is not limited to Sun servers -- AddScreen itself calls 
CreateGCperDepthArray which calls CreateScratchGC which stores away a set of
&screenInfo.screen[i] pointers.

When the number of screens exceeds 5, however, the dynamic reallocation code
in AddScreen is executed.  This moves the screenInfo.screen array, and ALL
THOSE STORED POINTERS ARE WRONG.  The basic rule of reallocating space is
that you can only do it if all pointers to that space can be found and
changed.


REPEAT BY:
    First run on any system with more than 3 screens to trash the server
    because MAXSCREENS is set to 3 and arrays allocated against it are not
    checked for exceeding their size.

    Second, change MAXSCREENS to a larger number (like 10) and run with 
    more than 5 screens to see the dangling pointer problem.

SAMPLE FIX:

    A simple hack:

    Set MAXSCREENS to 10 and change the dynamic allocation code to use
    MAXSCREENS instead of 5 and to recognize a fatal error if the number
    of screens exceed MAXSCREENS.


    A fix:  redesign the entire screen handling code to either be fixed
	    size static (ala MAXSCREENS) or truly dynamic.

    Also, all other reallocations may need to be examined to see that they
    do not leave dangling pointers.