[comp.windows.x] rgb database corruption

rws@EXPO.LCS.MIT.EDU (06/15/89)

Here's an unofficial diff fragment to server/os/4.2bsd/osinit.c that might or
might not cause this problem to disappear.  You can try it if you are being
pestered by this problem.  You should probably ignore it if you aren't.
Your mileage may vary.

*** 60,70 ****
  	{
  	    long t; 
  	    char *ctime();
  	    fclose(stdin);
  	    fclose(stdout);
  	    sprintf (fname, ADMPATH, display);
! 	    if (!freopen (fname, "a+", stderr))
! 		freopen ("/dev/null", "w", stderr);
  #if defined(macII) || defined(hpux)
  	    {
  	    static char buf[BUFSIZ];
--- 60,80 ----
  	{
  	    long t; 
  	    char *ctime();
+ 	    FILE *err;
  	    fclose(stdin);
  	    fclose(stdout);
  	    sprintf (fname, ADMPATH, display);
! 	    /*
! 	     * uses stdio to avoid os dependencies here,
! 	     * a real os would use
!  	     *  open (fname, O_WRONLY|O_APPEND|O_CREAT, 0666)
! 	     */
! 	    if (!(err = fopen (fname, "a+")))
! 		err = fopen ("/dev/null", "w");
! 	    if (err && (fileno(err) != 2)) {
! 		dup2 (fileno (err), 2);
! 		fclose (err);
! 	    }
  #if defined(macII) || defined(hpux)
  	    {
  	    static char buf[BUFSIZ];

john@acorn.co.uk (John Bowler) (06/16/89)

In article <8906142323.AA04774@expire.lcs.mit.edu>, rws@EXPO.LCS.MIT.EDU writes:
> Here's an unofficial diff fragment to server/os/4.2bsd/osinit.c that might or
> might not cause this problem to disappear.  You can try it if you are being
> pestered by this problem.  You should probably ignore it if you aren't.
> Your mileage may vary.
> 
> [Patch - most omitted]
> ! 	    if (!(err = fopen (fname, "a+")))
> ! 		err = fopen ("/dev/null", "w");
> ! 	    if (err && (fileno(err) != 2)) {
> ! 		dup2 (fileno (err), 2);
> ! 		fclose (err);
> ! 	    }

This fixes one obvious problem, but that problem (stderr ending up connected
to a file descriptor other than fd 2) is not the only possible cause of rgb
database corruption.  I have been running with appropriately fixed R2 code
and still observed these symptoms.  For my code to fail, a subsequent
open of /dev/null must also fail.  I conclude that this must be happening
(very rarely) on the systems I use, and I notice that the above code will
still go wrong if the fopen ("/dev/null", "w") fails.

For the database to be corrupted (given the normal installation mechanism)
the server must be running as root and (at least) the open of /usr/adm/X?msgs
or the subsequent dup2 must fail.  Assuming the directory /usr/adm exists,
the only likely reason for failure on a bsd, or bsd-tahoe, system is that
the kernel file table fills up - which will tend to mean that all the opens
fail together.
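
The thread does not spell out how a stray stderr reaches the database, so the
following is one reading rather than a confirmed diagnosis: open() always
returns the lowest free descriptor number, so if fd 2 is left closed, the next
file the server opens (possibly the rgb database) can land on fd 2, and
anything written to "stderr" then goes into that file.  A minimal sketch of
the reuse rule, using a hypothetical helper name and /dev/null as a stand-in
for the real files:

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Returns 1 if a descriptor number freed by close() is handed out again
 * by the very next open(), 0 if not, -1 if /dev/null cannot be opened.
 * The function name is hypothetical; it is not part of the X server.
 */
int descriptor_is_reused(void)
{
    int first, second, again, reused;

    first = open("/dev/null", O_WRONLY);
    if (first < 0)
        return -1;
    second = open("/dev/null", O_WRONLY);
    if (second < 0) {
        close(first);
        return -1;
    }

    /* Free the lower slot; it stands in for fd 2 after stderr is closed. */
    close(first);

    /* open() picks the lowest free descriptor, so `first` comes back. */
    again = open("/dev/null", O_WRONLY);
    if (again < 0) {
        close(second);
        return -1;
    }

    reused = (again == first);
    close(again);
    close(second);
    return reused;
}
```

In a single-threaded process this should return 1, which is why leaving fd 2
unoccupied is dangerous for whatever is opened next.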

I reckon the server should check both ``err'' and fileno(stderr) and, if either
is wrong, it should give up.  Of course, I'm biased - Acorn's customers received
X binaries on 50MByte discs (so no possibility of fitting the source on).  If
they manage to corrupt their rgb database they can do nothing about it short of
going to the level 0 backup which they did, of course, make as soon as they got
the system...
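
A sketch of that "check everything, otherwise give up" policy, written with
open() and dup2() as the comment in the quoted patch suggests.  The helper
name redirect_fd and its return convention are assumptions made for
illustration, not code from the server:

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Point `target` (2 for stderr) at `path`, opened for appending.
 * Returns 0 on success, -1 if any step fails, so the caller can give up
 * rather than carry on with the descriptor in an unknown state.
 * Hypothetical helper; not part of the X server sources.
 */
int redirect_fd(int target, const char *path)
{
    int fd = open(path, O_WRONLY | O_APPEND | O_CREAT, 0666);

    if (fd < 0)
        return -1;
    if (fd != target) {
        /* Move the new file onto `target`, then drop the spare copy. */
        if (dup2(fd, target) < 0) {
            close(fd);
            return -1;
        }
        close(fd);
    }
    /* Confirm that `target` really is open before trusting it. */
    if (fcntl(target, F_GETFD, 0) == -1)
        return -1;
    return 0;
}
```

A caller in the spirit of the paragraph above would try redirect_fd(2, fname),
fall back to redirect_fd(2, "/dev/null"), and call _exit() if both return -1.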

The following code fragment (**NOT guaranteed - bsd specific - caveat emptor**)
should work.  A better fix, for those with access to the ndbm package, is to
hack oscolor.c and osinit.c to open the database O_RDONLY.  (If only it were
possible to make dbm open the database read-only; even making the files
read-only doesn't help under bsd, since the super user can always write to
them.)

    /*
     * This is done in this nasty way to ensure that the correct file descriptors end
     * up connected to the correct place.
     */
    /* Zap stdin and stdout */
    if (freopen("/dev/null", "r", stdin) == NULL)
	_exit(2);
    if (freopen("/dev/null", "w", stdout) == NULL)
	_exit(2);

    /* See if stderr is a reasonable stream, if it is assume it is ok */
    if (fcntl(2, F_GETFD, 0) == (-1)) {
	char fname[MAXPATHLEN+1];

	sprintf (fname, ADMPATH, display);
	if ((freopen(fname, "a+", stderr) == NULL &&
	     freopen("/dev/tty", "a+", stderr) == NULL &&
	     freopen("/dev/console", "a+", stderr) == NULL) ||
	    fileno(stderr) != 2)	/* Could output error message here */
	    _exit(3);