[comp.unix.ultrix] A couple of strange PMAX problems

rosen@schizo.samsung.com (MFHorn) (10/24/89)

Greetings.  We've been having some strange intermittent problems with
our DECstation 3100s, and I was hoping to get some help here rather than
wait on DEC software support.

The configuration:

1 PMAX, 8 Meg, RZ55, TK50 (server)
5 PMAX, 8 Meg, 110 Meg 3rd party disk (clients)

The 5 clients have / and swap (88 Meg) on their local disks, and NFS-mount
/usr from the server ('mount server:/usr /usr').
The server (dara) also has a /fs/var partition with subdirectories for
each system, and each system has a symlink from /var into /fs/var.

[Believe it or not, that all works fine, and saves us over 100 Meg of
disk space.]

All systems are monochrome, all are using thinwire.

Problem 1:

Sometimes the X server will start but not display the login window.  Or
the user will log out, all the clients exit and the root window gets
reset, but no new login window appears.

Similarly, a user may log in and have the login window go away, but then
get no windows at all, even though clients are specified in ~/.X11Startup.

The symptoms:

This all started happening when we upgraded the systems from UWS 2.0 to
UWS 2.1 this past weekend.  I've worked on other PMAXen running UWS 2.1
and they're fine (they're all color, 16M, RZ55, TK50).

When a user has just logged in and waits (forever) for a session manager,
there is often either a second init running, or another Xprompter (it
seems as though they alternate starting and exiting).

Our workaround:

We can sometimes just rlogin and run
"/usr/bin/login -P /usr/bin/Xprompter -C /usr/bin/dxsession", or kill the
X server (Xmfb).  But sometimes we just get the same thing again and have
to kill the server again.

Problem 2:

Once the user finally gets logged in, things seem to work fine for a while.
At some point things start to hang.  Some windows just stop, some keep going
and the load average slowly climbs, and climbs.

The symptoms:

The last time this happened, we were lucky enough to have a working emacs
window, and we could start up a shell and run some things (ps, vmstat, pstat,
who, uptime).

All the processes were either idle or running, none in resource waits.
There were 7 or 8 zombies, all using no CPU time and no memory.

There were only about 700 pages of memory left, but about 65 Meg of swap
space.

Our workaround:

Reboot.  Not acceptable.


Any suggestions you may have will be greatly appreciated.  I'm ready to
throw the things through the wall.

--
Andy Rosen                | rosen@samsung.com       | "I got this guitar
Samsung Software America  | rosen@ginosko.UUCP      |  and I learned how
One Corporate Drive       | (508) 685-7200          |  to make it talk"
Andover, MA 01810         |                         |    -Thunder Road