[comp.unix.xenix] SCO Xenix TCP/IP

cpm@dlcq15.datlog.co.uk (Paul Merriman) (02/01/90)

Hi, 
	I have run into a couple of problems with Xenix 2.3.2 and Xenix TCP/IP
on machines with Western Digital or 3Com network cards, and was wondering
whether anyone else out there has seen anything similar.

Background Information
----------------------

We have several sites using Unisys PW800s (386 PCs) and Unisys Xenix 2.3.2,
which is a Unisys-licensed version of SCO Xenix 2.3.2, with Western Digital
network cards.

Problem 1)
----------

Occasionally we get a kernel panic as follows:-


TRAP 0000000E in SYSTEM, error code 06000000
eax=FF030202 ebx=00000000 ecx=4A000001 edx=00000030
esi=0008A204 edi=4A000001 ebp=06000620 fl=00010282
udc=00030018 es=00000018 fs=0003003F gs=0000003F
tr=00000100 pc=0090020:0001A12b ksp=060005B8

kernel: PANIC: non-recoverable kernel page fault



The machine has the following hardware:

RAM : 1 Mbyte + 4 Mbyte RAM card
disk : 110 Mbyte
network card : Western Digital
serial I/O : Anvil Stallion card
O/S : SCO XENIX 2.3.2 beta release
Machine model : Unisys PW800/20


On other occasions a machine will just "die", with no accompanying panic
message. We mentioned the problem to someone at SCO some time ago and they
came back with "it's a hardware error, reseat the memory boards". It has
happened on several other machines since then, so I think we can rule out
hardware, unless it's a real incompatibility ;-( !!

We haven't seen this panic on any other machines (e.g. Compaq), yet...

The problem "seems" to be network-related, in that each time we were doing
something network-intensive (e.g. a large rcp) when it happened, though this
may be coincidence, or just not true!

Problem 2)
----------

This has been seen on the above Unisys machines with Western Digital network
cards, and on a Compaq with a 3Com card.

A number of processes which have socket connections to other machines drop
their connections. I should mention that these processes use non-blocking
writes and an alarm call to decide when to "give up" on a write and break
the connection. In one case it was then impossible to connect to the machine
across the network (telnet, rlogin), although the machine was still running
and could be accessed from the console. In some cases the connections have
re-established themselves some time later.
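For anyone unfamiliar with it, the "non-blocking write plus alarm" approach
looks roughly like the sketch below. It is a hypothetical illustration, not
the code our processes actually run: it assumes a modern POSIX system
(sigaction(), select(), O_NONBLOCK), which Xenix 2.3.2 may not provide in
this form, and the name write_with_deadline() is invented. The alarm handler
only sets a flag, and select() is given a short timeout so the flag is
re-checked even if SIGALRM arrives just before select() is entered.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

/* Set by the SIGALRM handler once the deadline has passed. */
static volatile sig_atomic_t write_timed_out = 0;

static void on_alarm(int sig)
{
    (void)sig;
    write_timed_out = 1;
}

/*
 * Hypothetical helper: write all of buf to fd in non-blocking mode, giving
 * up after timeout_secs seconds. Returns 0 on success, or -1 with errno set
 * (ETIMEDOUT if the alarm fired first); the caller then breaks the connection.
 */
int write_with_deadline(int fd, const char *buf, size_t len,
                        unsigned timeout_secs)
{
    struct sigaction sa, old_sa;
    int flags, result = 0, saved_errno = 0;

    /* Put the descriptor into non-blocking mode. */
    flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0)
        return -1;

    /* Install the alarm handler without SA_RESTART. */
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    if (sigaction(SIGALRM, &sa, &old_sa) < 0)
        return -1;

    write_timed_out = 0;
    alarm(timeout_secs);

    while (len > 0) {
        ssize_t n;

        if (write_timed_out) {          /* deadline passed: give up */
            saved_errno = ETIMEDOUT;
            result = -1;
            break;
        }
        n = write(fd, buf, len);
        if (n > 0) {                    /* partial or complete write */
            buf += n;
            len -= (size_t)n;
        } else if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK ||
                             errno == EINTR)) {
            /* Send buffer full: wait up to 100 ms for it to drain. */
            fd_set wfds;
            struct timeval tv;

            FD_ZERO(&wfds);
            FD_SET(fd, &wfds);
            tv.tv_sec = 0;
            tv.tv_usec = 100000;
            if (select(fd + 1, NULL, &wfds, NULL, &tv) < 0 && errno != EINTR) {
                saved_errno = errno;
                result = -1;
                break;
            }
        } else {                        /* hard error on the connection */
            saved_errno = (n < 0) ? errno : EIO;
            result = -1;
            break;
        }
    }

    /* Cancel any pending alarm and restore the previous handler. */
    alarm(0);
    sigaction(SIGALRM, &old_sa, NULL);
    if (result < 0)
        errno = saved_errno;
    return result;
}
```

A caller would close the socket when this returns -1 with errno set to
ETIMEDOUT, which corresponds to the point at which our processes break the
connection.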

Unfortunately, because of client pressure to get the systems up and running
again, we have been unable to examine the problem "in situ" (or even in
Halifax :-)!) and have had to restart the machines (which clears the problem).

The SCO TCP/IP manual mentions an "attrition of resources" problem which has
been reported to them but which they cannot reproduce; maybe this is it.

Some investigation using "rsh" showed that if you killed an rsh daemon on the
PC, subsequent telnet sessions would just hang, as if waiting to write
across the network. This was most noticeable if you tried to "cat" a long file
whilst telnetting to the PC (it would hang halfway through, but did still
respond to the break key). This would tie in with "attrition of resources":
presumably the telnet session was waiting for resources (buffers?) to become
free.

We are currently trying to reproduce this sort of problem reliably so that
we can present the details to SCO. In the meantime, does anyone else using
Xenix TCP/IP have similar experiences to recount?
-- 
C. Paul Merriman        <cpm@datlog.co.uk> or < {backbone}!ukc!datlog!cpm >
                       Voice:  +44 1 863 0383 (x2153)