cpm@dlcq15.datlog.co.uk (Paul Merriman) (02/01/90)
Hi,

I have encountered a couple of problems with Xenix 2.3.2 and Xenix TCP/IP with Western Digital or 3Com cards and was wondering if anyone else out there has had similar problems.

Background Information
----------------------

We have several sites using Unisys PW800s (386 PCs) and Unisys Xenix 2.3.2, which is a Unisys licenced version of SCO Xenix 2.3.2, with Western Digital network cards.

Problem 1)
---------

Occasionally we get a kernel panic as follows:-

    TRAP 0000000E in SYSTEM, error code 06000000
    eax=FF030202 ebx=00000000 ecx=4A000001 edx=00000030
    esi=0008A204 edi=4A000001 ebp=06000620 fl=00010282
    udc=00030018 es=00000018 fs=0003003F gs=0000003F
    tr=00000100 pc=0090020:0001A12b ksp=060005B8
    kernel: PANIC: non-recoverable kernel page fault

The machine has the following hardware

    ram           : 1Mbyte + 4Mbyte ram card
    disk          : 110Mbyte
    network card  : Western Digital
    serial I/O    : Anvil Stallion card
    O/S           : SCO XENIX 2.3.2 beta release
    Machine model : Unisys PW800/20

On other occasions a machine will just "die", with no accompanying panic message.

We mentioned the problem to someone at SCO some time ago and they came back with "it's a hardware error, reseat the memory boards". It has happened on several other machines since then so I think we can rule out hardware, unless it's a real incompatibility ;-( !! We haven't seen this panic on any other machines (e.g. Compaq), yet...

The problem "seems" to be network related - i.e. we were doing something intensive on the network at the time (e.g. a large rcp) though this may be coincidence, or just not true!

Problem 2)
----------

This has been seen on the above Unisys machines with Western Digital network card and a Compaq with 3Com card.

A number of processes which have socket connections to other machines break their connections. It should be mentioned here that these processes use non-blocking writes and an alarm call to determine when to "give up" on the write and break the connection.
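For anyone unfamiliar with the write-with-timeout approach just described, here is a minimal sketch in C. It is not the code from the affected processes. The function name write_with_deadline, the 10 ms back-off, and the use of POSIX sigaction() and O_NONBLOCK (rather than whatever Xenix 2.3.2 itself provides) are all assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <signal.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Set by the SIGALRM handler once the deadline has passed. */
static volatile sig_atomic_t timed_out = 0;

static void on_alarm(int sig)
{
    (void)sig;
    timed_out = 1;
}

/*
 * Hypothetical helper (not from the original post): try to write all
 * `len` bytes to `fd` using non-blocking writes, giving up once `secs`
 * seconds have elapsed. Returns 0 on success, -1 on failure; on a
 * timeout errno is set to ETIMEDOUT.
 */
int write_with_deadline(int fd, const char *buf, size_t len, unsigned secs)
{
    struct sigaction sa, old_sa;
    struct timespec backoff = { 0, 10 * 1000 * 1000 }; /* 10 ms, assumed */
    size_t done = 0;
    int flags, result = 0, saved = 0;

    assert(buf != NULL && secs > 0); /* alarm(0) would cancel, not arm */

    /* Put the descriptor into non-blocking mode for the duration. */
    flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) == -1)
        return -1;

    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_alarm;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGALRM, &sa, &old_sa) == -1) {
        saved = errno;
        fcntl(fd, F_SETFL, flags);
        errno = saved;
        return -1;
    }

    timed_out = 0;
    alarm(secs);

    while (done < len) {
        ssize_t n;

        if (timed_out) {            /* deadline hit: "give up" */
            saved = ETIMEDOUT;
            result = -1;
            break;
        }
        n = write(fd, buf + done, len - done);
        if (n > 0) {
            done += (size_t)n;
            continue;
        }
        /* A return of 0 is treated as "would block" too, since older
         * O_NDELAY-style semantics could report it that way. */
        if (n == 0 || errno == EAGAIN || errno == EWOULDBLOCK ||
            errno == EINTR) {
            nanosleep(&backoff, NULL);  /* no buffer space yet; retry */
            continue;
        }
        saved = errno;              /* genuine write error */
        result = -1;
        break;
    }

    /* Disarm the alarm and restore the previous handler and flags. */
    alarm(0);
    sigaction(SIGALRM, &old_sa, NULL);
    fcntl(fd, F_SETFL, flags);
    if (result == -1)
        errno = saved;
    return result;
}
```

A caller would treat a -1 return as the cue to close() the socket, which is the "break the connection" step. Note that with this pattern a peer that is merely slow to drain its buffers looks the same as a dead one, so a shortage of network buffers on either side would show up as dropped connections.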
In one case you could not then connect to the machine across the network (telnet, rlogin), though the machine is running and can be accessed from the console. In some cases the connections have managed to re-establish themselves some time later.

Unfortunately because of client pressure to get the systems up and running again we have been unable to examine the problem "in situ" (or even in Halifax :-)!) and have had to restart the machines (which clears the problem).

The SCO TCP/IP manual mentions an "attrition of resources" problem which they have had reported but cannot reproduce - maybe this is it.

Some investigation using "rsh" showed that if you killed an rsh daemon on the PC then subsequent telnet sessions would just hang - as if waiting to write across the network. This was most noticeable if you tried to "cat" a long file whilst telnetting to the PC (it would just hang half way through, but did respond to the break key). This would tie in with "attrition of resources" - presumably the telnet session would be waiting for resources (buffers?) to become free.

We are currently trying to reproduce this sort of problem in a reliable manner so that we can present this information to SCO; however until that time does anyone else using Xenix TCP/IP have similar experiences to recount?

-- 
C. Paul Merriman <cpm@datlog.co.uk> or < {backbone}!ukc!datlog!cpm >
Voice: +44 1 863 0383 (x2153)