todd@pinhead.pegasus.com (Todd Ogasawara) (06/07/91)
I'm having a problem with ISC's TCP/IP 1.2 and wanted to get some
confirmation before I make a call to ISC and start complaining...

Couple of pieces of information...
1. I'm running ISC TCP/IP 1.2 under ISC UNIX 2.2 on an Everex 386/33
   with 8MB RAM and an 80387. I have a WD8003 ethernet card for use
   with TCP/IP.
2. I've received confirmation from ISC that at least one problem I've
   encountered (telnet sessions remaining hung [as reported by netstat])
   is a bug in TCP/IP 1.2.
3. My problem is that I've noticed that TCP/IP 1.2 is hanging on me
   about once a week now. Sessions will abruptly hang. The processes
   and session are still showing as active as reported by 'ps' and
   'netstat'. However, there is no way to ping the ISC UNIX box from a
   remote station. The ISC UNIX box itself appears to be running ok, except
   for the fact that all network capabilities are lost (it can't ping a
   remote station either).
4. The only solution to restore TCP/IP seems to be to reboot the ISC UNIX
   box. This can get pretty annoying if I have to do it once or twice a
   week.

Has anyone else run into a similar problem (i.e., TCP/IP hanging and
requiring a reboot)? Will TCP/IP 1.3 from ISC fix these problems? And
where is TCP/IP 1.3?
--
Todd Ogasawara ::: Hawaii Medical Service Association
Internet ::: todd@pinhead.pegasus.com
Telephone ::: (808) 536-9162 ext. 7
brennan@merk.UUCP (Rich Brennan) (06/11/91)
In article <1991Jun06.224153.209@pinhead.pegasus.com> todd@pinhead.pegasus.com (Todd Ogasawara) writes:
.I'm having a problem with ISC's TCP/IP 1.2 and wanted to get some
.confirmation before I make a call to ISC and start complaining...
.
.Couple of pieces of information...
.1. I'm running ISC TCP/IP 1.2 under ISC UNIX 2.2 on an Everex 386/33
. with 8MB RAM and an 80387. I have a WD8003 ethernet card for use
. with TCP/IP.
It looks like a few people are taking the pipe on this one. I've got the
same (and other) problems. Wild guess: WD8003 or the WD8003 driver have
problems with "high speed" machines, e.g. race conditions in the driver.
.3. My problem is that I've noticed that TCP/IP 1.2 is hanging on me
. about once a week now. Sessions will abruptly hang. The processes
. and session are still showing as active as reported by 'ps' and
. 'netstat'. However, there is no way to ping the ISC UNIX box from a
. remote station. The ISC UNIX box itself appears to be running ok, except
. for the fact that all network capabilities are lost (it can't ping a
. remote station either).
I'm able to induce this easily, too. See if TCP/IP is really hung: ping
"localhost". When my machine hangs, localhost still responds to pings,
meaning the trouble is not in TCP/IP, but in the ethernet controller/driver
subsystem.
Another posting here (a reply to my earlier whining) said ISC is working on
these bugs. In the meantime, I'm going to try to snag a 3c503 card to see
if that driver is more robust.
Rich
--
brennan@merk.com ...!uunet!merk!brennan Rich Brennan
dougm@ico.isc.com (Doug McCallum) (06/11/91)
In article <4104@merk.UUCP> brennan@merk.UUCP (Rich Brennan) writes:
...
>It looks like a few people are taking the pipe on this one. I've got the
>same (and other) problems. Wild guess: WD8003 or the WD8003 driver have
>problems with "high speed" machines, e.g. race conditions in the driver.

There are a number of known problems with TCP/IP 1.2 that are being worked on for the TCP/IP 1.3 release later this summer. These include a number of conditions that cause TCP to get hung connections. They aren't related to the WD driver (there could be other problems there but we haven't seen them).

>
>.3. My problem is that I've noticed that TCP/IP 1.2 is hanging on me
>. about once a week now. Sessions will abruptly hang. The processes
>. and session are still showing as active as reported by 'ps' and
>. 'netstat'. However, there is no way to ping the ISC UNIX box from a
>. remote station. The ISC UNIX box itself appears to be running ok, except
>. for the fact that all network capabilities are lost (it can't ping a
>. remote station either).
>
>I'm able to induce this easily, too. See if TCP/IP is really hung: ping
>"localhost". When my machine hangs, localhost still responds to pings,
>meaning the trouble is not in TCP/IP, but in the ethernet controller/driver
>subsystem.

One of the bugs is aggravated by using the ping command. You can easily get random hung connections by letting ping just run free. Packets get lost (actually they don't ever get sent out on the net) and sometimes it interferes with TCP.

>
>Another posting here (a reply to my earlier whining) said ISC is working on
>these bugs. In the meantime, I'm going to try to snag a 3c503 card to see
>if that driver is more robust.

I suspect that the WD driver is more robust than the 3C503 driver although both should be fine. There may be some systems that have problems with older WD cards so hardware can't be ruled out but in those cases the problems will usually appear as the network not working at all and will be hardware related not software.

Doug McCallum
Interactive Systems Corp
dougm@ico.isc.com
robert@towers.uucp (Robert Hoquim) (06/12/91)
brennan@merk.UUCP (Rich Brennan) writes:
>In article <1991Jun06.224153.209@pinhead.pegasus.com> todd@pinhead.pegasus.com (Todd Ogasawara) writes:
>.I'm having a problem with ISC's TCP/IP 1.2 and wanted to get some
>.confirmation before I make a call to ISC and start complaining...
>.
>.Couple of pieces of information...
>.1. I'm running ISC TCP/IP 1.2 under ISC UNIX 2.2 on an Everex 386/33
>. with 8MB RAM and an 80387. I have a WD8003 ethernet card for use
>. with TCP/IP.
>It looks like a few people are taking the pipe on this one. I've got the
>same (and other) problems. Wild guess: WD8003 or the WD8003 driver have
>problems with "high speed" machines, e.g. race conditions in the driver.

I have many machines with WD8003 ethernet cards in them. The machines range from 386-33's to 486-33 EISA systems and have no problem with the 8003. I would look elsewhere for your problem, since I can guarantee you that this card works fine in the over 60 machines I have put them into. Most were under ISC.

>.3. My problem is that I've noticed that TCP/IP 1.2 is hanging on me
>. about once a week now. Sessions will abruptly hang. The processes
>. and session are still showing as active as reported by 'ps' and
>. 'netstat'. However, there is no way to ping the ISC UNIX box from a
>. remote station.

I've found that ISC TCP/IP seems to get lost from time to time and can no longer find hosts, even one it is currently talking to. I run 1 SLIP and 2 ethernet gates in one machine, and things seem to work better when I use explicit broadcast and netmask statements in netd.cf instead of what the auto script generates. When running a gateway system, don't use /etc/gated; it flat out has problems. Set it up using multiple routed statements to the multiple sides of the gate. Even under very heavy load from unix-unix and unix-pci, the network hasn't gone away for a period of months. I know this is just a backward way around things, but until ISC can fix gated and other things it is the best that we can do.

Bob
--------
Robert Hoquim - (robert@towers) - voice: 317-255-6807 - fax: 317-259-7289
Small Systems Specialists - 8500 N. Meridian - Indianapolis, IN 46260
-- Providing HIGH Performance Unix Systems to YOU is Our ONLY goal! --
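Bob's workaround was to write explicit broadcast and netmask values for each interface rather than trusting the generated ones. The post does not show netd.cf syntax, so the Python sketch below only computes a consistent (netmask, broadcast) pair for an interface address using the standard `ipaddress` module; the function names and the example address are hypothetical. One way generated values could disagree with the real network, offered as an assumption rather than something the post confirms, is a mask derived from the address class alone on a subnetted network.

```python
import ipaddress


def classful_mask(addr):
    """Default netmask implied by the address class (pre-subnetting rules)."""
    first = int(addr.split(".")[0])
    if first < 128:
        return "255.0.0.0"        # class A
    if first < 192:
        return "255.255.0.0"      # class B
    if first < 224:
        return "255.255.255.0"    # class C
    raise ValueError("class D/E address has no default netmask")


def interface_params(addr, netmask):
    """Return (netmask, broadcast) as strings for an interface address and mask."""
    iface = ipaddress.ip_interface(f"{addr}/{netmask}")
    return str(iface.network.netmask), str(iface.network.broadcast_address)


# Hypothetical class B address on a network subnetted with 255.255.255.0
addr = "172.16.5.10"
print(interface_params(addr, "255.255.255.0"))      # ('255.255.255.0', '172.16.5.255')
print(interface_params(addr, classful_mask(addr)))  # ('255.255.0.0', '172.16.255.255')
```

If two machines on the same wire disagree on the broadcast address like this, broadcasts from one are not recognized by the other, which is the kind of mismatch that spelling out both values by hand avoids.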
brennan@merk.UUCP (Rich Brennan) (06/12/91)
In article <1991Jun11.152602.22404@ico.isc.com> dougm@ico.ISC.COM (Doug McCallum) writes:
>There are a number of known problems with TCP/IP 1.2 that are being worked
>on for the TCP/IP 1.3 release later this summer. These include a number of
>conditions that cause TCP to get hung connections. They aren't related to
>the WD driver (there could be other problems there but we haven't seen them).

One of my posted bugs had to do with a panic induced via PC-Interface using my WD8003. Does PC-Interface use ISC's TCP/IP for communications? If you'd like, I can try running serial PC-Interface to see if there's some Locus code panic'ing the system instead of the WD8003 driver. However, note that I don't get a panic when using TCP/IP; I merely get a hung TCP/IP.

>One of the bugs is aggravated by using the ping command. You can easily
>get random hung connections by letting ping just run free. Packets get lost
>(actually they don't ever get sent out on the net) and sometimes it interferes
>with TCP.

I don't cause the problem with ping; I was using it after the fact to get some symptoms. I was logged in from my PC using TCP/IP, and after about 30 minutes of continuously doing an "ls -lR" over an rlogin/telnet connection, the connection simply hung. To see if it was a TCP/IP problem or an ethernet problem, I did a ping on "localhost". When it answered, I stopped it and tried to ping the ethernet interface. There was no answer from that interface.

>I suspect that the WD driver is more robust than the 3C503 driver although both
>should be fine. There may be some systems that have problems with older WD
>cards so hardware can't be ruled out but in those cases the problems will
>usually appear as the network not working at all and will be hardware related
>not software.

I've had a few emails saying that switching from WD8003 to a 3C503 fixed the problem. One even said going from WD8003 to the WD8013 fixed it.

As far as old hardware goes, my controller was purchased about 6 months ago, so unless my distributor is shipping old stock it shouldn't be an old card. I am running an 8 MHz bus, so I shouldn't be beating on the card any harder than a newer '286 AT would. The only difference is that my instructions execute a tad faster, so I could be doing back-to-back board accesses faster.

By the way, thanks for posting here to keep us up-to-date on tracking this bug.

Rich
--
brennan@merk.com ...!uunet!merk!brennan Rich Brennan
todd@pinhead.pegasus.com (Todd Ogasawara) (06/18/91)
In article <1991Jun06.224153.209@pinhead.pegasus.com> todd@pinhead.pegasus.com (Todd Ogasawara) writes:
]I'm having a problem with ISC's TCP/IP 1.2 and wanted to get some
]confirmation before I make a call to ISC and start complaining...
]
]Couple of pieces of information...
]3. My problem is that I've noticed that TCP/IP 1.2 is hanging on me
] about once a week now. Sessions will abruptly hang. The processes
] and session are still showing as active as reported by 'ps' and
] 'netstat'. However, there is no way to ping the ISC UNIX box from a
] remote station. The ISC UNIX box itself appears to be running ok, except
] for the fact that all network capabilities are lost (it can't ping a
] remote station either).
]4. The only solution to restore TCP/IP seems to be to reboot the ISC UNIX
] box. This can get pretty annoying if I have to do it once or twice a
] week.
I received a number of replies (including one from a person at ISC)
confirming that this is a known problem (sigh)... A few other folks
suggested a way to get TCP/IP back without rebooting though. My TCP/IP hung
again this morning (after behaving for 10 days). Here is what I ended up
doing.
login as root on the console
issue an 'init 1' to bring the system down to single user mode
request an 'init 2' from the single user menu
go to multiuser mode using ^D
issue an 'init 3' as root after getting back to multi user mode
Everything should be ok at this point. If you require a 'route' and don't
have it in your startup files, you will need to issue the appropriate
route add xxx.x.x.x yyy.y.y.y 1
command.
Of course, what we really need is a fixed TCP/IP 1.3 from ISC.
--
Todd Ogasawara ::: Hawaii Medical Service Association
Internet ::: todd@pinhead.pegasus.com
Telephone ::: (808) 536-9162 ext. 7
brennan@merk.UUCP (Rich Brennan) (06/18/91)
In article <4105@merk.UUCP> brennan@merk.UUCP (Rich Brennan) writes:
>In article <1991Jun11.152602.22404@ico.isc.com> dougm@ico.ISC.COM (Doug McCallum) writes:
>>I suspect that the WD driver is more robust than the 3C503 driver although both
>>should be fine. There may be some systems that have problems with older WD
>>cards so hardware can't be ruled out but in those cases the problems will
>>usually appear as the network not working at all and will be hardware related
>>not software.
>
>I've had a few emails saying that switching from WD8003 to a 3C503 fixed
>the problem. One even said going from WD8003 to the WD8013 fixed it.
[etc.]

Well, here's my posting on my findings. I've emailed it to ISC support, too, so they don't have to scrounge through c.u.sysv386 to find it:

----
Here's my update on the PC-Interface panic/TCP hangs with ISC 2.2.1 and a WD8003.

First, I got a call from ISC support. During the course of the conversation I mentioned that I had installed the "network drivers" update NT00001. He said I didn't need it, and that I should back it out. I did, and sure enough "netstat -i" doesn't panic the system anymore.

Next, I installed the new 3c503 I purchased. I beat on the system pretty heavily all weekend, and compared with running with the WD8003:

1) I never panic'ed when running PC-Interface
2) VP/IX never died giving me "cannot emulate instruction" (or similar) diagnostics
3) TCP/IP never hung

Granted, I didn't run TCP/IP for a week to see if it hung, so that problem may still be present (I'll let you know in a week).

Just to sanity check that backing out the NT00001 update wasn't what fixed my problems, I reinstalled my WD8003 card. I even configured the WD8003 to the same I/O and shared memory addresses used by the 3C503. I booted my system, and within 15 minutes of doing a continuous "ls -lR" over an rlogin connection, VP/IX trapped out with the above error, and the rlogin connection hung.

Now knowing that "netstat -i" wouldn't panic my system, I tried it: when netstat tried to retrieve the packet counts from the WD driver, I received an "ioctl timed out" diagnostic. netstat was able to get the packet counts from the loopback driver, and I was still able to rlogin using the loopback driver, i.e. TCP/IP wasn't hung, only the WD8003 subsystem.

I think I'm going to stand by my guess: there's some race condition in the optimized WD8003 driver causing it to hang up. I'm willing to try someone's "known good" WD8003 if the consensus is that my hardware sucks; we can arrange for some security just so you know I'm not out collecting Ethernet boards :-).

Rich
--
brennan@merk.com ...!uunet!merk!brennan Rich Brennan
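Rich's isolation procedure (ping the loopback address, then the host's own Ethernet interface, then a remote station, and compare the answers) amounts to a small decision table. The Python sketch below encodes that reasoning as a pure function; the function name and result strings are hypothetical, and it deliberately does not run ping itself because ping options differ between systems. Keep Doug's caveat in mind: one TCP/IP 1.2 bug is aggravated by ping, and on some stacks a ping to the host's own address may be looped back without touching the card, so the result points at a likely layer rather than proving it. Rich's "ioctl timed out" from netstat -i on the WD driver is the kind of independent evidence that firms up the guess.

```python
def locate_fault(loopback_ok, interface_ok, remote_ok):
    """Guess which layer has failed from three ping results.

    loopback_ok  -- ping of "localhost" answered (IP via the loopback driver)
    interface_ok -- ping of this host's own Ethernet address answered
    remote_ok    -- ping of another station on the LAN answered
    """
    if not loopback_ok:
        # Even loopback fails: the protocol stack itself is wedged.
        return "protocol stack"
    if not interface_ok:
        # Stack answers on loopback but not through the card.
        return "ethernet driver/controller"
    if not remote_ok:
        # Local path works: suspect cable, remote station or routing.
        return "outside this host"
    # Everything answers ping; a single hung socket could still exist.
    return "no fault visible to ping"


# Rich's observations: localhost answered, the Ethernet interface did not
print(locate_fault(True, False, False))  # ethernet driver/controller
```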
news@heitis1.uucp (News Administrator) (06/19/91)
In article <1991Jun11.152602.22404@ico.isc.com> dougm@ico.ISC.COM (Doug McCallum) writes:
>In article <4104@merk.UUCP> brennan@merk.UUCP (Rich Brennan) writes:
>...
>>It looks like a few people are taking the pipe on this one. I've got the
>>same (and other) problems. Wild guess: WD8003 or the WD8003 driver have
>>problems with "high speed" machines, e.g. race conditions in the driver.
>
>There are a number of known problems with TCP/IP 1.2 that are being worked
>on for the TCP/IP 1.3 release later this summer. These include a number of
>conditions that cause TCP to get hung connections. They aren't related to
>the WD driver (there could be other problems there but we haven't seen them).
>

If the WD8003 driver for TCP/IP v1.2 is the same as that supplied in the Network Drivers Supplement available back around March, it has one major problem (according to people at ISC). The buffer in the WD8003 8-bit card is something like 128 or 256 bytes. The buffer for the WD8003 16-bit card is double whatever the other one is. The driver was written expecting the larger buffer, and will sometimes "overflow". Also, you cannot use an 8-bit card with a 16-bit card as a gateway; apparently it just loses its mind entirely.

brian
todd@pinhead.pegasus.com (Todd Ogasawara) (06/19/91)
In article <1991Jun17.201915.12498@pinhead.pegasus.com> todd@pinhead.pegasus.com (Todd Ogasawara) writes:
]In article <1991Jun06.224153.209@pinhead.pegasus.com> todd@pinhead.pegasus.com (Todd Ogasawara) writes:
]]I'm having a problem with ISC's TCP/IP 1.2 and wanted to get some
]]confirmation before I make a call to ISC and start complaining...
]I received a number of replies (including one from a person at ISC)
]confirming that this is a known problem (sigh)... A few other folks
]suggested a way to get TCP/IP back without rebooting though. My TCP/IP hung
]again this morning (after behaving for 10 days). Here is what I ended up
]doing.
]
] login as root on the console
] issue an 'init 1' to bring the system down to single user mode
] request an 'init 2' from the single user menu
] go to multiuser mode using ^D
] issue an 'init 3' as root after getting back to multi user mode

Well, I spoke too soon... ISC TCP/IP 1.2 hung twice more today and the procedure I described above didn't work. I had to run /etc/shutdown and reboot the machine to get TCP/IP working again.

It also appears that this problem is not uncommon and has been around since version ISC TCP/IP 1.0. This implies that I should not expect it to work any better when ISC TCP/IP 1.3 comes out.

I'm not sure I can live with ISC TCP/IP hanging on me even as infrequently as once a week (let alone twice a day). I've heard that SCO's TCP/IP is pretty solid. Does anyone know if it will work with ISC UNIX 2.2? If not, are there any recommendations for another vendor's TCP/IP? Maybe Wollongong?
--
Todd Ogasawara ::: Hawaii Medical Service Association
Internet ::: todd@pinhead.pegasus.com
Telephone ::: (808) 536-9162 ext. 7
cpcahil@virtech.uucp (Conor P. Cahill) (06/19/91)
todd@pinhead.pegasus.com (Todd Ogasawara) writes:
>It also appears that this problem is not uncommon and has been around since
>version ISC TCP/IP 1.0. This implies that I should not expect it to work
>any better when ISC TCP/IP 1.3 comes out.

I don't know how common it is supposed to be, but we have ISC running TCP/IP with NFS and X to 3 workstations. We haven't had TCP lock up since we upgraded to 2.2 last year. Our systems are usually up for months at a time.

PS. None of our clients' machines exhibit that problem.
--
Conor P. Cahill            (703)430-9247        Virtual Technologies, Inc.
uunet!virtech!cpcahil                           46030 Manekin Plaza, Suite 160
                                                Sterling, VA 22170
kdenning@genesis.Naitc.Com (Karl Denninger) (06/19/91)
In article <1991Jun19.013909.690@pinhead.pegasus.com> todd@pinhead.pegasus.com (Todd Ogasawara) writes:
>
>Well, I spoke too soon... ISC TCP/IP 1.2 hung twice more today and the
>procedure I described above didn't work. I had to run /etc/shutdown and
>reboot the machine to get TCP/IP working again.

I've seen this one.

>It also appears that this problem is not uncommon and has been around since
>version ISC TCP/IP 1.0. This implies that I should not expect it to work
>any better when ISC TCP/IP 1.3 comes out.

The worse problem is that any socket which is left in an "accept" state (i.e., waiting for requests) or active for long periods of time will eventually end up hanging as well -- in a closed, inaccessible state. Unless you're looking at that particular port, all appears well. This means a network monitor which "pings" it once in a while will report all is ok -- but it most certainly is not!

We have the ported LPR/LPD combination, as well as smail3. Both will hang up if left in daemon mode. Smail3 has a solution -- start it from inetd. LPR/LPD, unfortunately, doesn't -- so you end up with printers that just "detach" themselves, and files waiting to print pile up on the user's station. The only way to clear it is to kill all existing processes that have the port open, and restart the daemon.

Not at all impressive. This is the only OS (out of 3 other vendors here at NAITC) that I've seen do this. ISC hasn't managed to fix it right in three tries.
--
Karl Denninger - AC Nielsen, Bannockburn IL (708) 317-3285
kdenning@nis.naitc.com "The most dangerous command on any computer is the carriage return."
Disclaimer: The opinions here are solely mine and may or may not reflect those of the company.
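Karl's point is that an ICMP ping only shows that the host's IP layer answers; it says nothing about whether a particular listening socket still works. A monitor that actually connects to each service port would catch the hung lpd and smail3 listeners he describes. Below is a minimal Python sketch using the standard `socket` module; the function name, the default timeout and the optional greeting check are assumptions for illustration. The greeting check matters because a kernel can complete the TCP handshake from the listen queue even when the daemon never calls accept; an SMTP server sends a "220" greeting first, while lpd sends nothing until it gets a request, so the greeting check only applies to protocols where the server speaks first.

```python
import socket


def port_alive(host, port, timeout=5.0, expect_banner=None):
    """Return True if a TCP connection to host:port succeeds.

    If expect_banner (bytes) is given, also require that the server's
    first bytes start with it, e.g. b"220" for an SMTP greeting.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as s:
            if expect_banner is None:
                return True
            data = b""
            # recv may return fewer bytes than requested; keep reading.
            while len(data) < len(expect_banner):
                chunk = s.recv(len(expect_banner) - len(data))
                if not chunk:  # server closed before sending enough
                    break
                data += chunk
            return data.startswith(expect_banner)
    except OSError:  # refused, unreachable, or timed out
        return False
```

For example, a periodic check might call `port_alive("printhost", 515)` for lpd and `port_alive("mailhost", 25, expect_banner=b"220")` for smail3, where both host names are hypothetical, and alert when either returns False even though the host still answers ping.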
dougm@ico.isc.com (Doug McCallum) (06/20/91)
In article <1991Jun18.184744.357@heitis1.uucp> news@heitis1.uucp (News Administrator) writes:
...
>If the WD8003 driver for TCP/IP v1.2 is the same as that supplied in the
>Network Drivers Supplement available back around March, it has one major
>problem (according to people at ISC). The buffer in the WD8003 8-bit card
>is something like 128 or 256 bytes. The buffer for the WD8003 16-bit card
>is double whatever the other one is. The driver was written expecting the
>larger buffer, and will sometimes "overflow". Also, you cannot use an
>8-bit card with a 16-bit card as a gateway; apparently it just loses its
>mind entirely.

This is nonsense. The WD8003* boards have an 8K buffer. The WD8013 cards may have an 8 or 16K buffer depending on version, bus, etc. The driver was originally written with the ability to use any buffer size. The very first version was written using an 8K card (serial number 6 if I remember correctly) and makes absolutely no assumptions about a minimum or maximum buffer size. There is an internal "page" size of 256 bytes that is used by the hardware, but this is not something that affects performance.

An 8K card can overflow if used under heavy load or if packets are sent at it too fast. In this case it discards the packet that won't fit in the receive buffer. It doesn't lose its mind. What you were told just sounds like someone making up an excuse when they don't understand a problem.

Where you will get real problems is in mixing 16-bit and 8-bit cards on the same system. The first Network Drivers Supplement didn't go through the contortions necessary to work around the wonderful feature of the AT bus where ALL memory accesses in the 128K region the board's RAM appears in must be in 16-bit mode. This is compensated for in the second release of the drivers (I don't know if it is shipping yet or not). Compensating requires a lot of work, but it avoids what happens if you don't: if you have the bus in 16-bit mode and try to access 8-bit RAM, you get trash (well, consistent trash of 0xFF) in the odd bytes. This is a design feature of the bus and is the same reason you get problems with 16-bit VGA boards and 8-bit LAN boards at the same time.

Doug McCallum
Interactive Systems Corp
dougm@ico.isc.com
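Two points in Doug's reply can be made concrete with a toy model in Python. It is not the ISC driver, and every name below is hypothetical. First, an 8K buffer divided into 256-byte pages gives 32 pages. A maximum-size Ethernet frame (1518 bytes including CRC) needs 6 of them, so if the host does not drain the buffer, only five full-size frames fit and later ones are discarded: the "overflow" he describes, with nothing worse happening. The model treats the whole 8K as receive space and ignores any per-frame header; both are simplifications, and a real driver likely reserves part of the RAM for transmit. Second, in this model, reading 8-bit RAM with the bus in 16-bit mode returns the even bytes intact and 0xFF in every odd byte.

```python
import math

PAGE_SIZE = 256         # hardware "page" size mentioned in the post
BUFFER_SIZE = 8 * 1024  # WD8003 on-board RAM, per the post
MAX_FRAME = 1518        # largest Ethernet frame, including CRC


def pages_needed(frame_len, page_size=PAGE_SIZE):
    """Whole receive pages a frame occupies (no header overhead modeled)."""
    return math.ceil(frame_len / page_size)


def receive_burst(frame_lens, total_pages=BUFFER_SIZE // PAGE_SIZE):
    """Deliver frames back to back while the host never drains the buffer.

    A frame that does not fit in the remaining pages is discarded and
    leaves the buffer unchanged.  Returns (accepted, dropped).
    """
    free = total_pages
    accepted = dropped = 0
    for n in frame_lens:
        need = pages_needed(n)
        if need <= free:
            free -= need
            accepted += 1
        else:
            dropped += 1
    return accepted, dropped


def read_8bit_ram_16bit_mode(ram):
    """Model of the AT-bus symptom: odd bytes read back as 0xFF."""
    return bytes(b if i % 2 == 0 else 0xFF for i, b in enumerate(ram))


print(BUFFER_SIZE // PAGE_SIZE)        # 32
print(pages_needed(MAX_FRAME))         # 6
print(receive_burst([MAX_FRAME] * 8))  # (5, 3)
print(read_8bit_ram_16bit_mode(b"\x12\x34\x56\x78").hex())  # 12ff56ff
```

Small frames can still squeeze into the pages left over after the large ones, which is why the model checks each frame individually instead of stopping at the first drop.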