shepperd@dms.UUCP (Dave Shepperd) (09/29/89)
Is it just my network, or are all TCP/IP Ethernet networks so "fragile"? It seems anybody can write a program that opens a socket to a remote node and then does something to lock up the network on both systems. Sometimes I've noticed that this can even bring down the network on nodes not involved in the connection.

We're doing X Window development using the X11R3 distribution on several Xenix/386 systems with NCD X window servers, all running TCP/IP over Ethernet. It happens all too often that we accidentally do something with X that knocks the 386's off the network. Whatever happens (I don't know what it is) isn't always fixed by just restarting the network on the first system to die. Sometimes I've had to stop and restart the network on the 386's and ALL the servers. This is icky.

Unfortunately, I don't have a network analyzer or any promiscuous-mode software to see what might be happening on the wire, but I do have transceivers with LEDs indicating send/receive traffic. They don't indicate anything different than they normally do during one of these crashes; i.e., they don't go into steady send or receive, and there are no more collisions than normal (collisions are pretty rare in any event).

There is also DECnet, LAT and LAVC traffic on the same wire, which has apparently never been affected by any of the IP traffic, even during one of these crashes. There are some non-Unix boxes on the net that speak TCP, as well as one of the VMS Vaxen. Their networking doesn't crash or get stuck when the network on one of the Xenix systems dies, but the TCP on VMS can be crashed by opening a socket and doing something incorrectly.

I should point out that the TCP software has never printed any messages on the console during one of these crashes, so how does one go about figuring out what the hell is happening?

Thanks for any help anyone can provide.

-- 
Dave Shepperd. shepperd@dms.UUCP or motcsd!dms!shepperd
Atari Games Corporation, 675 Sycamore Drive, Milpitas CA 95035.
(Arcade Video Game Manufacturer, NOT Atari Corp. ST manufacturer).
hedrick@geneva.rutgers.edu (Charles Hedrick) (09/29/89)
Your software is buggy. Now and then we've run into implementations where, for some reason or other, the software hung. These have generally been new implementations. Such bugs were regarded (correctly) as serious problems, and fixed. It's also possible that a bug or misconfiguration has resulted in a "broadcast storm". In that case, your software isn't hung -- it's just being saturated by lots of packets. I would suggest getting one of the MS-DOS TCP/IP implementations and running netwatch. That should show you what is going on if it's a broadcast storm. If it's a fragile Ethernet device driver, looking at the net may not show anything. Probably that can only be debugged if you have source to the software.
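As a rough illustration of the kind of evidence a monitor like netwatch would surface in a broadcast storm, here is a minimal Python sketch. It assumes you already have captured frames as (timestamp in seconds, raw Ethernet frame bytes) pairs from some capture tool. It counts frames addressed to the Ethernet broadcast address ff:ff:ff:ff:ff:ff in each whole second and flags seconds above a threshold. The function names and the default threshold of 500 broadcasts per second are hypothetical choices for illustration, not anything taken from netwatch.

```python
from collections import Counter
from typing import Iterable, List, Tuple

# The Ethernet broadcast destination address: all six octets set to 0xff.
BROADCAST_MAC = b"\xff" * 6


def is_broadcast(frame: bytes) -> bool:
    # The first 6 bytes of an Ethernet frame are the destination MAC address.
    return frame[:6] == BROADCAST_MAC


def broadcast_rate(frames: Iterable[Tuple[float, bytes]]) -> Counter:
    """Count broadcast frames in each whole second of capture time."""
    per_second: Counter = Counter()
    for timestamp, frame in frames:
        if is_broadcast(frame):
            per_second[int(timestamp)] += 1
    return per_second


def storm_seconds(frames: Iterable[Tuple[float, bytes]],
                  threshold: int = 500) -> List[int]:
    """Return the seconds whose broadcast count exceeds the (hypothetical) threshold."""
    rates = broadcast_rate(frames)
    return sorted(sec for sec, count in rates.items() if count > threshold)
```

A steady baseline of a few broadcasts per second (ARP requests and the like) is normal; a storm shows up as a sustained jump far above that baseline, while a hung driver would show no such change on the wire.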