ehm1@wally.cc.msstate.edu (Eddie Mikell) (04/16/91)
Fellow netters:

I need help with a problem that has plagued me for almost 2 YEARS. I have had no luck getting a solution, or even a possible course of action, from vendors or NetWire. I'll try to break my setup down.

The backbone: Our backbone consists of a fiber-optic loop connecting all the buildings on campus. On that loop we have a variety of systems running: Suns, DECs, etc. We also have 13 Novell servers connected to the backbone. Each server is isolated from the backbone by a Retix bridge, and each building is isolated from the fiber by another Retix or Cabletron bridge. The Novell servers can all communicate with each other, and we have had few problems. Four of the servers are used in a lab environment with boot ROMs and remote reset image files. All the machines are running Advanced NetWare 2.15 version C, except the one giving me trouble, which is running NetWare SFT 2.15 C.

The machine giving me the trouble: The server is a Compaq 386 16 MHz "Deskpro". It is using a Racal Interlan NP600 comm card. It is running NetWare 2.15 version C with SFT, and TTS is turned on (note: it is the only machine on campus with TTS).

The 2-year problem: Whenever the Compaq is placed on the campus backbone, it will crash. The time between crashes is random, and it does not appear to have anything to do with the load on the Ethernet, as it will crash at 3 in the morning or 3 in the afternoon. It does not give an error message and will not take any keyboard input after the crash, so it is totally locked up. If you leave the machine off the net (by turning off its Retix bridge), the machine runs with no problems.

What I've done: Since I suspected hardware problems first, I ran the Compaq diagnostics. They turned up nothing. The machine was running with 3 MB of memory, but since it was handling routing requests for the whole net, I upgraded the memory to 8 MB. Same problem. The machine has been totally replaced with another machine of the same make and model; only the disk drives were moved from one to the other. Same problem.

I took the sniffer up and let it run until the server crashed again. Nothing out of the ordinary was indicated. No packet fragments, etc. I did see some intercampus network activity going on behind the bridge, so I swapped that bridge with another Retix, and then with a Cabletron, but noted the same traffic. I checked behind the other bridges on campus, and the same traffic was seen on all the bridges that were connected to Novell servers, so it seems to be something that is happening to all the servers on campus. (That's another mystery which I need to investigate. I thought bridges were supposed to filter out that stuff.)

Call for help: I'm now at a loss as to why this particular machine should crash the way it does. Since I've replaced everything hardware-wise except the disk drive (which passed the diagnostics, and whose hot fix area hasn't been used), I'd have to rule out hardware problems. As I've noted, it's the only machine on the net that's crashing like this. The other servers are various brands of PC, mostly 386 16 MHz clones, and they do not have this problem. No other system on campus runs the SFT software. It seems hard to believe that TTS and the campus backbone are somehow colliding, but there could be a very logical reason why this is so.

I'm open to any suggestions, either from people who have had this kind of problem, or on what to test next, what I should replace, etc. I've left this same message on NetWire, but no solutions have come from that direction either.

Thanks
Eddie H. Mikell
jlamb@npd.Novell.COM (Jason "Nematode" Lamb) (04/18/91)
Have you checked the FCONSOLE statistics for the server? In particular, look at the Dynamic Memory Pool 1 (DMP 1) Maximum and Peak Used stats. If you haven't, log in to the server, run FCONSOLE, and pull up the Statistics Summary screen. Then let the server do its thing. When it crashes again, note the DMP 1 stats. If the peak is really close to the maximum, you will have locked up the server by maxing out DMP 1 usage. You can then try a couple of things:

1) Turn off TTS. (The server will then go through the right error handling for maxing out DMP 1.)
2) If that's not acceptable, try to limit DMP 1 usage. (If you don't know what takes up DMP 1, reply.)

Also, I believe this problem is taken care of in 2.2.

Jason Lamb
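The comparison described above (DMP 1 Peak Used against DMP 1 Maximum) can be written down as a small check. This is only a sketch in Python, not a Novell tool: the `PoolStats` class, the function names, and the 5% threshold are all assumptions made for illustration, and the numbers would have to be copied by hand from the FCONSOLE Statistics Summary screen.

```python
from dataclasses import dataclass


@dataclass
class PoolStats:
    """Hand-copied readings for one dynamic memory pool (hypothetical type)."""
    maximum: int    # "Maximum" value shown for the pool
    peak_used: int  # "Peak Used" value shown for the pool


def headroom_fraction(stats: PoolStats) -> float:
    """Return the unused share of the pool at its peak, from 0.0 to 1.0."""
    if stats.maximum <= 0:
        raise ValueError("maximum must be positive")
    return (stats.maximum - stats.peak_used) / stats.maximum


def looks_exhausted(stats: PoolStats, threshold: float = 0.05) -> bool:
    """True when peak usage came within `threshold` of the maximum.

    The 5% default is an arbitrary assumption, not a Novell guideline.
    """
    return headroom_fraction(stats) <= threshold


if __name__ == "__main__":
    # Illustrative numbers only, not readings from the server in this thread.
    dmp1 = PoolStats(maximum=16000, peak_used=15900)
    print("DMP 1 headroom: {:.1%}".format(headroom_fraction(dmp1)))
    print("Near exhaustion:", looks_exhausted(dmp1))
```

If the check reports near exhaustion after a crash, that would be consistent with the DMP 1 explanation; plenty of headroom would point somewhere else.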
kenh@techbook.com (Ken Haynes) (04/20/91)
Jason,

What about routing errors? I've heard of this kind of thing happening in other multi-protocol environments on Ethernet, and I think I recall that the problem was in the internal routing of "other" packets. I also heard that if the server and workstations were ECONFIG'ed, the problem went away, because the server then understood and could handle the "alien" packet traffic properly. This is all third hand, but I thought I'd throw my $.02 worth in.

Ken
--
******************************************************************************
* Ken Haynes, CNE | 1-900-PRO-HELP
* Technical Support Product Manager, 900 Support
* UUCP: {nosun, sequent, tessi} kenh@techbook