mark@cbosgd.UUCP (Mark Horton) (04/13/85)
We've run up against an interesting property of our Ethernet, and would like to report it and see if anybody else has seen the same behavior. Our Ethernet looks like this (roughly):

    terminator
        (50 feet)
    cborion   (diskless Sun 120 with 3Com board and 3Com xcvr)
        (150 feet)
    cbcephus  (VAX 785 with Interlan board, no software installed yet)
    cbtac     (Bridge CS/100 with modified Interlan xcvr)
        (150 feet)
    cbosgd    (VAX 750 with Interlan board, Interlan xcvr)
        (5 feet)
    cbhydra   (Sun 170 with 3Com board, Interlan xcvr)
        (50 feet)
    cbpavo    (Sun 120 with local disk, 3Com board, 3Com xcvr)
        (200 feet)
    terminator

We've been having a severe noise problem between hydra and orion. The Sun net disk protocol will report that the net disk isn't responding, wait several seconds, report that it's OK again, then immediately report that it's not responding again. This varies wildly - sometimes it's so severe you can't get any work done, other times it's fine. We've run a TDR (it shows the cable is fine) and replaced all the taps, xcvr cables, and so forth. We even tried plugging pavo into orion's slot, and pavo has the same problem. Orion works fine in pavo's slot. Pavo does have problems with the net disk going away, but they are brief and rare (3 error messages per day, worst case) and don't affect use of the machine.

The one thing we had not replaced was the Ethernet cable itself, so we suspected it. But you know what a pain it is to run 300 feet or more of cable, and we didn't have the foresight to run a second cable, so we were reluctant to substitute one. Well, yesterday we got fed up and dragged out the spool of cable. After confirming that the xcvr, xcvr cable, and Sun 120 could all be swapped and that the problem occurred only at that location on the Ethernet, we ran a second cable from cborion to about 50 feet short of cbosgd (bypassing cephus and the tac). We were amazed when it showed exactly the same problem, especially since the cable just ran down the hall, not up in the ceiling.
The problem was pretty severe yesterday - orion was in a catatonic state, and we couldn't even get a response from the shell. When booting it, the problem reproduced very consistently: we would get 2 to 5 ?'s during the boot sequence (on top of the -'s and ='s while the netdisk booted).

So we started to simplify things, and first we unplugged the xcvr cable from cbosgd. The problem magically went away. Putting back the real cable and repeatedly plugging and unplugging osgd's xcvr cable confirmed that this was 100% correlated with the problem. OSGD was somehow putting evil waves out onto the net that kept Orion (which is physically the furthest away) from reliably talking to Hydra. (Our cable person insists that he's had OSGD unplugged before and the problem remained. I can only report what we observed yesterday, and speculate that there may be more going on here than meets the eye.)

We swapped the transceivers between Hydra and OSGD - same result. We changed the xcvr cable and the little 10 foot board-to-xcvr cable, and swapped among 3 different Interlan boards. Same result; if anything it got worse with the other boards.

I would like to know if anyone else out there has seen this or a similar problem. If you recognize it and have a nice solution, I'd be interested. If you know something about the Interlan board or the 750 that might explain this, I'd sure love to hear about it.

Note that there is also an identical Interlan board in Cephus, but this doesn't seem to matter; it's not causing any problems. (Cephus runs System V Release 2, and TWG still hasn't installed the TCP/IP we ordered, so it's just sitting there not doing anything.) Note also that what mattered was whether the cable was plugged into OSGD, and that OSGD is NOT putting out any traffic that would swamp the network (that we know of). It even happens when OSGD has just been booted single user.
Finally, I understand that the Interlan board has a known problem: it LISTENS to garbage on the net, so it can't receive reliably from certain Ethernet boards that use the Seeq chip. But this appears to be a problem with noise being TRANSMITTED by the board. (We're blaming the board because I don't see how the 750 itself could be responsible.)

Our next things to try include using a 3Com transceiver on the Interlan board, and getting our hands on a DEUNA or a 3Com board. (We have an Excelan board we could use if there were any software that we could install without doing a major porting job. I understand BRL has done the port, but I don't know enough to know how to get it or install it in an existing system.)

We would also like to plug in some kind of Ethernet monitor so we could look at what's on the cable. Rumor had it that the MIT IBM PC code had such an animal, but the copy we just got from Sparticus doesn't have it (just a ping program). Any pointers to such a tool (preferably for a PC or a Sun 120) would also be appreciated.

Mark
trewitt@Cascade.ARPA (04/18/85)
> We've run up against an interesting property of our Ethernet, and
> would like to report it and see if anybody else has seen the same
> behavior.
:
:

Well, we've got a fairly short net that has two VAXen (750+Interlan, 780+DEUNA), 1 SUN running the Stanford V-System, 3 SMI UNIX SUNs (1 server, 2 clients), and several Iris workstations. We try to use TCL transceivers, but since many things come with 3Com transceivers we will use them even though they are a pain to install. (Both TCL and 3Com transceivers seem to work fine, though.) However, we have had the following compatibility problems:

DEC H4000 transceivers won't work with the SUNs (3Com interfaces). I hear that there is a jumper on the 3Com board that will fix this, but I don't know that I believe it.

The transceiver cables that DEC supplies with H4000s/DEUNAs won't work when placed between a SUN and a 3Com transceiver! Please believe this simple, apparently ridiculous statement. I have seen similar problems with Xerox experimental 3Mb installations. It seems to be an impedance mismatch generating reflections/attenuation. (Although I thought that the Ethernet spec gave a figure for this to avoid such problems.) Chalk one up for DEC!

When we upgraded one of our SUNs (S/N 60) from 3Mb to 10Mb Ethernet, SUN gave us a little ribbon cable to go from the 3Com board to the back panel. Believe it or not, this sucker was bad! Not solidly broken, not even intermittent -- it exhibited behaviour very similar to that described by Mark Horton: slow nd booting, nd not responding, then picking up again, etc. We had a spare and it fixed the problem. I don't know what was wrong with the cable and I don't want to. By the time I had taken the guts out of the SUN for the second time to get to this cable, I was so mad that I just tied the thing into a bow tie and nailed it to the door.

Good hunting!

    - Glenn Trewitt
    ...!sun!decwrl!shasta!trewitt
    trewitt@SU-amadeus.ARPA