[net.periphs] Interlan boards and evil waves

mark@cbosgd.UUCP (Mark Horton) (04/13/85)

We've run up against an interesting property of our Ethernet, and
would like to report it and see if anybody else has seen the same
behavior.

Our Ethernet looks like this (roughly):
	terminator
	(50 feet)
	cborion (diskless Sun 120 with 3Com board and 3Com xcvr)
	(150 feet)
	cbcephus (VAX 785 with Interlan board, no software installed yet)
	cbtac (Bridge CS/100 with modified Interlan xcvr)
	(150 feet)
	cbosgd (VAX 750 with Interlan board, Interlan xcvr)
	(5 feet)
	cbhydra (Sun 170 with 3Com board, Interlan xcvr)
	(50 feet)
	cbpavo (Sun 120 with local disk, 3Com board, 3Com xcvr)
	(200 feet)
	terminator
We've been having a severe noise problem between hydra and orion:
the Sun net disk protocol will report that the net disk isn't responding,
wait several seconds, report that it's OK again, then immediately report
that it's not responding again.  This varies wildly - sometimes it's so
severe you can't get any work done, other times it's fine.  We've run a
TDR (it shows the cable is fine) and replaced all the taps, xcvr cables,
and so forth.  We even tried plugging pavo into orion's slot, and pavo has
the same problem.  Orion works fine in pavo's slot.  Pavo does have
problems with the net disk going away, but they are brief and rare
(3 error messages per day worst case) and don't affect use of the machine.
The one thing we had not replaced was the Ethernet cable itself, so we
suspected it.  But you know what a pain it is to run 300 feet or more of
cable, and we didn't have the foresight to run a second cable, so we
were reluctant to substitute a cable.
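
For anyone who wants to reason about where each machine sits on the cable,
the layout listed above can be turned into running distances from the first
terminator.  This is only a rough sketch: the segment lengths are the
approximate figures from the list, the terminator names are made up for
illustration, and the list gives no distance between cbcephus and cbtac, so
the code assumes 0 feet there.

```python
# Segment lengths in feet, copied from the layout listed in the post.
# Each entry is (station, cable length from the previous entry).
LAYOUT = [
    ("terminator A", 0),    # hypothetical name for the first terminator
    ("cborion", 50),
    ("cbcephus", 150),
    ("cbtac", 0),           # distance not stated in the post; 0 is an assumption
    ("cbosgd", 150),
    ("cbhydra", 5),
    ("cbpavo", 50),
    ("terminator B", 200),  # hypothetical name for the second terminator
]


def positions(layout):
    """Return {station: feet of cable from the first terminator}."""
    pos, total = {}, 0
    for name, gap in layout:
        total += gap
        pos[name] = total
    return pos


def distance(pos, a, b):
    """Cable length in feet between two stations."""
    return abs(pos[a] - pos[b])


if __name__ == "__main__":
    pos = positions(LAYOUT)
    for name, feet in pos.items():
        print(f"{name:13s} {feet:4d} ft")
    print("orion <-> hydra:", distance(pos, "cborion", "cbhydra"), "ft")
    print("osgd  <-> hydra:", distance(pos, "cbosgd", "cbhydra"), "ft")
```

With these figures orion and hydra come out about 305 feet apart, which
matches the "300 feet or more" of cable mentioned above, and osgd sits only
about 5 feet from hydra.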

Well, yesterday we got fed up and dragged out the spool of cable.  After
confirming that the xcvr, xcvr cable, and Sun 120 could all be swapped
and that the problem just occurred at that location on the Ethernet, we
ran a second cable from cborion to about 50 feet short of cbosgd (bypassing
cephus and the tac.)  We were amazed when it showed exactly the same problem,
especially since the cable just ran down the hall, not up in the ceiling.

The problem was pretty severe yesterday - orion was in a catatonic state,
we couldn't even get a response from the shell.  When booting it, the
problem reproduced very consistently - we would get 2 to 5 ?'s during
the boot sequence (on top of the -'s and ='s while the netdisk booted.)

So we started to simplify things, and first we unplugged the xcvr cable
from cbosgd.  The problem magically went away.  Putting back the real cable
and plugging/unplugging osgd's xcvr cable repeatedly confirmed that this
was 100% correlated with the problem.  OSGD was somehow putting evil waves
out onto the net that kept Orion (which is physically the furthest away)
from reliably talking to Hydra.  (Our cable person insists that he's had
OSGD unplugged before and the problem remained; I can only report what we
observed yesterday, and speculate that there may be other factors here
that don't meet the eye.)

We swapped the transceivers between Hydra and OSGD - same result.  We changed
the xcvr cable, the little 10 foot board-to-xcvr cable, and swapped between
3 different Interlan boards.  Same result - if anything it got worse with
the other boards.

I would like to know if anyone else out there has seen this or a similar
problem.  If you recognize it and have a nice solution, I'd be interested.
If you know something about the Interlan board or the 750 that might
explain this, I'd sure love to hear about it.  Note that there is also
an identical Interlan board in Cephus, but this doesn't seem to matter;
it's not causing any problems.  (Cephus runs System V Release 2, and TWG
still hasn't installed the TCP/IP we ordered, so it's just sitting there
not doing anything.)  Note also that what mattered was whether the cable
was plugged into OSGD, and that OSGD is NOT putting out any traffic that would
swamp the network (that we know of) - it even happens when OSGD has just
been booted single user.  Finally, note that I understand that the
Interlan board has a known problem that it LISTENS to garbage on the net
so it can't receive reliably from certain Ethernet boards that use the
Seeq chip, but this appears to be a problem with noise being TRANSMITTED
by the board.  (We're blaming the board because I don't see how the 750
could be responsible itself.)

Our next things to try include using a 3Com transceiver on the Interlan
board, and getting our hands on a DEUNA or a 3Com board.  (We have an
Excelan board we could use if there were any software that we could
install without doing a major porting job.  I understand BRL has done
the port but I don't know enough to know how to get or install it in
an existing system.)  We would also like to plug in some kind of Ethernet
monitor so we could look at what's on the cable;
rumor had it that the MIT IBM PC code had such an animal, but the copy
we just got from Sparticus doesn't have it (just a ping program.)  Any
pointers to such a tool (preferably for a PC or a Sun 120) would also
be appreciated.
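
To make concrete what is meant by a monitor: at its simplest it pulls raw
frames off an interface and decodes the 14-byte Ethernet header.  The sketch
below is only an illustration under several assumptions.  It is written in
Python, the capture loop relies on Linux AF_PACKET raw sockets (root
required), and the function names and the interface name are hypothetical.
A software monitor like this sees only frames the interface hands up, so it
would show bad or missing traffic but not electrical noise on the cable
itself.

```python
import struct


def parse_ethernet_header(frame: bytes):
    """Split the 14-byte Ethernet header off a raw frame.

    Returns (destination, source, type field, payload).
    """
    if len(frame) < 14:
        raise ValueError("frame shorter than an Ethernet header")
    dst, src, ethertype = struct.unpack("!6s6sH", frame[:14])

    def fmt(mac):
        return ":".join(f"{b:02x}" for b in mac)

    return fmt(dst), fmt(src), ethertype, frame[14:]


def monitor(interface: str, count: int = 10):
    """Print the headers of `count` frames (Linux only, needs root)."""
    import socket

    ETH_P_ALL = 0x0003  # ask the kernel for frames of every protocol
    with socket.socket(socket.AF_PACKET, socket.SOCK_RAW,
                       socket.htons(ETH_P_ALL)) as s:
        s.bind((interface, 0))
        for _ in range(count):
            frame = s.recv(65535)
            dst, src, ethertype, payload = parse_ethernet_header(frame)
            print(f"{src} -> {dst} type=0x{ethertype:04x} len={len(payload)}")


if __name__ == "__main__":
    monitor("eth0")  # interface name is an assumption
```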

	Mark

trewitt@Cascade.ARPA (04/18/85)

> We've run up against an interesting property of our Ethernet, and
> would like to report it and see if anybody else has seen the same
> behavior.
	:
	:

Well, we've got a fairly short net that has two VAXen (750+interlan,
780+deuna), 1 SUN running the Stanford V-System, 3 SMI UNIX SUNs (1 server,
2 clients), and several Iris workstations.  We try to use TCL transceivers,
but since many things come with 3Com transceivers we will use them even
though they are a pain to install.  (Both TCL and 3Com transceivers seem to
work fine, though.)

However, we have had the following compatibility problems:

DEC H4000 transceivers won't work with the SUNs (3Com interfaces).  I hear
that there is a jumper on the 3Com board that will fix this, but I don't
know that I believe it.

The transceiver cables that DEC supplies with H4000s/DEUNAs won't work when
placed between a SUN and a 3Com transceiver!  Please believe this simple,
apparently ridiculous statement.  I have seen similar problems with Xerox
experimental 3Mb installations.  It seems to be an impedance mismatch
generating reflections/attenuations.  (Although I thought that the Ethernet
spec gave a figure for this to avoid such problems.)  Chalk one up for DEC!

When we upgraded one of our SUNs (S/N 60) from 3Mb to 10Mb Ethernet, SUN
gave us a little ribbon cable to go from the 3Com board to the back panel.
Believe it or not, this sucker was bad!  Not solidly broken, not even
intermittent -- it exhibited behaviour very similar to that described by Mark
Horton -- slow nd booting, nd not responding, then picking up again, etc.
We had a spare and it fixed the problem.  I don't know what was wrong with
the cable and I don't want to.  By the time I had taken the guts out of the
SUN for the second time to get to this cable I was so mad that I just tied
the thing into a bow tie and nailed it to the door.

Good hunting!
	- Glenn Trewitt

...!sun!decwrl!shasta!trewitt
trewitt@SU-amadeus.ARPA