[net.unix-wizards] SUN-3s talking to SUN-2s with 3COM boards

menges@unc.UUCP (John Menges) (05/06/86)

Here at the University of North Carolina's Department of Computer Science,
we have approximately 40 SUN-2 workstations.  We are in the process of
installing a number of SUN-3 workstations as well.  Some of our SUN-2s have
3COM ethernet controllers.  SUN has informed us that there is a communications
problem interfacing SUN-2s with 3COM boards and SUN-3s.  The problem has
something to do with the 3COM boards not having enough buffering to handle
full-speed transmissions from the SUN-3.  The information we get from our
local SUN sales representative, however, is confusing and incomplete.
I am hoping that someone on the net will be able to clarify this issue.

According to our sales representative, file systems on a SUN-3 cannot
be remote (NFS) mounted on a SUN-2 with a 3COM board, and vice-versa,
without "slowing down the SUN-3 ethernet controller".  Rlogin, rcp, rsh, etc.,
however, are supposed to work without slowing down the controller.
According to some SUN documentation we have, SUN-2s can also be clients
of a SUN-3 file server (using ND), if the SUN-3 is told (in /etc/nd.local)
to limit the number of packets sent to the client (on a per-client basis)
to two before requiring an acknowledgement.

Now for the questions that haven't been answered:

  1.  Why do rlogin, etc. work but not NFS?  Does the NFS protocol not
      use any form of flow control or packet re-transmit?  If that is the
      case, what happens when you run NFS between a VAX or another faster
      machine and a SUN-2 with a 3COM board?  

  2.  What does it mean to "slow down the ethernet board"?  Is it slowed
      down regardless of who it's talking to (e.g., is SUN-3 to SUN-3
      communication slowed down), or on a per-host basis?

I'd appreciate any light that anyone can shed on this subject.  Thanks
in advance!
                                John Menges
                                menges@unc (csnet)
                                decvax!mcnc!unc!menges (uucp)

steve@umcp-cs.UUCP (Steve D. Miller) (05/07/86)

In article <1428@unc.unc.UUCP> menges@unc.UUCP (John Menges) writes:
>Here at the University of North Carolina's Department of Computer Science,
>we have approximately 40 SUN-2 workstations.  We are in the process of
>installing a number of SUN-3 workstations as well.  Some of our SUN-2s have
>3COM ethernet controllers.  SUN has informed us that there is a
>communications problem interfacing SUN-2s with 3COM boards and SUN-3s.
>The problem has something to do with the 3COM boards not having enough
>buffering to handle full-speed transmissions from the SUN-3.

   I can readily believe that; people around here are of the opinion that
the 3COM boards are inherently slow, while (a) the other Sun Ethernet
board (ie) is apparently fast to begin with and (b) if code complexity is
any indication (the driver is > 2000 lines, and pulls all sorts of strange
memory tricks), has a driver that fully supports its speed.  The ec (3COM)
driver is trivial in comparison.

>According to our sales representative, file systems on a SUN-3 cannot
>be remote (NFS) mounted on a SUN-2 with a 3COM board, and vice-versa,
>without "slowing down the SUN-3 ethernet controller".  Rlogin, rcp, rsh,
>etc., however, are supposed to work without slowing down the controller.
>According to some SUN documentation we have, SUN-2s can also be clients
>of a SUN-3 file server (using ND), if the SUN-3 is told (in /etc/nd.local)
>to limit the number of packets sent to the client (on a per-client basis)
>to two before requiring an acknowledgement.
>
>Now for the questions that haven't been answered:
>
>  1.  Why do rlogin, etc. work but not NFS?  Does the NFS protocol not
>      use any form of flow control or packet re-transmit?  If that is the
>      case, what happens when you run NFS between a VAX or another faster
>      machine and a SUN-2 with a 3COM board?  

   I'm not sure that there's any reason why everything (rlogin, rsh, ...  ,
NFS, ND) shouldn't work.  It will probably not work too well, though, as the
3COM board will end up dropping lots of packets, so the ie boards will have
to do a lot of retransmits...maybe enough to time out an occasional
connection, though I doubt it.  NFS and ND both work off datagram protocols;
ND is an "unofficial" protocol on top of IP, while NFS is built on a
UDP-based RPC "connection".  The NFS call routines all do retransmits via an
exponential backoff scheme, but hard-mounted NFS filesystems will continue
to retry the transmission indefinitely.  It should be noted that the backoff
happens on a per-packet basis only, so the next packet to go out will be
sent with the minimum timeout.  Of course, all those retransmits will be
more work for the server...
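
   A minimal sketch of that retry behaviour, in C.  The function names,
the starting timeout and the cap are all hypothetical and are not taken
from the Sun sources; the sketch only shows a timeout that doubles across
retries of one request, with the next request starting again from the
minimum.

```c
#include <assert.h>

/*
 * Hypothetical helper: timeout to use for the retry that follows one
 * sent with timeout `cur`.  Doubles each time, up to `max_timeout`.
 */
long next_timeout(long cur, long max_timeout)
{
    assert(cur > 0 && max_timeout >= cur);
    long doubled = cur * 2;
    return doubled < max_timeout ? doubled : max_timeout;
}

/*
 * Total time spent waiting if the first `tries` transmissions of a
 * single request all go unanswered.  Every new request would call
 * this again starting from `initial`, since the backoff is not
 * carried over from one request to the next.
 */
long total_wait(long initial, long max_timeout, int tries)
{
    long t = initial, sum = 0;
    for (int i = 0; i < tries; i++) {
        sum += t;
        t = next_timeout(t, max_timeout);
    }
    return sum;
}
```

   With an initial timeout of 7 units and a cap of 600, four unanswered
transmissions cost 7 + 14 + 28 + 56 = 105 units of waiting.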

   One potential problem that I just thought of is that reads (and,
perhaps, writes; I haven't looked at that part of the code) occur in
4K chunks, fragmented and reassembled as appropriate by IP; this means
that the 3COM board is (based on a MTU of ~1500 bytes) going to
see three to four big packets come in in *rapid* succession.  If
ND can only handle two without acks of some sort, then I'd be willing
to guess that part of almost every NFS read will get dropped on the
floor.
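
   The fragment count can be worked out roughly as follows.  This is a
sketch under assumptions: a 20-byte IP header with no options, an 8-byte
UDP header, and the RPC/NFS header bytes ignored (they are small enough
not to change the counts for the sizes shown).  The function name is made
up for illustration.

```c
#include <assert.h>

#define IP_HEADER  20   /* bytes, assuming no IP options */
#define UDP_HEADER  8   /* bytes */

/*
 * Number of IP fragments needed to carry one UDP datagram with
 * `udp_payload` bytes of data over a link with the given MTU.
 * Every fragment except the last must carry a multiple of 8 bytes.
 */
int ip_fragments(int udp_payload, int mtu)
{
    assert(udp_payload >= 0 && mtu >= IP_HEADER + 8);
    int per_frag = ((mtu - IP_HEADER) / 8) * 8;   /* 1480 for MTU 1500 */
    int total = UDP_HEADER + udp_payload;
    return (total + per_frag - 1) / per_frag;     /* round up */
}
```

   With an MTU of 1500 this gives 3 fragments for a 4096-byte read,
6 for 8192 bytes, and 2 for 2048 bytes.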

   I've only been looking at the NFS code for a relatively brief time; does
someone out there from Sun (or otherwise more in the know than I am) care to
comment?  I certainly wouldn't want to buy a lot of machines without more
information from an "official" source.

>  2.  What does it mean to "slow down the ethernet board"?  Is it slowed
>      down regardless of who it's talking to (e.g., is SUN-3 to SUN-3
>      communication slowed down), or on a per-host basis?

   Cthulhu knows what they mean by this, unless they're talking about
being slowed down by excessive retransmits.

	-Steve
-- 
Spoken: Steve Miller 	ARPA:	steve@mimsy.umd.edu	Phone: +1-301-454-4251
CSNet:	steve@umcp-cs 	UUCP:	{seismo,allegra}!umcp-cs!steve
USPS: Computer Science Dept., University of Maryland, College Park, MD 20742

chris@umcp-cs.UUCP (Chris Torek) (05/07/86)

In article <1359@umcp-cs.UUCP> steve@maryland.UUCP (Steve D. Miller) writes:
>In article <1428@unc.unc.UUCP> menges@unc.UUCP (John Menges) writes:
>>  2.  What does it mean to "slow down the ethernet board"? ...
>
>   Cthulhu knows what they mean by this ....

Actually, I suspect they mean something along these lines:

	/*
	 * Start transmission on an ie.
	 */
	ieoutput(sc)
		struct ie_softc *sc;
	{
		...
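	#ifdef UGLY_KLUDGE
		/*
		 * A packet was just sent: clear the flag, arrange to be
		 * called again later, and send nothing this time around.
		 */
		if (sc->sc_flags & SF_NEEDDELAY) {
			sc->sc_flags &= ~SF_NEEDDELAY;
			timeout(ieoutput, (caddr_t) sc, 1);
			return;
		}
	#endif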
		...
		ie->ie_command_register = IE_DO_A_SEND;
	#ifdef UGLY_KLUDGE
		sc->sc_flags |= SF_NEEDDELAY;
	#endif
	}

This would introduce a two tick delay per packet, which gives a
maximum transmission rate of 25 packets per second (ugh).  It might
work to do timeout(..., 0), giving 50 packets/sec; but that is
still awful.
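
The arithmetic behind those figures, as a small C sketch.  It assumes a
50 Hz clock, which is what the numbers above imply, and full-size
1500-byte packets for the throughput estimate; both function names are
made up for illustration.

```c
#include <assert.h>

/* Upper bound on packets sent per second if each packet costs
 * `ticks_per_packet` clock ticks and the clock runs at `hz` ticks/sec. */
int max_packets_per_second(int hz, int ticks_per_packet)
{
    assert(hz > 0 && ticks_per_packet > 0);
    return hz / ticks_per_packet;
}

/* The same bound expressed in bytes per second for a given packet size. */
long max_bytes_per_second(int hz, int ticks_per_packet, int packet_bytes)
{
    return (long)max_packets_per_second(hz, ticks_per_packet) * packet_bytes;
}
```

At two ticks per packet that is 25 packets, or 37,500 bytes, per second;
at one tick per packet it is 50 packets, or 75,000 bytes, per second.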

Another alternative, if you do not mind wasting CPU, is

		ie->ie_command_register = IE_DO_A_SEND;
	#ifdef OTHER_UGLY_KLUDGE
		DELAY(1000);	/* ~1 ms, hope that is long enough */
	#endif

I used something like the latter to get around a microcode bug in
UDA50s (though I no longer need to get around it: I now simply
avoid the situation in which the bug shows up).
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 1415)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@mimsy.umd.edu

hedrick@topaz.RUTGERS.EDU (Charles Hedrick) (05/08/86)

NFS sends blocks which are either 4Kbytes or 8Kbytes (depending upon
the function).  At a lower level, these are turned into packets
(1.5Kbytes if you are using normal Ethernet parameters).  All of the
packets generated from a given block are queued up to the output at
the same time.  The result is a burst of between 3 and 6 packets with
almost no time between them.  This code bypasses much of the normal
TCP/IP code, for efficiency.  The 3Com boards have only two buffers,
and they are on the board.  In order to deal with large bursts, Unix
must copy one buffer into mbufs while the other one is being filled
from the network, and it must finish this process before the next
packet shows up.  A standalone 68010 with nothing to do but empty 3Com
buffers, having zero interrupt latency, might just be able to do this.
But a 68010 running Unix certainly cannot.  The result is that at
least one of the packets in the burst is dropped.  Because of the
design of NFS, acknowledgements and retransmissions occur on the basis
of the 4K or 8K blocks, not the individual packets.  So if any one
packet is dropped, the whole burst is lost and must be retransmitted.
Thus you must receive every packet in a burst correctly.
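
A toy model of the two-buffer problem, in C.  Everything here is a
simplifying assumption rather than a description of the real board or
driver: packets arrive back to back, each taking gap_us microseconds on
the wire; a packet is dropped if neither buffer is free when it starts
to arrive; and the host copies out one full buffer at a time, taking
drain_us microseconds per buffer.

```c
#include <assert.h>

#define NBUF 2   /* on-board receive buffers */

/*
 * Count the packets dropped from a burst of `npackets` back-to-back
 * packets under the toy model described above.
 */
int dropped_in_burst(int npackets, long gap_us, long drain_us)
{
    long free_at[NBUF] = {0, 0};  /* time each buffer becomes free */
    long host_free = 0;           /* time the host finishes its current copy */
    int dropped = 0;

    assert(npackets >= 0 && gap_us > 0 && drain_us >= 0);

    for (int i = 0; i < npackets; i++) {
        long start = (long)i * gap_us;   /* packet starts arriving */
        long done = start + gap_us;      /* packet fully received */
        int b = -1;

        for (int j = 0; j < NBUF; j++) {
            if (free_at[j] <= start) {
                b = j;
                break;
            }
        }
        if (b < 0) {                     /* both buffers still busy */
            dropped++;
            continue;
        }

        /* The host copies this buffer once it is idle and the packet is in. */
        long drain_start = host_free > done ? host_free : done;
        host_free = drain_start + drain_us;
        free_at[b] = host_free;
    }
    return dropped;
}
```

A 1500-byte packet takes about 1200 microseconds at 10 Mbit/s.  With a
made-up copy time of 3000 microseconds per buffer, this model drops 3
packets out of a 6-packet burst and none out of a 2-packet burst; with a
copy time of 1000 microseconds it drops none at all.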

The solution is not exactly to slow down the Ethernet controller.
Rather, under version 3.0 there is a parameter you can specify in the
mount that gives a maximum block size.  You simply limit NFS to 2K
blocks.  Then its bursts are never longer than 2 packets. This
increases CPU overhead slightly, because certain processing must be
done once per block, and you are now sending twice as many blocks.  It
also decreases throughput slightly.  It's not clear that this is
really a big deal.  This could be considered "slowing down the
controller", but it is probably better described as "detuning NFS".
Note that this is done for each mount.  So only mounts between
3Com Sun 2's and Sun 3's need to have this parameter.  Everything
else on both machines will run as usual.

The problem does not afflict normal TCP use because the TCP code in
the kernel isn't nearly as fast.  It generates packets one at a time,
rather than in bursts.

guy@sun.UUCP (05/09/86)

> This code bypasses much of the normal TCP/IP code, for efficiency. ...
> The problem does not afflict normal TCP use because the TCP code in
> the kernel isn't nearly as fast.  It generates packets one at a time,
> rather than in bursts.

A clarification: it bypasses all of the TCP code, because NFS uses Sun RPC
with UDP, not TCP, as its transport mechanism.  (It does bypass much of the
UDP and IP code, as well.)

> Rather, under version 3.0 there is a parameter you can specify in the
> mount that gives a maximum block size.  You simply limit NFS to 2K
> blocks.

Note that this is also useful if you are mounting a file system from a
machine which is many gateways away from you.  The IP datagram containing
the UDP datagram containing NFS replies gets fragmented, as was pointed out
in the previous message; this causes problems sending the NFS reply along a
path involving several gateways.
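
One way to see part of the effect, as a C sketch: if each fragment
independently survives the path with some probability, the whole
datagram survives only if every fragment does.  Independent loss is an
assumption, the 1% loss rate used below is invented for illustration,
and the function name is hypothetical.

```c
#include <assert.h>

/*
 * Probability that a datagram split into `nfragments` fragments
 * arrives complete, if each fragment independently survives with
 * probability `p_fragment`.
 */
double datagram_survival(double p_fragment, int nfragments)
{
    assert(p_fragment >= 0.0 && p_fragment <= 1.0 && nfragments >= 0);
    double p = 1.0;
    for (int i = 0; i < nfragments; i++)
        p *= p_fragment;
    return p;
}
```

With 1% fragment loss, a 6-fragment reply gets through whole about 94%
of the time, against about 98% for a 2-fragment reply.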
-- 
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.arpa

jel@portal.UUcp (John Little) (05/09/86)

In article <1359@umcp-cs.UUCP>, steve@umcp-cs.UUCP (Steve D. Miller) writes:
> 
>    I can readily believe that; people around here are of the opinion that
> the 3COM boards are inherently slow, while (a) the other Sun Ethernet
> board (ie) is apparently fast to begin with and (b) if code complexity is
> any indication (the driver is > 2000 lines, and pulls all sorts of strange
> memory tricks), has a driver that fully supports its speed.  The ec (3COM)
> driver is trivial in comparison.

The Intel 586 chip that most SUNs use to do Ethernet is not the
world's most reliable, bug-free, well-documented or well-supported chip.
The last time I saw the bug list for the 586 it was five
pages long.  I suspect that much of the complexity is due to SUN's
working around various bogosities in the chip.

There are lots of good reasons why SUN's most recent machine (the 3/50)
uses the AMD 7990 instead of the Intel equivalent.

Note: I have no official connection with SUN, Intel or AMD.

John Little
{atari,sun,hoptoad}!portal!jel