[comp.protocols.tcp-ip.ibmpc] Hacking CMU sources to TurboC

bkc@OMNIGATE.CLARKSON.EDU (Brad Clements) (05/28/88)

Hi,

I'm converting the CMU sources to TurboC 1.5. If anyone has already done this
and has it working, please tell me so I can stop banging my head against the wall.
Otherwise, those of you who have messed around with the source would you please
think about the following problem and perhaps offer a suggested solution.

Facts:
	Interface NI5010
	TurboC version of Netwatch works fine.

	Trying to get ping to work, ping -s enters server mode, accepts
	ICMP ping requests from host A. Ping looks up the ip address
	in its tables (which are empty) and since it doesn't find the 
	IP address it sends out an ARP REQ broadcast. Host A sees the 
	ARP REQ broadcast and sends an ARP REP which ping NEVER sees.

	Meanwhile, ping does see lotsa ARP REQs from other hosts, none directed
	at it so those are not placed in its tables.

	Other broadcasts are seen, such as IP/UDP broadcasts, and dumped
	by ping.

	Now, here's the crazy part: I removed the ARP table entry from host A
	for the ping PC. I placed ping in server mode, then tried to
	ping it from host A. That worked. Apparently ping gets
	the ARP REQ sent by host A and saves host A's ethernet/ip address
	since it will probably need to reply to it in the future. 


Since ICMP echo requests are not broadcasts, the Interlan driver is receiving 
some non-broadcast packets, just not the ARP REP. 
Netwatch, however, correctly shows ARP REQs and REPs, but I think that's because 
the NI5010 is set to receive ALL packets, broadcast or otherwise.

Can anyone offer an idea as to where to look for the missing packets?
Ping does not report:
	a. packet too short
	b. unknown packet type (either ping or ARP types)

On a quiet subnet, I set ping to single shot ping host A. Using LanWatch (thanks FTP!),
there was exactly one ARP REQ sent by the ping PC and one ARP REP returned. 
However the ping program exited and stated:

1 packet sent
0 packets received.

This is driving me crazy, if anyone has any ideas I'd appreciate them.

Thanks,
Brad Clements
Network Engineer
Clarkson University

jbvb@VAX.FTP.COM (James Van Bokkelen) (05/28/88)

I see several possible areas where the problem could be:

1. The transmit ISR might not be re-enabling the board fast enough to see the
ARP reply.  Presumably Turbo's fault, but I can't guess why.

2. The ARP module might be broken in a way which prevents it from handling
incoming replies in general.  Presumably also Turbo's fault.

3. You may be misunderstanding the architecture of PCIP: Does host A send a
second Echo Request packet?  If so, it might work, where the first failed.
This is because the EtDemux task is the context that gets blocked waiting for
the ARP reply (when it upcalls ICMP, and ICMP down-calls in_write()).  Since
EtDemux is blocked, nobody processes the ARP reply (look carefully at the
counts bumped by the ISR, not by EtDemux or indemux, while on that quiet
network; you may see that the packet you want actually arrived).  in_write()
times out waiting for the ARP that never gets processed, and the Echo Reply
never gets sent.  The ARP and ICMP structure of PCIP still has some
weaknesses for use as a server, even after the work Drew added to the
MIT version.

James VanBokkelen
FTP Software Inc.

gruber@bgsu.EDU.UUCP (05/31/88)

bkc@omnigate.clarkson.edu mentioned a problem with getting his turbo c port
of CMU's pcip ping to work in server mode unless the host's arp cache entry
was purged.

We are having a similar problem with the IBM pcip code. The problem goes
away, whether the pc is set in server mode or client mode ping, when we
delete the arp entry from the host, a 4.3 BSD machine.

We thought that it might be a timing problem, but when we examined the traffic
it didn't appear that the host was sending an ARP response if there was an
entry in its ARP cache for the requester. We also deleted one 4.3 BSD machine's
entry from a second 4.3 BSD machine's cache, leaving the second machine's entry
in the first machine's cache. The two 4.3 machines wouldn't talk to each other
until that remaining cache entry was purged too. We looked at the 4.3 source
and it looks to us like this would be the expected behaviour.

How sure are you that the host really is sending an ARP response?

It looks to me like the 4.3 ARP code trailer negotiation stuff might be the
reason that the 4.3 stuff is funny. The new TCP/IP source recently posted
to Usenet by Berkeley doesn't look like it would have the same awkward
behaviour, but maybe I'm wrong.

We don't have these problems when talking to an Ultrix computer.

I hope this helps.

Can anyone shed any light on this? Is there any good reason that a host
shouldn't send an ARP response if it has any entry for the requestor in its
cache?

John Gruber gruber%andy.bgsu.edu@relay.cs.net     tut!bgsuvax!gruber

ROMKEY@XX.LCS.MIT.EDU (John Romkey) (06/04/88)

The ping problems that people have been talking about are probably an old
problem due to the way PC/IP works. If you ping a PC/IP program multiple times
from one host, the first ping shouldn't get a reply, but the others should.

Here's why:

Suppose you're pinging an IBM PC-type machine (P) running PC/IP from something
else, X. If X doesn't have an entry in its ARP cache for P then X ARP's P and
P responds. When P responds, it adds an entry to its ARP cache for X (on the
reasoning that if someone ARP's you then you're pretty likely to need to
send packets to them soon). In this case, everything should be fine.
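
The caching rule described here (answer an ARP request addressed to you, and
remember the asker while you are at it) can be sketched roughly as follows.
This is only an illustration in portable C, not the PC/IP source: the names
arp_cache, arp_learn, arp_lookup and arp_request_in, the table size, and the
round-robin replacement are all assumptions made for the example.

```c
#include <stddef.h>
#include <string.h>

#define ARP_CACHE_SIZE 8   /* hypothetical table size */
#define MAC_LEN 6

struct arp_entry {
    unsigned long ip;               /* IP address as a plain number */
    unsigned char mac[MAC_LEN];     /* Ethernet address */
    int valid;
};

struct arp_cache {
    struct arp_entry e[ARP_CACHE_SIZE];
    int next;                       /* next slot to overwrite (round robin) */
};

void arp_cache_init(struct arp_cache *c)
{
    memset(c, 0, sizeof *c);
}

/* Return the cached Ethernet address for ip, or NULL if unknown. */
const unsigned char *arp_lookup(const struct arp_cache *c, unsigned long ip)
{
    int i;
    for (i = 0; i < ARP_CACHE_SIZE; i++)
        if (c->e[i].valid && c->e[i].ip == ip)
            return c->e[i].mac;
    return NULL;
}

/* Insert or refresh an IP/Ethernet pair. */
void arp_learn(struct arp_cache *c, unsigned long ip, const unsigned char *mac)
{
    int i;
    for (i = 0; i < ARP_CACHE_SIZE; i++) {
        if (c->e[i].valid && c->e[i].ip == ip) {
            memcpy(c->e[i].mac, mac, MAC_LEN);
            return;
        }
    }
    i = c->next;
    c->next = (c->next + 1) % ARP_CACHE_SIZE;
    c->e[i].ip = ip;
    memcpy(c->e[i].mac, mac, MAC_LEN);
    c->e[i].valid = 1;
}

/*
 * Handle an incoming ARP request.  Returns 1 if a reply should be sent.
 * Requests for other hosts are ignored and NOT cached; a request for us
 * caches the sender, since we will probably need to talk to it soon.
 */
int arp_request_in(struct arp_cache *c, unsigned long my_ip,
                   unsigned long sender_ip, const unsigned char *sender_mac,
                   unsigned long target_ip)
{
    if (target_ip != my_ip)
        return 0;
    arp_learn(c, sender_ip, sender_mac);
    return 1;
}
```

With a cache like this, an ARP request from X that names P leaves X's address
in P's table, which is why pinging P works once X's own cache entry for P has
been purged.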

[Actually...I can't remember if the MIT/CMU PC/IP cached IP/ethernet address
pairs when it got responses or if I added that at FTP Software. I think that
got put in back at MIT.]

The more interesting case is when X doesn't ARP P (or, more correctly, when
P doesn't already know X's ethernet address).

X sends an ICMP echo request to P. In detail, P takes an interrupt from the
ethernet interface, copies the packet into the PC's memory, queues it and
makes the ethernet demultiplexing task runnable. Eventually P actually
runs the ethernet demultiplexing task, which upcalls IP (indemux()) passing
it the received packet. IP then decides it's an ICMP packet and then calls
ICMP. ICMP decides it's an echo request, which needs an echo reply sent back
to the source of this packet.

At this point, ICMP is still running on the ethernet demultiplexer task's
stack. It formats up an echo reply and passes it to in_write() to transmit
it. Here's where you lose. If X isn't in P's ARP cache, P transmits an ARP
request and most likely gets back an ARP reply. P's interrupt handler
copies the ARP reply in and queues it up and makes the ethernet demultiplexer
task runnable. Eventually it runs again, BUT when it does, it's still in
ICMP. ICMP says - "Oh? Did we get a response? No... Did we time out? No...
Okay, let's wait some more." and the ARP reply doesn't get processed.
Eventually ARP times out, ICMP gives up and the ethernet demultiplexer task
finishes demultiplexing the original ICMP echo request and gets to process
more received packets, finally getting to handle the ARP reply. It's too
late for this ICMP echo reply, now, but the ARP reply still gets entered
into the ARP cache, so if we do this again everything should work okay.
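
The sequence above can be modelled with a toy single-consumer queue. The sketch
below is a simplified simulation, not PC/IP code: struct sim, isr_enqueue,
send_echo_reply and demux_run are made-up names, the "ISR" is just a function
call, and the ARP wait is a fixed loop count rather than a real timer. It
assumes the peer answers the ARP request immediately.

```c
enum pkt_type { PKT_ECHO_REQUEST, PKT_ARP_REPLY };

#define QUEUE_MAX 8
#define ARP_WAIT_TICKS 5   /* hypothetical timeout, in polling iterations */

struct sim {
    enum pkt_type queue[QUEUE_MAX]; /* packets queued by the "ISR" */
    int head, tail;
    int peer_known;       /* 1 once X's address is in the ARP cache */
    int replies_sent;     /* echo replies actually transmitted */
    int replies_dropped;  /* echo replies abandoned after the ARP timeout */
};

void sim_init(struct sim *s)
{
    s->head = s->tail = 0;
    s->peer_known = 0;
    s->replies_sent = 0;
    s->replies_dropped = 0;
}

/* Stand-in for the interrupt handler: copy the packet in and queue it. */
void isr_enqueue(struct sim *s, enum pkt_type t)
{
    if (s->tail < QUEUE_MAX)
        s->queue[s->tail++] = t;
}

/* Stand-in for ICMP calling in_write(); runs on the demux task's stack. */
int send_echo_reply(struct sim *s)
{
    int tick;
    if (!s->peer_known) {
        /* ARP request goes out; the reply arrives and is queued at once. */
        isr_enqueue(s, PKT_ARP_REPLY);
        for (tick = 0; tick < ARP_WAIT_TICKS; tick++) {
            /* Only the demux task drains the queue, and we are running
             * as the demux task, so peer_known cannot change here. */
            if (s->peer_known)
                break;
        }
        if (!s->peer_known) {
            s->replies_dropped++;   /* ARP timed out; give up on this reply */
            return 0;
        }
    }
    s->replies_sent++;
    return 1;
}

/* Stand-in for the Ethernet demultiplexer task. */
void demux_run(struct sim *s)
{
    while (s->head < s->tail) {
        enum pkt_type t = s->queue[s->head++];
        if (t == PKT_ECHO_REQUEST)
            send_echo_reply(s);
        else
            s->peer_known = 1;      /* ARP reply finally processed */
    }
}
```

Queueing one echo request and calling demux_run() drops the first reply but
leaves peer_known set; queueing a second echo request then gets a reply sent,
matching the "first ping fails, later ones work" pattern.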

Now, this behaviour is a little weird, but it's a fairly straightforward
consequence of the way PC/IP is structured. The easiest way around it would
be to have ICMP create a new task to send the echo reply back, but task
creation in PC/IP is kind of expensive, so we don't do that.

It's also not so out-of-line with the way the ARP RFC says ARP should work,
either. The ARP RFC says that when you're transmitting an IP packet and
you need to send an ARP request because of it, you should send the request
and drop the IP packet you're sending. This greatly simplifies the output
side of IP and the ethernet layer. It also would lead to the behavior that
you're seeing with PC/IP, but for different reasons.

In fact, PC/IP doesn't obey the ARP RFC in this area; ARP holds on to the
packet that's being transmitted and waits a while for the ARP reply to come in.
I did it this way because the whole ARP cache in PC/IP is very transient -
it gets cleared every time you run a program (since there's a copy of it in
every program). That meant that the first packet any program sent was
guaranteed to be discarded, which seemed like a waste of time.
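
For contrast, the drop-on-miss output policy attributed to the ARP RFC above
might look something like the sketch below. The names (struct link,
output_drop_on_miss, arp_reply_in) are invented for illustration and the
"cache" is reduced to a single flag; it is not taken from any real stack.

```c
enum out_result { OUT_SENT, OUT_DROPPED_ARP_REQUESTED };

struct link {
    int peer_known;    /* 1 if the destination's Ethernet address is cached */
    int arp_requests;  /* ARP requests transmitted so far */
    int frames_sent;   /* IP packets actually put on the wire */
};

/*
 * Output side under the drop-on-miss policy: on a cache miss, send an ARP
 * request and discard the IP packet, leaving retransmission to a higher
 * layer.  Nothing ever blocks waiting for the reply.
 */
enum out_result output_drop_on_miss(struct link *l)
{
    if (!l->peer_known) {
        l->arp_requests++;
        return OUT_DROPPED_ARP_REQUESTED;
    }
    l->frames_sent++;
    return OUT_SENT;
}

/* Input side: an ARP reply simply fills in the cache. */
void arp_reply_in(struct link *l)
{
    l->peer_known = 1;
}
```

Starting from an empty cache (as every PC/IP program does), the first call
returns OUT_DROPPED_ARP_REQUESTED; only after arp_reply_in() has run does a
later call return OUT_SENT. That is the guaranteed first-packet loss the
hold-and-wait design was meant to avoid.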

That, in great detail, is probably why you're losing.
				- john
-------