bkc@OMNIGATE.CLARKSON.EDU (Brad Clements) (05/28/88)
Hi,

I'm converting the CMU sources to TurboC 1.5. If anyone has already done this and has it working, please tell me so I can stop banging my head against the wall. Otherwise, would those of you who have messed around with the source please think about the following problem and perhaps offer a suggested solution.

Facts:

Interface: NI5010. The TurboC version of Netwatch works fine.

Trying to get ping to work: ping -s enters server mode and accepts ICMP ping requests from host A. Ping looks up the IP address in its tables (which are empty), and since it doesn't find the IP address it sends out an ARP REQ broadcast. Host A sees the ARP REQ broadcast and sends an ARP REP, which ping NEVER sees. Meanwhile, ping does see lotsa ARP REQs from other hosts, none directed at it, so those are not placed in its tables. Other broadcasts are seen, such as IP/UDP broadcasts, and dumped by ping.

Now, here's the crazy part: I removed the ARP table entry for the ping PC from host A. I placed ping in server mode, then tried to ping it from host A. That worked. Apparently ping gets the ARP REQ sent by host A and saves host A's ethernet/IP address, since it will probably need to reply to it in the future.

Since ICMP echo requests are not broadcasts, the Interlan driver is receiving some non-broadcast packets, just not the ARP REP. Netwatch, however, correctly shows ARP REQs and REPs, but I think that's because it sets the NI5010 to receive ALL packets, broadcast or otherwise.

Can anyone offer an idea as to where to look for the missing packets? Ping does not report:

a. packet too short
b. unknown packet type (either ping or ARP types)

On a quiet subnet, I set ping to single-shot ping host A. Using LanWatch (thanks FTP!), I saw exactly one ARP REQ sent by the ping PC and one ARP REP returned. However, the ping program exited and stated: 1 packet sent, 0 packets received.

This is driving me crazy. If anyone has any ideas I'd appreciate them.

Thanks,
Brad Clements
Network Engineer
Clarkson University
jbvb@VAX.FTP.COM (James Van Bokkelen) (05/28/88)
I see several possible areas where the problem could be:

1. The transmit ISR might not be re-enabling the board fast enough to see the ARP reply. Presumably Turbo's fault, but I can't guess why.

2. The ARP module might be broken in a way which prevents it from handling incoming replies in general. Presumably also Turbo's fault.

3. You may be misunderstanding the architecture of PCIP: Does host A send a second Echo Request packet? If so, it might work where the first failed. This is because the EtDemux task is the context that gets blocked waiting for the ARP reply (when it upcalls ICMP, and ICMP down-calls in_write()). Since EtDemux is blocked, nobody processes the ARP reply (look carefully at the counts bumped by the ISR, not those bumped by EtDemux or indemux, while on that quiet network; you may see that the packet you want actually arrived). in_write() times out waiting for the ARP that never gets processed, and the Echo Reply never gets sent.

The ARP and ICMP structure of PCIP still has some weaknesses for use as a server, even after the work Drew added to the MIT version.

James VanBokkelen
FTP Software Inc.
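
The blocking described in point 3 can be modeled in a few lines. What follows is a toy Python simulation, not PC/IP code: the class `Host`, its methods, `ARP_TIMEOUT_TICKS`, and the addresses are all hypothetical names chosen for illustration. Only the roles (a receive ISR that queues packets, a single demultiplexing task, and in_write() waiting in place for ARP) are taken from the description above.

```python
from collections import deque

ARP_TIMEOUT_TICKS = 3  # hypothetical timeout, in polling iterations


class Host:
    """Toy model of host P: one receive queue, one demultiplexing task."""

    def __init__(self):
        self.arp_cache = {}            # IP address -> ethernet address
        self.rx_queue = deque()        # packets queued by the receive ISR
        self.isr_count = 0             # bumped by the ISR
        self.demux_count = 0           # bumped by the demux task
        self.replies_sent = []         # echo replies that actually went out
        self.counts_at_timeout = None  # (isr_count, demux_count) at ARP timeout

    def isr_receive(self, packet):
        """Receive interrupt: count the packet and queue it for the demux task."""
        self.isr_count += 1
        self.rx_queue.append(packet)

    def in_write(self, dest_ip, peer_answers_arp):
        """Send to dest_ip; on an ARP miss, wait in place for the cache to fill."""
        if dest_ip not in self.arp_cache:
            peer_answers_arp(self)  # request goes out, the reply is queued by the ISR
            for _ in range(ARP_TIMEOUT_TICKS):
                # Still running on the demux task's stack: nothing drains
                # rx_queue here, so the cache cannot change while we wait.
                if dest_ip in self.arp_cache:
                    break
            else:
                self.counts_at_timeout = (self.isr_count, self.demux_count)
                return False
        self.replies_sent.append(dest_ip)
        return True

    def demux_one(self, peer_answers_arp):
        """Demux task body: handle exactly one queued packet."""
        kind, src_ip, src_mac = self.rx_queue.popleft()
        self.demux_count += 1
        if kind == "arp_reply":
            self.arp_cache[src_ip] = src_mac
        elif kind == "echo_request":
            self.in_write(src_ip, peer_answers_arp)


def ping_twice():
    """Host A sends two echo requests; report which ones get a reply."""
    p = Host()
    a_ip, a_mac = "10.0.0.1", "aa:aa:aa:aa:aa:aa"  # made-up addresses

    def peer_answers_arp(host):
        host.isr_receive(("arp_reply", a_ip, a_mac))

    answered = []
    for _ in range(2):
        p.isr_receive(("echo_request", a_ip, None))
        before = len(p.replies_sent)
        while p.rx_queue:
            p.demux_one(peer_answers_arp)
        answered.append(len(p.replies_sent) > before)
    return answered, p
```

Under these assumptions, `ping_twice()` leaves the first echo request unanswered and answers the second. At the moment of the timeout the ISR count is ahead of the demux count, which is the discrepancy the counters would reveal on a quiet network.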
gruber@bgsu.EDU.UUCP (05/31/88)
bkc@omnigate.clarkson.edu mentioned a problem with getting his Turbo C port of CMU's pcip ping to work in server mode unless the host's ARP cache entry was purged.

We are having a similar problem with the IBM pcip code. The problem goes away, whether the PC is running ping in server mode or client mode, when we delete the ARP entry from the host, a 4.3 BSD machine. We thought that it might be a timing problem, but when we examined the traffic it didn't appear that the host was sending an ARP response if there was an entry in its ARP cache for the requester.

We also deleted one 4.3 BSD machine's entry from another 4.3 machine's cache, leaving the corresponding entry in the first machine's cache. The 4.3 machines wouldn't talk to each other until that remaining cache entry was purged too. We looked at the 4.3 source and it looks to us like this would be the expected behaviour.

How sure are you that the host really is sending an ARP response?

It looks to me like the trailer negotiation stuff in the 4.3 ARP code might be the reason that the 4.3 stuff is funny. The new TCP/IP source recently posted to Usenet by Berkeley doesn't look like it would have the same awkward behaviour, but maybe I'm wrong. We don't have these problems when talking to an Ultrix computer.

I hope this helps. Can anyone shed any light on this? Is there any good reason that a host shouldn't send an ARP response if it has an entry for the requester in its cache?

John Gruber
gruber%andy.bgsu.edu@relay.cs.net
tut!bgsuvax!gruber
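
For reference on that last question, here is a minimal Python sketch of the packet-reception steps as laid out in the ARP RFC (RFC 826). It is a simplification: the function name `arp_receive`, the four-field tuple layout, and the dictionary cache are assumptions made for illustration, and the hardware-type and protocol-type checks are omitted. It says nothing about what the 4.3 BSD code actually does. In the RFC's procedure, whether a reply is sent depends only on the request being addressed to the receiver, not on whether the sender is already in the table.

```python
REQUEST, REPLY = 1, 2  # ARP opcodes


def arp_receive(cache, my_ip, my_mac, packet):
    """Process one ARP packet; return a reply tuple to send, or None.

    packet is (opcode, sender_ip, sender_mac, target_ip), a simplified
    stand-in for the real ARP header fields.
    """
    opcode, sender_ip, sender_mac, target_ip = packet

    # Step 1: if the sender is already known, refresh its entry.
    merged = False
    if sender_ip in cache:
        cache[sender_ip] = sender_mac
        merged = True

    # Step 2: packets not addressed to us are otherwise ignored.
    if target_ip != my_ip:
        return None

    # Step 3: we are the target, so learn the sender if it was unknown.
    if not merged:
        cache[sender_ip] = sender_mac

    # Step 4: answer requests, whether or not the sender was cached.
    if opcode == REQUEST:
        return (REPLY, my_ip, my_mac, sender_ip)
    return None
```

Step 3 is also the behaviour described earlier in the thread, where a host that receives an ARP request for itself saves the requester's address for later use.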
ROMKEY@XX.LCS.MIT.EDU (John Romkey) (06/04/88)
The ping problems that people have been talking about are probably an old problem due to the way PC/IP works. If you ping a PC/IP program multiple times from one host, the first ping shouldn't get a reply, but the others should. Here's why:

Suppose you're pinging an IBM PC-type machine (P) running PC/IP from something else, X. If X doesn't have an entry in its ARP cache for P, then X ARPs P and P responds. When P responds, it adds an entry to its ARP cache for X (on the reasoning that if someone ARPs you then you're pretty likely to need to send packets to them soon). In this case, everything should be fine.

[Actually...I can't remember if the MIT/CMU PC/IP cached IP/ethernet address pairs when it got responses or if I added that at FTP Software. I think that got put in back at MIT.]

The more interesting case is when X doesn't ARP P (or, more correctly, when P doesn't already know X's ethernet address). X sends an ICMP echo request to P. In detail, P takes an interrupt from the ethernet interface, copies the packet into the PC's memory, queues it, and makes the ethernet demultiplexing task runnable. Eventually P actually runs the ethernet demultiplexing task, which upcalls IP (indemux()), passing it the received packet. IP decides it's an ICMP packet and calls ICMP. ICMP decides it's an echo request, which needs an echo reply sent back to the source of this packet. At this point, ICMP is still running on the ethernet demultiplexer task's stack. It formats up an echo reply and passes it to in_write() to transmit it.

Here's where you lose. If X isn't in P's ARP cache, P transmits an ARP request and most likely gets back an ARP reply. P's interrupt handler copies the ARP reply in, queues it up, and makes the ethernet demultiplexer task runnable. Eventually it runs again, BUT when it does, it's still in ICMP. ICMP says: "Oh? Did we get a response? No... Did we time out? No... Okay, let's wait some more." and the ARP reply doesn't get processed.
Eventually ARP times out, ICMP gives up, and the ethernet demultiplexer task finishes demultiplexing the original ICMP echo request and gets to process more received packets, finally getting to handle the ARP reply. It's too late for this ICMP echo reply now, but the ARP reply still gets entered into the ARP cache, so if we do this again everything should work okay.

Now, this behaviour is a little weird, but it's a fairly straightforward consequence of the way PC/IP is structured. The easiest way around it would be to have ICMP create a new task to send the echo reply back, but task creation in PC/IP is kind of expensive, so we don't do that.

It's also not so out of line with the way the ARP RFC says ARP should work. The ARP RFC says that when you're transmitting an IP packet and you need to send an ARP request because of it, you should send the request and drop the IP packet you're sending. This greatly simplifies the output side of IP and the ethernet layer. It also would lead to the behavior that you're seeing with PC/IP, but for different reasons.

In fact, PC/IP doesn't obey the ARP RFC in this area; ARP holds on to the packet that's being transmitted and waits a while for the ARP reply to come in. I did it this way because the whole ARP cache in PC/IP is very transient: it gets cleared every time you run a program (since there's a copy of it in every program). That meant that the first packet any program sent was guaranteed to be discarded, which seemed like a waste of time.

That, in great detail, is probably why you're losing.

- john
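
The trade-off in the last few paragraphs can be illustrated with a small Python sketch of the drop-on-miss policy attributed above to the ARP RFC. This is not PC/IP code: `send_drop_on_miss`, `run_program`, and the addresses are hypothetical, and the sketch assumes the ARP reply always arrives and is cached before the next send.

```python
def send_drop_on_miss(cache, dest_ip, arp_requests):
    """Drop-on-miss policy: never wait; on a cache miss, ask and discard."""
    if dest_ip in cache:
        return "sent"
    arp_requests.append(dest_ip)  # broadcast an ARP request for dest_ip
    return "dropped"              # the caller is expected to retransmit


def run_program(packets_to_send, peer_mac="aa:aa:aa:aa:aa:aa"):
    """Model one program run; the ARP cache starts empty each time."""
    cache = {}  # per-program cache, cleared on every run
    outcomes = []
    for dest_ip in packets_to_send:
        arp_requests = []
        outcomes.append(send_drop_on_miss(cache, dest_ip, arp_requests))
        # Between sends the receive path runs normally, so any ARP reply
        # is processed and cached (assumed to always arrive in time).
        for ip in arp_requests:
            cache[ip] = peer_mac
    return outcomes
```

Because the sender never waits, the receive path is free to process the ARP reply. The cost shows up when the cache is recreated for every program: `run_program(["10.0.0.1"] * 3)` loses only its first packet, but a program that sends a single packet, such as a single-shot ping, loses it on every run.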