[comp.sys.apollo] TCP/IP problems on DN10k, and pascal bug

jal@acc.flint.umich.edu (John Lauro) (01/31/91)

This is really two questions:

Can someone please post or mail me an example program that
demonstrates the Pascal bug discussed several months ago?  I
have somehow misplaced my reference for it, and this is the
first semester that Pascal will be taught on the Apollos.  The
instructors should be informed...


Has anyone experienced any TCP/IP problems on the DN10k?
ruptime often shows all the other Apollos (3500s) up but the DN10k
down, while the 3500s show all hosts up.  I find it funny that a
host finds itself down more often than up.  Running /etc/ping from
the DN10k to itself gives about 3+% packet loss.  Running /etc/ping
from the other machines to themselves or to the DN10k does not show
this problem.  (Generally the first packet times out, which I suspect
is due to the ARP lookup, but after that it's fine.)  Everything is
at 10.3, and on Ethernet.  Also, listing a large file remotely
sometimes freezes for about 1.5 seconds and then continues, as if a
packet were lost.  This happens on both types of machines, but is
much more reproducible on the 10k.  (In fact, cat /etc/services does
it 100% of the time on the 10k.)  At first I thought the window size
was too large and the 10k was sending packets out faster than the
3500s could take them, but if the 10k can't even reliably ping itself...

Any ideas?  Can anyone reproduce this problem with the 10k?
I would like to know how reproducible it is before I call it in to Apollo...

One last note...  I removed the -c from tcpd on all hosts.

Partial cut from /etc/ping:
   . . .
64 bytes from 141.216.7.5: icmp_seq=33. time=4. ms
64 bytes from 141.216.7.5: icmp_seq=34. time=4. ms
64 bytes from 141.216.7.5: icmp_seq=35. time=4. ms
64 bytes from 141.216.7.5: icmp_seq=36. time=4. ms
Timed out (1 second) waiting for echo reply
64 bytes from 141.216.7.5: icmp_seq=38. time=7. ms
64 bytes from 141.216.7.5: icmp_seq=39. time=4. ms
64 bytes from 141.216.7.5: icmp_seq=40. time=4. ms
Timed out (1 second) waiting for echo reply
64 bytes from 141.216.7.5: icmp_seq=42. time=132. ms
Timed out (1 second) waiting for echo reply
64 bytes from 141.216.7.5: icmp_seq=44. time=11. ms
64 bytes from 141.216.7.5: icmp_seq=45. time=4. ms
64 bytes from 141.216.7.5: icmp_seq=46. time=4. ms
64 bytes from 141.216.7.5: icmp_seq=47. time=4. ms
64 bytes from 141.216.7.5: icmp_seq=48. time=6. ms
64 bytes from 141.216.7.5: icmp_seq=49. time=4. ms
Timed out (1 second) waiting for echo reply
64 bytes from 141.216.7.5: icmp_seq=51. time=8. ms
Timed out (1 second) waiting for echo reply
64 bytes from 141.216.7.5: icmp_seq=53. time=7. ms
64 bytes from 141.216.7.5: icmp_seq=54. time=4. ms
64 bytes from 141.216.7.5: icmp_seq=55. time=4. ms
Timed out (1 second) waiting for echo reply
64 bytes from 141.216.7.5: icmp_seq=57. time=7. ms
64 bytes from 141.216.7.5: icmp_seq=58. time=4. ms
64 bytes from 141.216.7.5: icmp_seq=59. time=4. ms
64 bytes from 141.216.7.5: icmp_seq=60. time=5. ms
^C
----141.216.7.5 PING Statistics----
61 packets transmitted, 54 packets received, 11% packet loss
round-trip (ms)  min/avg/max = 4/6/132
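As a sanity check on the statistics above: 61 transmitted and 54 received means 7 packets lost, and 7/61 is about 11.5%, which ping reports as 11%. The partial cut itself shows six timeouts (icmp_seq 37, 41, 43, 50, 52 and 56 never answered), so presumably the seventh loss falls in the elided part. A minimal Python sketch of the same bookkeeping follows, assuming output lines in exactly the format shown; the function names are hypothetical and not part of any Apollo tool.

```python
import re

# Matches the sequence number in lines like
# "64 bytes from 141.216.7.5: icmp_seq=33. time=4. ms"
SEQ_RE = re.compile(r"icmp_seq=(\d+)")


def loss_percent(transmitted, received):
    """Integer packet-loss percentage (truncated), e.g. 7 of 61 -> 11."""
    if transmitted <= 0:
        raise ValueError("no packets transmitted")
    return (transmitted - received) * 100 // transmitted


def missing_seqs(lines):
    """Return icmp_seq numbers never answered between the first and last reply seen."""
    seqs = []
    for line in lines:
        match = SEQ_RE.search(line)
        if match:
            seqs.append(int(match.group(1)))
    if not seqs:
        return []
    seen = set(seqs)
    return [s for s in range(min(seqs), max(seqs) + 1) if s not in seen]
```

Fed the lines of the partial cut, `missing_seqs` returns [37, 41, 43, 50, 52, 56]; "Timed out" lines are simply skipped, since the gaps in the sequence numbers already identify the lost packets.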


   - John_Lauro@ub.cc.umich.edu

krowitz@RICHTER.MIT.EDU (David Krowitz) (01/31/91)

Unless you are running NFS between your Apollos, the system
does not use TCP/IP for remote file access, so the pauses
in listing a file are not due to TCP/IP losing packets.
The Apollo distributed file system uses DDS (Domain Data Service?)
as the underlying network protocol, and this is completely
separate from the TCP/IP services. You can completely shut down
your TCP/IP services with an "ifconfig" without affecting the
file system (other than NFS-mounted file systems, of course!).


 -- David Krowitz

krowitz@richter.mit.edu   (18.83.0.109)
krowitz%richter.mit.edu@eddie.mit.edu
krowitz%richter.mit.edu@mitvma.bitnet
(in order of decreasing preference)

jal@acc.flint.umich.edu (John Lauro) (02/01/91)

In article <9101311455.AA16072@richter.mit.edu> krowitz@RICHTER.MIT.EDU (David Krowitz) writes:
>Unless you are running NFS between your Apollos, the system
>does not use TCP/IP for remote file access, so the pauses
>in listing a file are not due to TCP/IP losing packets.
>The Apollo distributed file system uses DDS (Domain Data Service?)
>as the underlying network protocol, and this is completely
>separate from the TCP/IP services. You can completely shut down
>your TCP/IP services with an "ifconfig" without affecting the
>file system (other than NFS-mounted file systems, of course!).
>
I am not using a different Apollo; I am coming in via telnet
from a PC running CUTCP (based on NCSA Telnet).  The file is local
to the machine being telnetted to.  If I telnet to a 3500 and cat
the same file on the 10000, there is no problem that way either;
it only happens when I telnet to the 10000 itself.  It's not a
function of the file system (I think).  I haven't tried it, but I
suspect a program that just outputs text would show the same problem.
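The test suggested above can be sketched as a program that writes numbered lines of filler text to standard output and never reads a file, so a stall seen over the telnet session cannot be blamed on the file system. This is a hypothetical Python sketch; the default line count (2000) and line width (72) are arbitrary assumptions, not values from the thread.

```python
import sys


def make_lines(count, width=72):
    """Yield `count` numbered lines of filler text, each exactly `width` characters."""
    for i in range(count):
        prefix = f"{i:06d} "
        yield prefix + "x" * (width - len(prefix))


def main(argv):
    # Optional first argument: number of lines to write (assumed default 2000).
    count = int(argv[1]) if len(argv) > 1 else 2000
    for line in make_lines(count):
        sys.stdout.write(line + "\n")
    sys.stdout.flush()


if __name__ == "__main__":
    main(sys.argv)
```

Run it in a telnet session to the 10k and in one to a 3500 and compare; because every line carries its number, a pause shows exactly where the output stopped.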

   - John