[comp.dcom.lans] DECnet problem/protocol question

jrosen@vms.macc.wisc.edu (Jay Rosenbloom) (07/24/90)

I'm trying to help someone copy a file between DECnet systems using the VMS
COPY command, and we are getting the message "-RMS-E-CRC, network DAP level CRC
check failed" when RMS closes the target file. The source machine is a
MicroVAX I running VMS 4.5. We have tried several target machines with no
success. The one target that seemed most likely to succeed (but didn't) was
also a VMS 4.5 machine on the same cable segment (in the same room) as the
source machine. We also tried different files. Very small files (less than a
couple hundred blocks) worked. For one file where all the blocks copied but
the copy still failed with the above error message, I diff'ed the source and
target versions and found multi-bit differences in some records, so I believe
that DAP's CRC check is working (i.e. the data really is getting corrupted).

One thing that is a little surprising to me is that DAP is detecting the error
and not a lower layer protocol (would it be NSP?). I think I read somewhere
that NSP was originally designed to run on top of a reliable data link layer
(DDCMP), perhaps obviating the need in NSP to check for data errors between
nodes on the same line.

Can anyone explain why DAP appears to be detecting the errors and not an
underlying transport layer?  It seems like the transport layer is not detecting
bad packets and doing retransmissions.  Maybe it does, but it's just not
effective.

I'd really appreciate it if a DECnet guru could shed some light on 
this!

-Jay
.............................................................................
Jay Rosenbloom / 608-262-9421 / jrosen@macc.wisc.edu / jrosen@wiscmacc      
Univ. of Wisconsin/Madison, Madison Academic Computing Center, Network Support

oberman@rogue.llnl.gov (07/24/90)

In article <4084@dogie.macc.wisc.edu>, jrosen@vms.macc.wisc.edu (Jay Rosenbloom) writes:
> I'm trying to help someone copy a file between DECnet systems using the VMS
> COPY command and we are getting the message "-RMS-E-CRC, network DAP level CRC
> check failed"  when RMS closes the target file .  The source machine is a
> microvax I running VMS 4.5.  We have tried several target machines with no 
> success.  The one target that seemed most likely to succeed (but didn't) was
> also a VMS 4.5 machine on the same cable segment (in the same room) as the
> source machine.   
 
It has been a long time, so my memory is a bit faded, but there was a bug in
VMS V4.something that caused these errors. DEC put out a patch for it in either
V4.6 or V4.7. So it could be this bug that is biting you.

The data IS corrupted, but by software, not hardware. (Or maybe you have some
sort of bus error that bypasses the data link layer checks.)

					R. Kevin Oberman
					Lawrence Livermore National Laboratory
					Internet: oberman@icdc.llnl.gov
   					(415) 422-6955

Disclaimer: Don't take this too seriously. I just like to improve my typing
and probably don't really know anything useful about anything.

bell@aussie.dec.com (Peter Bell) (07/24/90)

In article <4084@dogie.macc.wisc.edu>, jrosen@vms.macc.wisc.edu (Jay Rosenbloom) writes...
>I'm trying to help someone copy a file between DECnet systems using the VMS
>COPY command and we are getting the message "-RMS-E-CRC, network DAP level CRC
>check failed"  when RMS closes the target file .  The source machine is a
[...]
>differences in some records so I believe that DAPs CRC check is working (i.e.
>the data really is getting corrupted).
[...]
>Can anyone explain why DAP appears to be detecting the errors and not an
>underlying transport layer?  It seems like the transport layer is not detecting
>bad packets and doing retransmissions.  Maybe it does, but it's just not
>effective.

The DAP CRC check is designed to detect any errors that occur at any point in
the copy. It will (most of the time) detect any memory/bus/adapter errors which
are otherwise unchecked on any machine through which the data passes.

Since you are seeing errors between machines on the same Ethernet segment, I
would suspect that your hardware has a fault which is corrupting the data
outside of the lower level error checks.

>I'd really appreciate it if a DECnet guru could shed some light on 
>this!
>-Jay

Peter.  {DECnet guru?}

cac@hpctdlr.HP.COM (Chris Clabaugh) (08/02/90)

I can try to help. Chances are that the packets doing the DAP file transfer
are made up of Ethernet, the Routing protocol, NSP, and finally DAP.
Ethernet does a checksum of the whole packet, but that only protects the frame
on the wire; it has no knowledge of whether the file data inside was already
bad. Therefore Ethernet won't detect the problem. DRP (the Routing protocol)
will be using the Long Data Message, which does not carry a checksum at all.
Other DRP message types do (like the Level 1 and Level 2 Routing messages).
NSP also does not do a checksum. Therefore neither of these protocols will
detect the problem either. Finally, that brings us to DAP, which does calculate
its own checksum of the data.
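
To make the layering argument concrete, here is a minimal Python sketch. It is
not DECnet code: zlib.crc32 stands in for both the Ethernet frame check and the
DAP CRC, and the function names and the corruption model (a byte damaged in the
host before the frame is built) are assumptions for illustration only.

```python
import zlib

def link_send(payload: bytes):
    """Frame the payload with a link-level check (stand-in for the Ethernet FCS)."""
    return payload, zlib.crc32(payload)

def link_receive(frame) -> bytes:
    """Verify the link-level check and hand the payload up the stack."""
    payload, fcs = frame
    if zlib.crc32(payload) != fcs:
        raise ValueError("link-level check failed")
    return payload

def corrupt_in_host(data: bytes) -> bytes:
    """Hypothetical fault: flip bits in one byte before the frame is built."""
    damaged = bytearray(data)
    damaged[3] ^= 0x5A
    return bytes(damaged)

original = b"record 0001: some file data"

# End-to-end check value, computed where the data originates.
end_to_end_crc = zlib.crc32(original)

# The damage happens before framing, so the link check covers the bad bytes.
frame = link_send(corrupt_in_host(original))
delivered = link_receive(frame)          # no exception: the frame is "good"

# Only the end-to-end comparison notices that the data changed.
end_to_end_ok = zlib.crc32(delivered) == end_to_end_crc
print("link check passed, end-to-end check ok:", end_to_end_ok)
```

The link check passes because it was computed over data that was already
damaged; only a check value computed at the source can catch that.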

DAP works by sending the whole file across the network and then, finally, the
checksum in a separate packet. If the checksum is bad, this won't be
discovered until the whole file has already been transferred.
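
A rough sketch of that "data first, checksum last" pattern, again in Python and
again only an illustration: the message tuples, the TransferCrcError class, and
the damage() helper are made up for this example, and zlib.crc32 is a stand-in
rather than DAP's actual CRC.

```python
import zlib

class TransferCrcError(Exception):
    """Raised when the trailing CRC does not match the data received."""
    def __init__(self, records_received):
        super().__init__("end-of-transfer CRC mismatch")
        self.records_received = records_received

def send_file(records):
    """Yield one ("data", record) message per record, then a final ("crc", value)."""
    crc = 0
    for record in records:
        crc = zlib.crc32(record, crc)    # running CRC over everything sent
        yield ("data", record)
    yield ("crc", crc)

def damage(messages, index):
    """Hypothetical fault: flip bits in the payload of the index-th message."""
    for i, (kind, value) in enumerate(messages):
        if i == index and kind == "data":
            value = bytes([value[0] ^ 0xFF]) + value[1:]
        yield (kind, value)

def receive_file(messages):
    """Accept records as they arrive; the verdict comes only with the last message."""
    crc = 0
    records = []
    for kind, value in messages:
        if kind == "data":
            crc = zlib.crc32(value, crc)
            records.append(value)
        elif value != crc:
            raise TransferCrcError(len(records))
    return records

records = [b"block-%03d" % n for n in range(5)]
```

Feeding receive_file(damage(send_file(records), 1)) raises TransferCrcError
only after all five records have been accepted, which matches the error
showing up when the target file is closed rather than partway through the copy.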

NSP was originally designed for the early phases of DECnet. There was no LAN
in those early phases, so you are right: it ran on DDCMP, as well as some
other data links. In Phase IV (the current phase for now; Phase V is coming
soon), LANs and X.25 were added along with other enhancements. NSP is actually
a fairly robust protocol. It has 'port' numbers and sequence numbers. It also
segments packets and does flow control. It does, however, lack a checksum.
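
As a toy illustration of what sequence numbers do and do not buy you (this is
not the NSP message format; the Segment class and both functions are invented
for the example):

```python
from dataclasses import dataclass

@dataclass
class Segment:
    seq: int       # position of this piece within the message
    data: bytes    # payload; nothing below ever inspects its contents

def segment(message: bytes, size: int):
    """Split a message into numbered segments of at most `size` bytes."""
    return [Segment(i, message[off:off + size])
            for i, off in enumerate(range(0, len(message), size))]

def reassemble(segments):
    """Drop duplicates, restore order, and refuse to continue if a segment is missing."""
    by_seq = {}
    for seg in segments:
        by_seq.setdefault(seg.seq, seg.data)   # first copy of each number wins
    expected = list(range(len(by_seq)))
    if sorted(by_seq) != expected:
        raise ValueError("missing segment")
    return b"".join(by_seq[i] for i in expected)

message = b"ABCDEFGHIJ"
segs = segment(message, 4)

# Reordering and duplication are repaired by the sequence numbers.
shuffled = [segs[2], segs[0], segs[1], segs[0]]
recovered = reassemble(shuffled)

# A damaged payload with a valid sequence number goes straight through.
tampered = [segs[0], Segment(1, b"EFXH"), segs[2]]
passed_through = reassemble(tampered)
```

Here recovered equals the original message, while passed_through is silently
wrong: sequencing catches loss, duplication, and misordering, but without a
checksum it cannot tell that the bytes inside a segment have changed.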

I hope this helps...

Chris Clabaugh
Hewlett-Packard Colorado Telecommunications Division
cac@hpctdkg.hp.com