[net.unix-wizards] Help: 4.2bsd IPC routines/TCP sequence error?

sloane@marlin.UUCP (03/14/85)

OVERVIEW:
I am working on a series of programs that communicate via TCP over
a 10mb ethernet. The hosts are all Vaxen (750s and 780s) all running
4.2bsd. The application is written in C and uses the calls outlined
in the 4.2bsd Interprocess Communications Primer. Our problem is that
when we send records from one host to another, the order in which they
are received is not the same as the order they were sent. The problem
seems to be in the timing between repeated calls to send().

SPECIFICS:
Here is a sample of the code used to send and receive info over the TCP
socket: the lines marked /* CROWBAR */ in SEND are necessary to make it work. 
We think this is because of a timing problem in the send and/or recv loops.
The socket is opened AF_INET/SOCK_STREAM. There is NO select() call made
as there is only one socket open. The TCP address is sent from the host
requesting the transfer via UDP (it works great).

The sample code opens a file on one host and uses the send() call to 
transfer it over a TCP socket to another host. The file is transferred 
line-by-line. The file consists of 64 records with sequence numbers from
0-63.

We know the data is not received in order because there is a test in the 
RECV code to make sure that the received sequence number is the same as 
the loop counter. Without the CROWBAR code these fragments transfer 
portions of the file (like maybe the 1st 14 records) and then trap at the 
"out of sequence" error in RECV. The amount of data transferred varies 
from try to try... With the CROWBAR code the entire transfer works
fine (all 64 records are transferred IN ORDER).

----------------------   SEND    -------------------------------------
/* open memo group file */
fd = fopen("memogrp.fil","r");

/* read memogroup records and send via TCP */
for(cnt=0; cnt<NGRPS; cnt++)
   {
   /* read a single group record */
   if(fgets(bufr, 66, fd) == NULL)
      {
      shutdown(tcp_socket, 2);   /* 2 = disallow further sends and receives */
      user_err(USR, 0, pgm, "bad data or premature EOF in readgrps()");
      }

   /* send it via TCP */
   if(send(tcp_socket, bufr, strlen(bufr)+1, 0) < 0)
      {
      shutdown(tcp_socket, 2);   /* 2 = disallow further sends and receives */
      user_err(SYS, errno, pgm, "send failure");
      }

   /* if these lines are commented out it FAILS!!! */
   fprintf(null, "%s", bufr); /* CROWBAR */
   fflush(null); /* CROWBAR */
   }

fclose(null); /* CROWBAR */

/* close memo group file */
fclose(fd);
}

--------------------------------  RECV  -------------------------------
/* open the memogroup file */
mfd = fopen("memogrp.fil", "w");

/* receive the memogroup file from the memo daemon via TCP */
for(cnt=0; cnt<NGRPS; cnt++)
   {
   /* receive a group record and load the structure */
   if(recv(tcp_socket, bufr, 68 , 0) < 0)
      {
      shutdown(tcp_socket, 2);   /* 2 = disallow further sends and receives */
      user_err(SYS, errno, pgm, "recv: call failed");
      }

   /* make sure receipt is in sequence */
   extract(recvd_seq,bufr,1);
   if(stoi(recvd_seq) != cnt)
      {
      shutdown(tcp_socket, 2);   /* 2 = disallow further sends and receives */
      user_err(USR, 0, pgm, "recv: memogroup received out of sequence");
      }
   /* write the record to the memogroup file */
   fprintf(mfd, "%s\n", bufr);
   }

/* close the memogroup file */
fclose(mfd);
}
-----------------------------------------------------------------------

    * **************************************************************** *
    * Gary K. Sloane/Computer Sciences Corporation/San Diego, CA 92110 *
    * DDD: (619) 225-8401                                              *
    * MILNET: sloane@nosc.ARPA                                         *
    * UUCP: [ihnp4,akgua,decvax,dcdwest,ucbvax]!sdcsvax!noscvax!sloane *
    * **************************************************************** *

chris@umcp-cs.UUCP (Chris Torek) (03/17/85)

Try reading a complete line (rather than anywhere from 1 to 68 bytes,
whichever is available first) in your receiver.  "recv(...)" on a TCP
socket is like read() from a pipe or file: it doesn't *have* to return
a complete "record" since there aren't any records at that level.

In other words, if you want to read exactly 68 bytes from a TCP socket,
loop, read()- or recv()ing, until you've got 68 bytes (or an error).
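
A minimal sketch of such a loop, for illustration only.  The name
recv_exact() is hypothetical (it is not part of the 4.2bsd library), and
the sketch assumes fixed-size records, which the SEND fragment above does
not guarantee since it sends strlen(bufr)+1 bytes per record.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: read exactly len bytes from a stream socket.
 * Returns len on success, a short count if the peer closed the
 * connection early, or -1 on error (errno is set by recv). */
ssize_t recv_exact(int sock, char *buf, size_t len)
{
    size_t got = 0;

    while (got < len) {
        ssize_t n = recv(sock, buf + got, len - got, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;
        }
        if (n == 0)
            break;              /* peer closed the connection */
        got += (size_t)n;
    }
    return (ssize_t)got;
}
```

In the RECV fragment, the single recv() call would become something like
"if(recv_exact(tcp_socket, bufr, RECSIZE) != RECSIZE)", where RECSIZE is
an assumed record size that both sides agree on.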
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 4251)
UUCP:	{seismo,allegra,brl-bmd}!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@maryland

dillon@ucbvax.ARPA (The Sherif "Matt D.") (03/21/85)

> OVERVIEW:
> I am working on a series of programs that communicate via TCP over
> a 10mb ethernet. The hosts are all Vaxen (750s and 780s) all running
> 4.2bsd. The application is written in C and uses the calls outlined
> in the 4.2bsd Interprocess Communications Primer. Our problem is that
> when we send records from one host to another, the order in which they
> are received is not the same as the order they were sent. The problem
> seems to be in the timing between repeated calls to send().

Your problem is NOT with the socket.  STREAM sockets are GUARANTEED not
to duplicate or reorder any data.  Therefore, you do NOT have to number
your packets.

The problem is in the receiving end of the program.  If the transmission side
of the program can get across more than one send before the receiving end
of the program does its recv, the receiving end's recv will get BOTH sends all
in one buffer... or all of one send and part of another.  STREAM sockets do
not preserve BLOCK BOUNDARIES as UDP (datagram) sockets do.

The only reason the code seemed to work when you had the printf's was that
those printf's caused a delay long enough for the receiving side to
read in the data before the sending side could send another packet.

So, to recap, the receiver side is at fault because you assume block boundaries
stay intact when in fact they don't.  One recv may read several sends' worth
of info.   Also, you do not need to number your packets if you are using
a STREAM socket.
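
Since the sender above transmits each record with its trailing NUL
(strlen(bufr)+1 bytes), one way to recover the record boundaries on the
receiving side is to read up to that NUL.  The following is only a sketch:
recv_record() is a hypothetical name, and it reads one byte per recv()
call for simplicity, which is slow; a real program would probably buffer.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/* Hypothetical helper: read one NUL-terminated record from a stream
 * socket into buf (capacity cap bytes, terminator included).
 * Returns the record length excluding the NUL, or -1 on error, on
 * end of stream before a terminator, or if the record does not fit. */
ssize_t recv_record(int sock, char *buf, size_t cap)
{
    size_t len = 0;

    while (len < cap) {
        ssize_t n = recv(sock, buf + len, 1, 0);
        if (n < 0) {
            if (errno == EINTR)
                continue;            /* interrupted by a signal: retry */
            return -1;
        }
        if (n == 0)
            return -1;               /* connection closed mid-record */
        if (buf[len] == '\0')
            return (ssize_t)len;     /* terminator seen: record complete */
        len++;
    }
    return -1;                       /* record longer than the buffer */
}
```

With this, it no longer matters whether one recv() would have returned
part of a send or several sends at once: each call hands back exactly one
record, in the order the records were sent.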

I hope I've helped out,
				
				Matthew Dillon