[comp.unix.wizards] buffer i/o using read

ables@lot.ACA.MCC.COM (King Ables) (03/06/90)

I've been using read(2) to read data from a socket and am having
problems when the buffers get large.  I'm hoping someone has run
into this before and knows what I am doing wrong.

I want to read an arbitrarily large block of data and not have to worry
about message boundaries (hence read/write rather than send/recv).

I use write(2) to send it to the connected socket, and get back a
return code indicating all characters were "written" to the socket.

I've been using the following in my readmsg function:

int	hd;		/* host descriptor	*/
char	*buf;		/* ptr to buffer	*/
int	n;		/* size of input buffer	*/
int	count;		/* characters actually read */

	count = read(hd, buf, n);

hd is the socket.  buf is my data buffer.  n is max number of characters
I want to read.  count will return number of characters read.

This works fine on a socket connected between two processes on my Sun up
to a buffer size of 4096.  Anything after that doesn't seem to make it.
I get (for example) 8000 back from write(hd, buf, n) where n=8000, but
when I read(hd, buf, 8000) I get 4096 and a successive read(2) returns -1.

On sockets connected between two different machines, the successful buffer
size varies between 512 and 2048, but acts the same way.  I've fired up an
ethernet monitor, and most of the data is appearing in packets, but not all
of it even seems to get sent.  However, more is showing up in packets (getting
sent) than what winds up being read, anyway.

So I'm guessing there's a timing problem.  I tried setting SO_LINGER with
setsockopt(2) on both ends, but that didn't seem to have any effect.

I decided to try doing reads like I've seen in some source code, one character
at a time.  When I changed my code to do (approximately) this:

	while (read(hd, &c, 1) == 1) buf[count++] = c;

Then it gets all 8000 characters!

For several reasons, I'd really like to do it the first way.  Namely I
would rather not have to look for some trailing key character to know when
to stop reading (or have to specify a length at the front end).

Can anybody tell me what I'm missing about buffered reads that's causing
me grief?  Or is the answer just do single character reads?

Thanks.

King Ables                    Micro Electronics and Computer Technology Corp.
ables@mcc.com                 3500 W. Balcones Center Drive
+1 512 338 3749               Austin, TX  78759

libes@cme.nist.gov (Don Libes) (03/06/90)

In article <637@lot.ACA.MCC.COM> ables@lot.ACA.MCC.COM (King Ables) writes:
>I want to read an arbitrarily large block of data and not have to worry
>about message boundaries (hence read/write rather than send/recv).
>
>This works fine on a socket connected between two processes on my Sun up
>to a buffer size of 4096.  Anything after that doesn't seem to make it.
>I get (for example) 8000 back from write(hd, buf, n) where n=8000, but
>when I read(hd, buf, 8000) I get 4096 and a successive read(2) returns -1.

Well, what's the value of errno?  Note that expecting to be able to
read arbitrarily-large packets is unrealistic due to limited kernel
buffering and protocol design.

I know you don't want to encapsulate your I/O, but that is the only
solution.  I have some code that does it which you can anonymously ftp
from durer.cme.nist.gov as pub/sized_io.shar.Z

I also wrote a paper describing some of these problems.  It is
"Packet-Oriented Communications Using a Stream Protocol --or-- Making
TCP/IP on Berkeley UNIX a Little More Pleasant to Use", NISTIR
90-4232, January 1990" and is available from:

Mary Lou Fahey
NIST
Bldg 220, Rm A-127
Gaithersburg, MD  20899
fahey@cme.nist.gov

Don Libes          libes@cme.nist.gov      ...!uunet!cme-durer!libes

jnixon@andrew.ATL.GE.COM (John F Nixon) (03/07/90)

ables@lot.ACA.MCC.COM (King Ables) writes:
> I've been using read(2) to read data from a socket and am having
> problems when the buffers get large...  I want to read an arbitrarily large 
> block of data and not have to worry about message boundaries (hence 
> read/write rather than send/recv).

Sorry, but if you are using AF_INET SOCK_STREAM sockets, and it sounds like
you are from your problem description, read/write will not preserve record
boundaries.  From "Introductory 4.3BSD IPC"

    ... Stream communication implies several things.  ... as in pipes,
    no record boundaries are kept.  Reading from a stream may result in
    reading the data sent from one or several calls to write() or only
    part of the data from a single call, if there was not enough room for
    the entire message, or if not all data from a large message has been
    transferred.

So, if you want reliability, you have to manage record boundaries.  If you
want record boundaries, you use SOCK_DGRAM and give up reliability.  I have
not used any other types of sockets... yet.

>I decided to try doing reads like I've seen in some source code, one character
>at a time.  When I changed my code to do (approximately) this:
>	while (read(hd, &c, 1) == 1) buf[count++] = c;
>Then it gets all 8000 characters!

This tells me that you are seeing problems due to the lack of record boundaries
(or stating it another way, not all of your write arrives in one read).

You don't have to do it character at a time!  You do have to include a 
record size to keep yourself straight.  Once you know the record size, you
can ask for all of the data, accept what you get, and ask for the rest.
Repeat till everything is there.

    while ( recordsize > sizehere ) {
        bytes = read (soc, buf + sizehere, recordsize - sizehere);

        /* error handling, make sure bytes is +ve */
        sizehere += bytes;
    }

The above fragment is more or less it.  You can do all error handling
in one place by making the read call a call to your own routine which then
calls read.  You can worry about blocking.  But at least you will get
all of your data.

--
----
jnixon@atl.ge.com                    ...steinmetz!atl.decnet!jnxion

aperez@cvbnet.UUCP (Arturo Perez x6739) (03/08/90)

From article <637@lot.ACA.MCC.COM>, by ables@lot.ACA.MCC.COM (King Ables):
> I've been using read(2) to read data from a socket and am having
> problems when the buffers get large.  I'm hoping someone has run
> into this before and knows what I am doing wrong.
> 
> I want to read an arbitrarily large block of data and not have to worry
> about message boundaries (hence read/write rather than send/recv).
> 
> 
> King Ables                    Micro Electronics and Computer Technology Corp.
> ables@mcc.com                 3500 W. Balcones Center Drive
> +1 512 338 3749               Austin, TX  78759

This is one of my pet peeves about BSD sockets.  There is no way
to read an arbitrary amount of data from a socket.  You have to be
aware of the kernel level buffering NO MATTER WHAT LEVEL you're writing
your code at; i.e. apps, system, etc.

Why can't the kernel block your process until you get all the data you're
asking for (unless, of course, FIONBIO or O_NDELAY is set)?  If I'm
willing to wait, I'm willing to wait.  And if the connection goes down during
the transfer, I can live with that, too, just return an error.

Why was such a silly decision made?

Arturo Perez
ComputerVision, a division of Prime
aperez@cvbnet.prime.com
Too much information, like a bullet through my brain -- The Police

ka@cs.washington.edu (Kenneth Almquist) (03/11/90)

aperez@cvbnet.UUCP (Arturo Perez x6739) writes:
> From article <637@lot.ACA.MCC.COM>, by ables@lot.ACA.MCC.COM (King Ables):
>> I've been using read(2) to read data from a socket and am having
>> problems when the buffers get large.  I'm hoping someone has run
>> into this before and knows what I am doing wrong.
>>
>> I want to read an arbitrarily large block of data and not have to worry
>> about message boundaries (hence read/write rather than send/recv).
>
> This is one of my pet peeves about BSD sockets.  There is no way
> to read an arbitrary amount of data from a socket.  You have to be
> aware of the kernel level buffering NO MATTER WHAT LEVEL you're writing
> your code at; i.e. apps, system, etc.
>
> Why can't the kernel block your process until you get all the data you're
> asking for (unless, of course, FIONBIO or O_NDELAY is set)?  If I'm
> willing to wait, I'm willing to wait.  And if the connection goes down during
> the transfer, I can live with that, too, just return an error.
>
> Why was such a silly decision made?

I presume that the idea of having the read system call return a short
count originally appeared in UNIX to deal with terminal input.  When a
program issues a read system call on a terminal, the read call will
return as soon as a line of input is available, even if the number of
characters in the line is smaller than the size of the buffer passed
to read.  If UNIX did not work this way, most interactive programs
would have to issue a separate read system call for each character,
and stop issuing system calls when a newline character was read.  This
would be inefficient.

When a program issues a read system call on a pipe, the read call will
return as soon as data is available, even if the number of characters
available is smaller than the size of the buffer passed to read.  If
UNIX did not work this way, bc (which opens a pipe to dc) would not
work unless dc inefficiently issued a separate read system call for
every character.

Berkeley sockets intentionally copied the pipe semantics, so that
pipes could be implemented as a special case of sockets.  And the
semantics of Berkeley sockets can be justified independently of this.
If a read on a socket worked the way that King suggests, then rlogin
would have to issue a separate read system call for every character
received from the remote host, which would be very inefficient.

If you have to read a specific number of characters under UNIX, there
are two ways to do it.  One is to place a loop around the read system
call.  The other is to use the fread routine and let the standard I/O
library take care of the buffering.
					Kenneth Almquist

antony@lbl-csam.arpa (Antony A. Courtney) (03/12/90)

In article <11057@june.cs.washington.edu> ka@cs.washington.edu (Kenneth Almquist) writes:
>aperez@cvbnet.UUCP (Arturo Perez x6739) writes:
>> From article <637@lot.ACA.MCC.COM>, by ables@lot.ACA.MCC.COM (King Ables):
||| I've been using read(2) to read data from a socket and am having
||| problems when the buffers get large.  I'm hoping someone has run
||| into this before and knows what I am doing wrong.
|||
||| I want to read an arbitrarily large block of data and not have to worry
||| about message boundaries (hence read/write rather than send/recv).
||
|| This is one of my pet peeves about BSD sockets.  There is no way
|| to read an arbitrary amount of data from a socket.  You have to be
|| [...]
|| Why was such a silly decision made?
|
|I presume that the idea of having the read system call return a short
|count originally appeared in UNIX to deal with terminal input.
|[...]
|If UNIX did not work this way, [ lots of stuff would break for lots of
| reasons]...


hmmmm.  The thought comes to mind:  Why not just add an ioctl() that allows
the user-level application to mark the socket for CTRAN i/o (Complete
TRANsaction)?  When so marked, a read() on the socket would return only
after the number of characters asked for has been read into the buffer.

		antony

--
*******************************************************************************
Antony A. Courtney				antony@lbl.gov
Advanced Development Group			ucbvax!lbl-csam.arpa!antony
Lawrence Berkeley Laboratory			AACourtney@lbl.gov

mike@turing.cs.unm.edu (Michael I. Bushnell) (03/17/90)

In article <85@cvbnetPrime.COM> aperez@cvbnet.UUCP (Arturo Perez x6739) writes:

>This is one of my pet peeves about BSD sockets.  There is no way
>to read an arbitrary amount of data from a socket.  You have to be
>aware of the kernel level buffering NO MATTER WHAT LEVEL you're writing
>your code at; i.e. apps, system, etc.

>Why can't the kernel block your process until you get all the data you're
>asking for (unless, of course, FIONBIO or O_NDELAY is set)?  If I'm
>willing to wait, I'm willing to wait.  And if the connection goes down during
>the transfer, I can live with that, too, just return an error.

>Why was such a silly decision made?

Nothing new.  The same is true of terminal I/O.  

All you need is:

int
myread(des, buf, buflen)
     int des, buflen;
     char *buf;
{
  char *bp = buf;
  int nread = 0, nbytes;

  while (nread != buflen)
    {
      nbytes = read(des, bp, buflen - nread);
      if (nbytes == -1)
        return -1;		/* Or whatever else you want */
      if (nbytes == 0)
        break;			/* EOF: the other end closed the connection */
      bp += nbytes, nread += nbytes;
    }
  return nread;
}

This will solve your problem quite nicely.  Any questions?  Now you
*don't* need to know about the low-level buffering.  




--
    Michael I. Bushnell      \     This above all; to thine own self be true
LIBERTE, EGALITE, FRATERNITE  \    And it must follow, as the night the day,
   mike@unmvax.cs.unm.edu     /\   Thou canst not be false to any man.
        CARPE DIEM           /  \  Farewell:  my blessing season this in thee!

ables@lot.ACA.MCC.COM (King Ables) (03/17/90)

From article <MIKE.90Mar16113811@turing.cs.unm.edu>,
  by mike@turing.cs.unm.edu (Michael I. Bushnell):
> In article <85@cvbnetPrime.COM> aperez@cvbnet.UUCP
> (Arturo Perez x6739) writes:
> 
>>This is one of my pet peeves about BSD sockets.
>
> Nothing new.  The same is true of terminal I/O.  

But we're not talking about terminal i/o.  And anyway, "it's always
been that way" is no justification for the way something works (or
doesn't as the case may be).

> All you need is:
> [some code to take care of knowing how much data you got with
>  the read() so you can keep read()ing until you get it all.]

Sure, it works.  But Arturo's whole point is that you shouldn't
*need* to do that!  Everyone who has ever had this problem has
had to re-invent the same wheel.  Okay, it's not a complicated 
wheel, granted.  And I understand why when terminal i/o gets 
involved and you're trying to write generic code to work on all
kinds of i/o streams that you want to make lowest common denominator
assumptions about capability.  But making everyone in the world write
their own version of read() to act like the real thing for their
sockets isn't the answer, either.

Don Libes' little library of socket i/o calls (that he referenced in
a message here when this thread started) does a nice job of "fixing"
the problem... too bad a few calls like this weren't in some BSD library
in the first place.

-king

aperez@cvbnet.UUCP (Arturo Perez x6739) (03/20/90)

It seems that I have generated a little bit of heat (but also quite a bit of
light) with my statement that the buffering on BSD sockets is visible even
at the application's level.  I have even been accused of telling "a lie, 
excuse me, a misleading statement" in a public forum.  So now I feel I must
clarify what I meant.

You may or may not recall that I claimed that the buffering on a BSD socket
is visible to applications and sometimes even users.  For example, here's
an excerpt from a Sun 3/60 man page for tar(1):


B	Force tar to perform multiple reads (if necessary)
	so as to read exactly enough bytes to fill a block.  This
	option exists so that tar can work across the Ethernet,
	since pipes and sockets return partial blocks even
	when more data is coming.

That's my best piece of evidence.  Now, you and I may know that it's not 
strictly necessary to have this option, but there it is.

Arturo Perez
ComputerVision, a division of Prime
aperez@cvbnet.prime.com
Too much information, like a bullet through my brain -- The Police

smb@ulysses.att.com (Steven M. Bellovin) (03/21/90)

In article <129@cvbnetPrime.COM>, aperez@cvbnet.UUCP (Arturo Perez x6739) writes:
> You may or may not recall that I claimed that the buffering on a BSD socket
> is visible to applications and sometimes even users.  For example, here's
> an excerpt from a Sun 3/60 man page for tar(1):


> B	Force tar to perform multiple reads (if necessary)
> 	so as to read exactly enough bytes to fill a block.  This
> 	option exists so that tar can work across the Ethernet,
> 	since pipes and sockets return partial blocks even
> 	when more data is coming.

> That's my best piece of evidence.  Now, you and I may know that it's not 
> strictly necessary to have this option, but there it is.

Without addressing your original claim, Berkeley had no choice on this
one.  The relevant factor is the TCP spec -- TCP has no concept of
records, and does not preserve record boundaries.  Thus, if BSD was
to implement TCP -- which was the purpose of the DARPA grant that
funded much of its development -- and if they were to support tar
across a TCP connection -- and tar format antedates 4.2bsd by several
years -- they had to do something at the application level.  Any
other possible implementation meeting those two constraints would
have similar properties.

amoss@batata.huji.ac.il (amos shapira) (03/24/90)

ka@cs.washington.edu (Kenneth Almquist) writes:

>I presume that the idea of having the read system call return a short
>count originally appeared in UNIX to deal with terminal input.

This isn't the only possible reason; what about reading to the end of a
file?  When a process tries to read more bytes than are available in the
file, the same thing happens (i.e. the number of bytes returned is less
than the number requested).

[ stuff deleted ]

>Berkeley sockets intentionally copied the pipe semantics, so that
>pipes could be implemented as a special case of sockets.

I doubt this; the Berkeley sockets were devised solely for the purpose of
letting processes talk to the network devices (that's why they were financed
by DARPA) and they have nothing to do with the way data is transferred through
them.  ONE derivative of the socket mechanism is the socketpair, which was
also copied into the implementation of the pipe() system call.  The read()
and write() system calls were changed to support sockets mainly to let
"naive" processes use sockets without knowing about them; this also seems
to me a good example of sticking to the UNIX discipline that processes
shouldn't care too much where their input/output comes-from/goes-to.
For more info read "The Design and Implementation of the 4.3BSD UNIX(tm)
Operating System" by Leffler, McKusick, Karels and Quarterman, chapters 10
to 12; note that the description of the sockets mechanism is completely
separate from the other related layers.

>If you have to read a specific number of characters under UNIX, there
>are two ways to do it.  One is to place a loop around the read system
>call.  The other is to use the fread routine and let the standard I/O
>library take care of the buffering.
>     Kenneth Almquist

There is an ioctl() function called FIONREAD which will return the number
of bytes immediately available to read (if you read the book mentioned above,
this is the so_rcv.sb_cc field).  Note that this is implemented in the
socket layer and not in the protocols layer (sys/sys_socket.c line 75 in
4.3BSD).

Cheers,
- Amos Shapira

amoss@batata.bitnet
amoss@batata.huji.ac.il