[comp.os.vms] read

LEICHTER-JERRY@YALE.ARPA (06/19/87)
About two weeks ago, someone - I now forget who, and there seems little point
in dredging through the records to figure it out - complained that the VAX C
RTL was "broken" because its implementation of read() could not return more
than 65535 bytes.  It happens that Gnu Emacs depends on this, so it was breaking
on VMS.  The offended someone, clearly a Unix patriot, flamed about the
"undocumented incompatibilities" in VAX C, apparently convinced that he had
found yet another proof that Unix was the "only true path".

The immediate response came from a number of members of VMS Lovers Anonymous,
explaining that the QIO mechanism was limited to 65535 bytes, hence
read() - "naturally" - had to be, too.  The flames only grew from there on.

What got completely ignored in all the verbiage was:

	1.  What is read() DEFINED TO DO IN UNIX?  How about in the VAX C
		documentation?

	2.  What was the original writer really trying to accomplish with
		his very large read()?

1.  Definition

Here's the definition of read() from the SVID:

	int read(fildes,buf,nbyte)
	int fildes;
	char *buf;
	unsigned nbyte;

	The function read attempts to read nbyte bytes from the file
	associated with fildes into the buffer pointed to by buf.

	...

	If successful, the function read will return the number of bytes
	read and placed in the buffer; this number may be less than nbyte
	if the file is associated with a communications line [see IOCTL(BA_OS)
	and TERMIO(BA_ENV)], or if the number of bytes left in the file is
	less than nbyte, or if the file is a pipe or a special file.  When
	the end of file has been reached, the function read will return 0.

The Berkeley definition is slightly different:

	cc = read(d,buf,nbytes)
	int cc, d;
	char *buf;
	int nbytes;			/* Not unsigned!	*/

	Read attempts to read nbytes of data from the object referenced by
	d into the buffer pointed to by buf.

	...

	Upon successful completion, read ... return[s] the number of bytes
	actually read and placed in the buffer.  The system guarantees to read
	the number of bytes requested if the descriptor references a file
	which has that many bytes left before the end-of-file, but in no
	other cases.

	If the returned value is 0, then end-of-file has been reached.

(If anyone has access to a POSIX definition of read, it would be interesting
to see it for comparison.)

There are some interesting problems with these definitions - for example, the
SVID definition takes nbyte unsigned but returns a SIGNED integer; what if
nbyte specifies a value too large to fit in a signed integer, and that many
bytes are actually read? - but the overall import is clear:  For non-special
disk files, read is supposed to return as many bytes as are asked for,
assuming they are actually available in the file.  So for such files, the VAX C
implementation is not completely consistent with Unix practice.
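To put numbers on the SVID signedness problem: with a 32-bit int, an unsigned nbyte can be as large as 4294967295, but the largest count an int return value can hold is 2147483647. A minimal sketch of one defensive habit, assuming a C compiler with <limits.h>; clamp_request is a hypothetical name, not part of any C library. Capping each request at INT_MAX guarantees that any successful count fits in the signed return value.

```c
#include <limits.h>

/*
 * clamp_request: hypothetical helper.  Under the SVID signature read()
 * takes an unsigned byte count but returns a signed int.  Never asking
 * for more than INT_MAX bytes means a successful count always fits.
 */
unsigned clamp_request(unsigned nbyte)
{
    return nbyte > (unsigned)INT_MAX ? (unsigned)INT_MAX : nbyte;
}
```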

However, the VAX C documentation never guarantees that VAX C is 100%
compatible with Unix implementations (which, in fact, are not even exactly consistent
with each other, as the two extracts above make clear):  It only claims that
the VAX C software will be compatible with the VAX C documentation.  The VAX C
definition of read is:

	int read(file_desc, buffer, nbytes)
	int file_desc, nbytes;
	char *buffer;

	...

	The function returns the number of bytes actually read.  The return
	value does not necessarily equal nbytes.  For example, if the input
	is from a terminal, at most one line of characters is read.

					NOTE

		In general, the read function will not span record
		boundaries in a record file.  A separate read must be done for
		each record.

Sounds like documentation of the restriction to me, at least for record files.
At best, one could argue that the "NOTE" should also discuss the limit of
65535 bytes for non-record files.
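A minimal sketch of how a caller can live with both restrictions, assuming a POSIX-style read() declared in <unistd.h>; read_chunked and MAX_SINGLE_READ are hypothetical names. Each call asks for at most 65535 bytes, and a short return (one record, one terminal line, part of a pipe) is treated as "keep going" rather than as end-of-file; only a 0 return ends the loop early. Because it loops until the request is satisfied, on an interactive terminal it will keep waiting for further lines.

```c
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical per-call cap matching the 65535-byte limit. */
#define MAX_SINGLE_READ 65535

/*
 * read_chunked: hypothetical helper.  Fills buf with up to nbytes bytes
 * from fd, never passing read() a count larger than MAX_SINGLE_READ.
 * Short reads are not treated as end-of-file; only a 0 return is.
 * Returns the number of bytes stored, or -1 on an error before any
 * bytes were read.
 */
long read_chunked(int fd, char *buf, size_t nbytes)
{
    size_t total = 0;

    while (total < nbytes) {
        size_t want = nbytes - total;
        if (want > MAX_SINGLE_READ)
            want = MAX_SINGLE_READ;

        ssize_t cc = read(fd, buf + total, want);
        if (cc < 0) {
            if (errno == EINTR)
                continue;               /* interrupted by a signal: retry */
            return total > 0 ? (long)total : -1;
        }
        if (cc == 0)
            break;                      /* end of file */
        total += (size_t)cc;
    }
    return (long)total;
}
```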

2.  Usage

However, do you really want to write code that will work correctly ONLY for
disk files?  Device independence is nice to have, and rather cheap in this
case.  There is no reason not to be able to edit stuff directly off of a tape
or passing through a pipe.  In neither of these cases are you guaranteed to
receive "all there is".
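The device-independent pattern is to keep calling read() until it returns 0, accepting whatever each call delivers. A minimal sketch, assuming a POSIX-style read() from <unistd.h>; slurp_fd is a hypothetical name. The heap buffer grows by doubling, and each request is kept at or below 65535 bytes so the same loop would also respect the per-call limit discussed earlier.

```c
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * slurp_fd: hypothetical helper.  Reads fd until read() returns 0,
 * growing a heap buffer as needed.  It assumes nothing about how many
 * bytes one read() returns, so disk files, record files, tapes, pipes
 * and terminals are all handled the same way.  On success returns the
 * buffer (caller frees) and stores the byte count in *lenp; returns
 * NULL on error.
 */
char *slurp_fd(int fd, size_t *lenp)
{
    size_t cap = 8192, len = 0;
    char *buf = malloc(cap);

    if (buf == NULL)
        return NULL;

    for (;;) {
        if (len == cap) {                       /* full: double the buffer */
            char *bigger = realloc(buf, cap * 2);
            if (bigger == NULL) {
                free(buf);
                return NULL;
            }
            buf = bigger;
            cap *= 2;
        }

        size_t want = cap - len;
        if (want > 65535)
            want = 65535;                       /* stay under the per-call cap */

        ssize_t cc = read(fd, buf + len, want);
        if (cc < 0) {
            if (errno == EINTR)
                continue;                       /* interrupted: retry */
            free(buf);
            return NULL;
        }
        if (cc == 0)
            break;                              /* end of file */
        len += (size_t)cc;
    }
    *lenp = len;
    return buf;
}
```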

Conversely, if you are willing to be device DEPENDENT, huge reads are NOT the
right approach on VMS:  In practice, you would effectively be copying the file
from its home disk to your paging disk.  A MUCH better approach, when you can
get away with it - which you can with no extra work for STREAM files - is to
map the file into memory as a disk section, then let it page in and out as
required.  The VMS-specific code required to do this is pretty simple - under
a hundred lines of code should do it - and the improvement in performance -
especially startup - can be quite dramatic.
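The VMS section-mapping code is not reproduced here. As a rough analogue only, the same idea on a Unix system uses POSIX mmap(); this is a swapped-in technique, not the VMS system services the paragraph refers to, and map_file is a hypothetical name. Pages are faulted in from the file on demand instead of being copied into process memory up front.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * map_file: hypothetical helper, a POSIX analogue of mapping a file
 * as a section.  Maps all of path read-only and stores the length in
 * *lenp.  Returns NULL on failure; an empty file also counts as a
 * failure because mmap() rejects a length of 0.  Release the mapping
 * with munmap((void *)ptr, len).
 */
const char *map_file(const char *path, size_t *lenp)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    struct stat st;
    if (fstat(fd, &st) < 0 || st.st_size == 0) {
        close(fd);
        return NULL;
    }

    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                          /* the mapping survives the close */
    if (p == MAP_FAILED)
        return NULL;

    *lenp = (size_t)st.st_size;
    return p;
}
```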
							-- Jerry
-------