LEICHTER-JERRY@YALE.ARPA (06/19/87)
About two weeks ago, someone - I now forget who, and there seems little
point in dredging through the records to figure it out - complained that
the VAX C RTL was "broken" because its implementation of read() could not
return more than 65535 bytes. It happens that Gnu Emacs depends on reads
larger than that, so it was breaking on VMS. The offended someone, clearly
a Unix patriot, flamed about the "undocumented incompatibilities" in VAX C,
apparently convinced that he had found yet another proof that Unix was the
"only true path". The immediate response came from a number of members of
VMS Lovers Anonymous, explaining that the QIO mechanism was limited to
65535 bytes, hence read() - "naturally" - had to be, too. The flames only
grew from there on.

What got completely ignored in all the verbiage was:

1. What is read() DEFINED TO DO IN UNIX? How about in the VAX C
   documentation?

2. What was the original writer really trying to accomplish with his
   very large read()?

1. Definition

Here's the definition of read() from the SVID:

    int read(fildes, buf, nbyte)
    int fildes;
    char *buf;
    unsigned nbyte;

    The function read attempts to read nbyte bytes from the file
    associated with fildes into the buffer pointed to by buf.
    ...
    If successful, the function read will return the number of bytes
    read and placed in the buffer; this number may be less than nbyte
    if the file is associated with a communications line [see
    IOCTL(BA_OS) and TERMIO(BA_ENV)], or if the number of bytes left in
    the file is less than nbyte, or if the file is a pipe or a special
    file. When the end of file has been reached, the function read will
    return 0.

The Berkeley definition is slightly different:

    cc = read(d, buf, nbytes)
    int cc, d;
    char *buf;
    int nbytes;        /* Not unsigned! */

    Read attempts to read nbytes of data from the object referenced by
    d into the buffer pointed to by buf.
    ...
    Upon successful completion, read ... return[s] the number of bytes
    actually read and placed in the buffer.
    The system guarantees to read the number of bytes requested if the
    descriptor references a file which has that many bytes left before
    the end-of-file, but in no other cases. If the returned value is 0,
    then end-of-file has been reached.

(If anyone has access to a POSIX definition of read, it would be
interesting to see it for comparison.)

There are some interesting problems with these definitions - for example,
the SVID definition takes nbyte unsigned but returns a SIGNED integer; what
if nbyte specifies a value too large to fit in a signed integer, and that
many bytes are actually read? - but the overall import is clear: For
non-special disk files, read is supposed to return as many bytes as are
asked for, assuming they are actually available in the file. So for such
files, the VAX C implementation is not completely consistent with Unix
practice.

However, the VAX C documentation never guarantees that VAX C is 100%
compatible with Unix implementations (which, in fact, are not even exactly
consistent with each other, as the two extracts above make clear): It only
claims that the VAX C software will be compatible with the VAX C
documentation. The VAX C definition of read is:

    int read(file_desc, buffer, nbytes)
    int file_desc, nbytes;
    char *buffer;
    ...
    The function returns the number of bytes actually read. The return
    value does not necessarily equal nbytes. For example, if the input
    is from a terminal, at most one line of characters is read.

                                  NOTE

    In general, the read function will not span record boundaries in a
    record file. A separate read must be done for each record.

Sounds like documentation of the restriction to me, at least for record
files. At best, one could argue that the "NOTE" should also discuss the
limit of 65535 bytes for non-record files.

2. Usage

However, do you really want to write code that will work correctly ONLY for
disk files? Device independence is nice to have, and rather cheap in this
case.
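
To show roughly how cheap: the loop below is a minimal sketch, not code
from Emacs or from any C RTL. It keeps calling read() until it has the
requested count or hits end-of-file, so it does not care whether a single
read() stops at 65535 bytes, at a record boundary, or at whatever a pipe
happens to hold. The name read_fully is hypothetical, and the sketch
assumes a present-day POSIX environment (ssize_t, <unistd.h>, EINTR); on a
1987 compiler the types would be plain int.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * read_fully - hypothetical helper, not part of any standard library.
 * Reads up to nbytes from fd into buf, looping over short reads.
 * Returns the number of bytes actually stored (less than nbytes only
 * if end-of-file was reached first), or -1 on a read error.
 */
ssize_t read_fully(int fd, char *buf, size_t nbytes)
{
    size_t total = 0;

    while (total < nbytes) {
        ssize_t n = read(fd, buf + total, nbytes - total);

        if (n < 0) {
            if (errno == EINTR)
                continue;       /* interrupted by a signal: retry */
            return -1;          /* genuine error */
        }
        if (n == 0)
            break;              /* end-of-file */
        total += (size_t)n;     /* short read: go around again */
    }
    return (ssize_t)total;
}
```

A caller that used to write read(fd, buf, size) and assume it got
everything would call read_fully(fd, buf, size) instead and compare the
result against size.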
There is no reason you should not be able to edit data directly off a tape
or coming through a pipe. In neither of these cases are you guaranteed to
receive "all there is" in a single read.

Conversely, if you are willing to be device DEPENDENT, huge reads are NOT
the right approach on VMS: In practice, you would effectively be copying
the file from its home disk to your paging disk. A MUCH better approach,
when you can get away with it - which you can with no extra work for STREAM
files - is to map the file into memory as a disk section, then let it page
in and out as required. The VMS-specific code required to do this is pretty
simple - under a hundred lines of code should do it - and the improvement
in performance - especially at startup - can be quite dramatic.

                                                        -- Jerry
-------