AWalker@RED.RUTGERS.EDU.UUCP (06/07/87)
Did I miss it back whenever, or is there really a bug in VAXCRTL that silently limits the size of a read() to 65535 bytes? I found this the hard way, while porting unix "ispell" to vms -- down there in the dirty stuff called by read(), at _child_open+0b9 there's a "movzwl" that trashes the high half of the "int nbytes" argument you fed it. *Why*?!?! There is no mention of any limitation in the vax C book [but then again, it's the vax C book].

This is Vax-11 C v2.1 [I should probably have a later one, I know] and the image header of vaxcrtl contains the string "VAXCRTL ... V04-005" [if that helps].

I did finally work around the various limitations and have a version of "ispell" that works under vms, if anyone wants it.

_H*
-------
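[Editorial illustration] A MOVZWL copies a 16-bit word into a 32-bit longword and zero-fills the upper half, so a byte count that passes through one keeps only its low 16 bits. The sketch below models just that arithmetic in portable C. The function name count_after_movzwl is invented for the example, and nothing here is taken from the VAXCRTL source.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the effect described in the post: the 32-bit
 * byte count is narrowed to its low 16-bit word and zero-extended
 * back, which is what a MOVZWL of that word would leave behind. */
static uint32_t count_after_movzwl(int nbytes)
{
    assert(nbytes >= 0);                /* read() counts are non-negative */
    return (uint32_t)(uint16_t)nbytes;  /* keep low word, zero the high word */
}
```

Under this model a request for 65535 bytes survives intact, 65536 becomes 0, and 100000 becomes 34464 (100000 - 65536). Whether the library then performs a short read or something else with the reduced count is not something this sketch can say.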
mcguffey@unc.cs.unc.edu (Michael McGuffey) (06/07/87)
In article <12308535119.47.AWALKER@RED.RUTGERS.EDU> AWalker@RED.RUTGERS.EDU (*Hobbit*) writes:
>Did I miss it back whenever, or is there really a bug in VAXCRTL that silently
>limits the size of a read() to 65535 bytes? I found this the hard way, while
>-------

I have recently been having trouble doing reads of around 1/2M at a time. I got around it by reading in 1K chunks. Since I only needed the program once, I didn't fine-tune it to see the largest read I could perform.

I like DEC equipment and software, but it seems funny that they would claim in the VAX C ref manual that they want to maintain compatibility with unix c and yet have such a glaring bug. Does anybody have a patch or fix?

--mike
mcguffey@dopey.cs.unc.edu
mcguffey@csnet
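[Editorial illustration] The chunking workaround described above can be sketched as a small loop around read(). This is a minimal sketch, not the poster's program: the function name read_in_chunks and its parameters are assumptions, and it is written against a POSIX-style read(); header names on VAX C may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Read up to `total` bytes from `fd` into `buf`, never asking read()
 * for more than `chunk` bytes at once. Returns the number of bytes
 * read (short only at end of file). If a read fails, returns the
 * bytes read so far, or -1 if there were none. */
static ssize_t read_in_chunks(int fd, char *buf, size_t total, size_t chunk)
{
    size_t done = 0;

    assert(buf != NULL && chunk > 0);
    while (done < total) {
        size_t want = total - done;
        ssize_t got;

        if (want > chunk)
            want = chunk;           /* stay under the per-call limit */
        got = read(fd, buf + done, want);
        if (got < 0)
            return done > 0 ? (ssize_t)done : -1;
        if (got == 0)
            break;                  /* end of file */
        done += (size_t)got;
    }
    return (ssize_t)done;
}
```

A call such as read_in_chunks(fd, buf, 524288, 1024) would match the 1K chunks mentioned above; any chunk size up to 65535 keeps each individual request inside a 16-bit count.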
carl@CITHEX.CALTECH.EDU.UUCP (06/07/87)
> Did I miss it back whenever, or is there really a bug in VAXCRTL that
> silently limits the size of a read() to 65535 bytes? I found this the hard
> way, while porting unix "ispell" to vms -- down there in the dirty stuff
> called by read(), at _child_open+0b9 there's a "movzwl" that trashes the high
> half of the "int nbytes" argument you fed it. *Why*?!?! There is no
> mention of any limitation in the vax C book [but then again, it's the vax C
> book].
> This is Vax-11 C v2.1 [I should probably have a later one, I know] and the
> image header of vaxcrtl contains the string "VAXCRTL ... V04-005" [if that
> helps].
> I did finally work around the various limitations and have a version of
> "ispell" that works under vms, if anyone wants it.

It's not clear to me whether what you're calling a bug is a bug in the C run-time library, a bug in RMS, a bug in the disk ACP or in the device driver for disks, or simply an attempt to ensure that all of the above adhere to the definition of the file system they run on. That is a rather nice idea if you ever plan to move data from your machine to another, or to move software from a machine that assumes it will be dealing with FILES-11 ODS-2 disks rather than some strange combination of FILES-11 and whichever file system happens to be in vogue among UNIX users these days.

The following are excerpts from DEC's description of the FILES-11 On-Disk-Structure, Level 2 file system, in particular, their definition of a disk, of a volume, and of a file (the latter is taken from the chapter on RMS). The comments between the excerpts are, of course, mine.

+-----------------------------------------------------------------------------+
| ODS-2 defines the largest record supported at the hardware level to be 64K. |
| Software that insists on using larger records won't be guaranteed portable. |
+-----------------------------------------------------------------------------+

2.1 Volume

The basic medium that carries a Files-11 structure is referred to as a volume. A volume (also often referred to as a unit) is defined as an ordered set of logical blocks. A logical block is an array of 512 8-bit bytes. The logical blocks in a volume are consecutively numbered from 0 to n-1, where the volume contains n logical blocks. The number assigned to a logical block is called its logical block number, or LBN. Files-11 is capable of describing volumes up to 2**32 blocks in size. In practice, a volume should be at least 100 blocks in size to be useful.

The logical blocks of a volume must be randomly addressable. The volume must also allow transfers of any length up to 65K bytes, in multiples of four bytes. When a transfer is longer than 512 bytes, consecutively numbered logical blocks are transferred until the byte count is satisfied. In other words, the volume can be viewed as a partitioned array of bytes. It must allow reads and writes of arrays of any length less than 65K bytes, provided that they start on a logical block boundary and that the length is a multiple of four bytes. When only part of a block is written, the contents of the remainder of that logical block will be undefined.

The logical blocks of a volume are grouped into clusters. The cluster is the basic unit of space allocation on the volume. Each cluster contains one or more logical blocks; the number of blocks in a cluster is known as the volume cluster factor, or storage map cluster factor.

A volume is identified as a Files-11 volume by the home block. The home block is located at a defined physical location on the volume, and is identified by the presence of checksums and predictable values. The home block is described in detail in section 5.1.

To identify the volume, the home block contains a volume label, which is a string of up to 12 ASCII characters. The characters are restricted to the printing ASCII set (i.e., excluding control characters and rubout). Further, it is recommended that volume labels be restricted to alphanumerics only to avoid conflicts with the command languages of supporting systems. The volume label of a volume may not be null.

+-----------------------------------------------------------------------------+
| Still not convinced?  Well, DEC's definition of a file involves a 64K limit |
| on the record size, as indicated below. Larger than that, RMS doesn't like. |
+-----------------------------------------------------------------------------+

3.0 Files

Any data in a volume or volume set that is of any interest (i.e., all blocks not available for allocation) is contained in a file. A file is an ordered set of virtual blocks, where a virtual block is an array of 512 8-bit bytes. The virtual blocks of a file are consecutively numbered from 1 to n, where n is the highest numbered block that has been allocated to the file. The number assigned to a virtual block is called (obviously) its virtual block number, or VBN.

Virtual blocks are mapped to unique logical blocks in the volume set by Files-11. Virtual blocks may be processed in the same manner as logical blocks. Any array of bytes less than 65K in length may be read or written, provided that the transfer starts on a virtual block boundary and that its length is a multiple of four.

For most files, all VBN's less than or equal to the highest VBN allocated map to some LBN in the volume set. Such files are said to be dense. Files which are sparse contain virtual blocks which have not been allocated logical blocks.

+-----------------------------------------------------------------------------+
| Well, then maybe you can explain how a record length bigger than 65535 fits |
| in a two-byte field?  Of course this doesn't apply to STREAM and UDF files. |
+-----------------------------------------------------------------------------+

7.2.3 F$RSIZ 2 Bytes - Record Size

In files containing fixed length format records this word contains the size of the records in bytes. In Sequential files containing variable or variable with fixed control formatted records this field contains the size in bytes of the longest record in the file. This field is undefined for Relative and Indexed files containing variable or variable with fixed control format records.

+-----------------------------------------------------------------------------+
| As a matter of fact, with variable-record-length files, RMS is unhappy with |
| anything over 32767 bytes long.  Such records are declared invalid by RMS.  |
+-----------------------------------------------------------------------------+

7.2.9 F$MRS 2 Bytes - Maximum Record Size

This field contains a user specified maximum record size limit in bytes, to be enforced on output operations. Files containing Fixed length format records have F$MRS set equal to F$RSIZ. For all other record formats F$MRS is set to the user specified value given when the file was created. A value of 0 is interpreted as no maximum record size limit specified.
dp@JASPER.PALLADIAN.COM.UUCP (06/08/87)
    Date: 7 Jun 87 15:05:45 GMT
    From: unc!mcguffey@mcnc.org (Michael McGuffey)

    In article <12308535119.47.AWALKER@RED.RUTGERS.EDU> AWalker@RED.RUTGERS.EDU (*Hobbit*) writes:
    >Did I miss it back whenever, or is there really a bug in VAXCRTL that silently
    >limits the size of a read() to 65535 bytes? I found this the hard way, while
    >-------

    I have recently been having trouble doing reads of around 1/2M at a time. I got around it by reading in 1K chunks. Since I only needed the program once, I didn't fine-tune it to see the largest read I could perform.

    I like DEC equipment and software, but it seems funny that they would claim in the VAX C ref manual that they want to maintain compatibility with unix c and yet have such a glaring bug. Does anybody have a patch or fix?

    --mike
    mcguffey@dopey.cs.unc.edu
    mcguffey@csnet

This is a holdover from pdp-11 days. The controllers could only transfer 64kb, so the restriction carried over to the next layer, since not all devices could be supported uniformly. (Of course, vms was originally envisioned as fitting into 64kb, and the original 11/780 memory controller would only support 1mb, with a limit of two controllers per system. It really was originally intended as a slightly bigger pdp-11, and vms was rsx-11a with some help from the hardware.)

I cannot tell you what the business programming language in use in the year 2000 will look like, but it will be called COBOL. Programmers never bury their dead.

<dp>
jimp@cognos.uucp (Jim Patterson) (06/09/87)
In article <12308535119.47.AWALKER@RED.RUTGERS.EDU> AWalker@RED.RUTGERS.EDU.UUCP writes:
>Did I miss it back whenever, or is there really a bug in VAXCRTL that silently
>limits the size of a read() to 65535 bytes?

I don't think that it's fair to call this a bug; "limitation" is more appropriate. I'm sure there are UNIX implementations (e.g. ones with 16-bit ints) that have comparable limits. Yes, it should be documented in the manual somewhere (I'm not sure whether it is).

In fact, 65535 is a limitation for RMS records, not just for C. To overcome the limitation, the C RTL would have to be prepared to issue multiple RMS gets/puts for a single C read/write, which could be done but wouldn't be exactly equivalent. I suppose the implementors didn't think that anyone would want to read more than 65535 bytes with a single read.

--
Jim Patterson          decvax!utzoo!dciem!nrcaer!cognos!jimp
Cognos Incorporated
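[Editorial illustration] The idea of one C-level call fanning out into several limited transfers can be sketched on the write side. This is only a sketch under stated assumptions: plain write() stands in for the underlying RMS put (no RMS calls are shown), and the names write_in_pieces and MAX_PIECE are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

#define MAX_PIECE 65535u    /* largest count that fits in a 16-bit field */

/* Write `len` bytes as a series of transfers of at most `max_piece`
 * bytes each. Returns the number of bytes written. If a piece fails
 * after earlier pieces succeeded, those bytes are already out and the
 * short count is returned; -1 means nothing was written at all. */
static ssize_t write_in_pieces(int fd, const char *buf, size_t len,
                               size_t max_piece)
{
    size_t done = 0;

    assert(buf != NULL && max_piece > 0 && max_piece <= MAX_PIECE);
    while (done < len) {
        size_t want = len - done;
        ssize_t put;

        if (want > max_piece)
            want = max_piece;
        put = write(fd, buf + done, want);  /* stand-in for one limited put */
        if (put < 0)
            return done > 0 ? (ssize_t)done : -1;
        if (put == 0)
            break;                          /* no progress; give up */
        done += (size_t)put;
    }
    return (ssize_t)done;
}
```

The failure path shows one way the result is "not exactly equivalent": a single large transfer either happens or does not, while a split one can stop partway with some pieces already written.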
stevesu@copper.UUCP (06/10/87)
Inevitably, one quickly finds that many C RTL questions cross over from language issues to operating system issues. Several people have correctly pointed out that various parts of VMS have inherent 65,535 or 32,767 byte record limitations. Unfortunately, these arguments have nothing to do with C functions named read() or write().

Most vendors, DEC included, provide a C run-time library which "just happens" to look a lot like Unix. Presumably this is to make porting programs from Unix to (in this case) VMS easy. Therefore, the C RTL should do a reasonable amount of work to hide filesystem or operating system peculiarities, especially those not present in Unix.

Unix has _absolutely no record structure_. People who are used to traditional record-based operating systems find this concept about as alarming as Westerners do when encountering aborigines running around without clothes, but Unix programmers love this freedom, and having to deal with RMS is one of the biggest shocks when moving from Unix to VMS.

VMS I/O is nominally device-independent; Unix I/O is much more so. The read() and write() calls, even though they are low-level I/O routines, should not unnecessarily reflect underlying device characteristics, particularly on disk files.

It could be argued that a program that is trying to do reads of more than 32,767 would be more portably written in terms of fopen and fread, so that the stdio package could provide another level of chunking/buffering. It could also be argued that such a program is inherently unportable, because huge numbers like these would not work as the third argument to read() on a 16-bit machine. Dragging record length considerations in, however, begins to compromise the semantics of read().

I must concede that, in the end, it is impossible to ignore record-length considerations when doing C I/O on VMS. Even if huge reads were supported, there are several other alignment problems you have to worry about.
(lseeks to record boundaries, reads and writes of multiples of the record size, lines not changing size when updating variable-length record files in place, etc.) There are a whole bunch of these limitations, and they could probably be better documented. In particular, Table A-1 should not state that read() and write() have "equivalent functionality." (Several of the "not equivalent" entries in that table document restrictions much less significant than those of read and write.)

Steve Summit
stevesu@copper.tek.com

P.S. Some of you are saying "but to remove the restrictions, the read() and write() emulations would have to do extra buffering, and that would be inefficient." Yes, buffering underneath read() and write() is necessary if you want to remove as many RMS-related restrictions as possible, but no, it doesn't have to be inefficient. It is possible to write these routines so that, when callers perform the "approved," aligned operations, no buffering (and hence no significant overhead) is required. When unaligned calls are made, the buffering is no less efficient than the buffering that would inevitably be introduced in the calling program to work around a restricted call.

P.P.S. I'm knowledgeable about C RTL issues, and could talk about tradeoffs longer than you probably feel like listening and I feel like typing, because I wrote one here at Tektronix, partly because of licensing restrictions with the version 1 DEC C RTL, and partly because we didn't feel like rewriting large numbers of applications to work around the DEC C RTL limitations. Our library is proprietary, of course, so I can't offer you a copy.

Disclaimers: Since I don't use DEC's C RTL, for the above-mentioned reason, I can't be sure about its limitations and restrictions. Many of them were removed between versions 1 and 2, including perhaps some of the ones I mentioned in this article.
It is not strictly true that Unix has _no_ record structure; when dealing with "raw" devices like terminals and tape drives, the line/record structure becomes apparent, as indeed it must for programs to work correctly when using these devices nonabstractly. Disk files, however, are completely unstructured, unless you count the st_blksize field in the stat structure in 4.2bsd and Ultrix. (If Berkeley hates VMS as much as they would have us believe, why do they keep putting so many VMSisms in 4bsd?)

Oh, and I hope this doesn't trigger another "Unix vs. VMS" debate. I use 'em both; I'm not even gonna mention which one I prefer :-).
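[Editorial illustration] The P.S. in this post argues that a buffering layer need not cost anything for aligned callers. A minimal sketch of that shape follows, assuming a made-up in-memory "device" that only transfers whole 512-byte blocks; the names blockdev, dev_read_blocks and flex_read are hypothetical and have nothing to do with the Tektronix or DEC libraries.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define BLK 512     /* block size used in the Files-11 excerpts */

/* Hypothetical restricted device: whole blocks only. */
struct blockdev {
    const unsigned char *data;      /* backing store, nblocks * BLK bytes */
    size_t nblocks;
    unsigned long direct, staged;   /* how often each path was taken */
};

/* Restricted primitive: copy `count` whole blocks starting at block
 * `first`; returns the number of blocks actually copied. */
static size_t dev_read_blocks(struct blockdev *d, size_t first,
                              size_t count, unsigned char *out)
{
    if (first >= d->nblocks)
        return 0;
    if (count > d->nblocks - first)
        count = d->nblocks - first;
    memcpy(out, d->data + first * BLK, count * BLK);
    return count;
}

/* Unrestricted read of `len` bytes at byte offset `off`. */
static size_t flex_read(struct blockdev *d, size_t off,
                        unsigned char *out, size_t len)
{
    size_t done = 0;

    assert(d != NULL && out != NULL);
    if (off % BLK == 0 && len % BLK == 0) {
        /* Aligned request: hand it straight to the device, no staging. */
        d->direct++;
        return dev_read_blocks(d, off / BLK, len / BLK, out) * BLK;
    }
    d->staged++;
    while (done < len) {
        unsigned char stage[BLK];   /* one-block staging buffer */
        size_t pos = off + done;
        size_t skip = pos % BLK;    /* bytes to skip inside this block */
        size_t take = BLK - skip;

        if (take > len - done)
            take = len - done;
        if (dev_read_blocks(d, pos / BLK, 1, stage) != 1)
            break;                  /* ran off the end of the device */
        memcpy(out + done, stage + skip, take);
        done += take;
    }
    return done;
}
```

For simplicity the unaligned path stages every block it touches, including ones that lie wholly inside the request; a more careful version would pass those middle blocks through directly and stage only the partial first and last blocks.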