[comp.os.vms] C RTL again?

AWalker@RED.RUTGERS.EDU.UUCP (06/07/87)

Did I miss it back whenever, or is there really a bug in VAXCRTL that silently
limits the size of a read() to 65535 bytes?  I found this the hard way, while
porting unix "ispell" to vms -- down there in the dirty stuff called by
read(), at _child_open+0b9 there's a "movzwl" that trashes the high half of
the "int nbytes" argument you fed it.  *Why*?!?!  There is no mention of
any limitation in the vax C book [but then again, it's the vax C book].
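
For anyone who has not met "movzwl": it moves a 16-bit word into a 32-bit
longword and zero-fills the upper half, so only the low 16 bits of the count
survive. A minimal C sketch of the effect follows; the function name is made
up for illustration and is not anything inside VAXCRTL.

```c
#include <stdint.h>

/* Hypothetical illustration: mimic what a MOVZWL does to a 32-bit count.
 * Only the low 16 bits survive; the high half is replaced by zeros. */
uint32_t movzwl_effect(uint32_t nbytes)
{
    return (uint32_t)(uint16_t)nbytes;
}
```

Under this model a request for 65536 bytes turns into a request for 0, and a
request for 100000 bytes turns into one for 34464 (100000 - 65536).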

This is Vax-11 C v2.1 [I should probably have a later one, I know] and the
image header of vaxcrtl contains the string "VAXCRTL ... V04-005" [if that
helps].

I did finally work around the various limitations and have a version of "ispell"
that works under vms, if anyone wants it.

_H*
-------

mcguffey@unc.cs.unc.edu (Michael McGuffey) (06/07/87)

In article <12308535119.47.AWALKER@RED.RUTGERS.EDU> AWalker@RED.RUTGERS.EDU (*Hobbit*) writes:
>Did I miss it back whenever, or is there really a bug in VAXCRTL that silently
>limits the size of a read() to 65535 bytes?  I found this the hard way, while
>-------

I have recently been having trouble doing reads of around 1/2M at a time.
I got around it by reading in 1K chunks.  Since I only needed the program
once, I didn't fine-tune it to see the largest read I could perform.  I like
DEC equipment and software, but it seems funny that they would claim in the
VAX C ref manual that they want to maintain compatibility with unix c but
have such a glaring bug.
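
The chunking workaround described above might look roughly like the sketch
below. It is written against a POSIX-style read(); the name read_chunked and
the 32768-byte ceiling are assumptions chosen for illustration, since the
thread only establishes that requests above 65535 bytes misbehave.

```c
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Assumed per-call ceiling; any value at or below 65535 would do. */
#define CHUNK_MAX 32768

/* Read up to nbytes into buf by issuing several smaller read() calls.
 * Returns the total number of bytes read, or -1 if the first call fails. */
long read_chunked(int fd, char *buf, long nbytes)
{
    long total = 0;

    while (total < nbytes) {
        long want = nbytes - total;
        if (want > CHUNK_MAX)
            want = CHUNK_MAX;

        ssize_t got = read(fd, buf + total, (size_t)want);
        if (got < 0)
            return total > 0 ? total : -1;   /* report partial progress */
        if (got == 0)
            break;                           /* end of file */
        total += got;
    }
    return total;
}
```

A caller would replace read(fd, buf, n) with read_chunked(fd, buf, n) and
otherwise leave the program alone.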

Does anybody have a patch or fix?

--mike
mcguffey@dopey.cs.unc.edu
mcguffey@csnet

carl@CITHEX.CALTECH.EDU.UUCP (06/07/87)

 > Did I miss it back whenever, or is there  really  a  bug  in  VAXCRTL  that
 > silently limits the size of a read() to 65535 bytes?  I found this the hard
 > way, while porting unix "ispell" to vms -- down there in  the  dirty  stuff
 > called by read(), at childopen+0b9 there's a "movzwl" that trashes the high
 > half of the "int nbytes" argument you  fed  it.   *Why*?!?!   There  is  no
 > mention of any limitation in the vax C book [but then again, it's the vax C
 > book].

 > This is Vax-11 C v2.1 [I should probably have a later one, I know] and  the
 > image header of vaxcrtl contains the string "VAXCRTL ...  V04-005" [if that
 > helps].

 > I did finally work around the various limitations and  have  a  version  of
 > "ispell" that works under vms, if anyone wants it.

It's not clear to me whether what you're calling a bug is a bug in the C
run-time library, a bug in RMS, a bug in the disk ACP or in the device driver
for disks, or simply an attempt to ensure that all of the above adhere to the
definition of the file system they run on.  That is a rather nice idea if you
ever plan to move data from your machine to another, or to move software from
a machine that assumes it will be dealing with FILES-11 ODS-2 disks, rather
than some strange combination of FILES-11 and whichever file system happens to
be in vogue among UNIX users these days.  The following are excerpts from
DEC's description of the FILES-11 On-Disk-Structure, Level 2 file system, in
particular, their definition of a disk, of a volume, and of a file (the latter
is taken from the chapter on RMS).  The comments between the excerpts are, of
course, mine.

 +-----------------------------------------------------------------------------+
 | ODS-2 defines the largest record supported at the hardware level to be 64K. |
 | Software that insists on using larger records won't be guaranteed portable. | 
 +-----------------------------------------------------------------------------+
         2.1  Volume

          The basic medium that carries a Files-11  structure  is  re-
          ferred  to as a volume.  A volume (also often referred to as
          a unit) is defined as an ordered set of logical  blocks.   A
          logical  block  is an array of 512 8-bit bytes.  The logical
          blocks in a volume are consecutively numbered from 0 to n-1,
          where  the volume contains n logical blocks.  The number as-
          signed to a  logical  block  is  called  its  logical  block
          number,  or  LBN.  Files-11 is capable of describing volumes
          up to 2**32 blocks in size.  In practice, a volume should be
          at least 100 blocks in size to be useful.

          The logical blocks of a volume must be randomly addressable.
          The volume must also allow transfers of any length up to 65K
          bytes, in multiples of  four  bytes.   When  a  transfer  is
          longer than 512 bytes, consecutively numbered logical blocks
          are transferred until the byte count is satisfied.  In other
          words,  the  volume  can be viewed as a partitioned array of
          bytes.  It must allow reads and  writes  of  arrays  of  any
          length  less  than  65K bytes, provided that they start on a
          logical block boundary and that the length is a multiple  of
          four  bytes.  When only part of a block is written, the con-
          tents of the remainder of that logical block will  be  unde-
          fined.

          The logical blocks of a volume are  grouped  into  clusters.
          The cluster is the basic unit of space allocation on the vo-
          lume.  Each cluster contains one  or  more  logical  blocks;
          the  number  of  blocks  in a cluster is known as the volume
          cluster factor, or storage map cluster factor.

          A volume is identified as a  Files-11  volume  by  the  home
          block.   The home block is located at a defined physical lo-
          cation on the volume, and is identified by the  presence  of
          checksums  and  predictable  values.  The home block is des-
          cribed in detail in section 5.1.  To  identify  the  volume,
          the home block contains a volume label, which is a string of
          up to 12 ASCII characters.  The characters are restricted to
          the  printing  ASCII set (i.e., excluding control characters
          and rubout).  Further, it is recommended that volume  labels
          be  restricted to alphanumerics only to avoid conflicts with
          the command languages of  supporting  systems.   The  volume
          label of a volume may not be null.
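
The transfer rules in the second paragraph of section 2.1 can be written down
as a small predicate. This is only a sketch: the function name is invented,
and reading "65K" as the largest value of a 16-bit byte count (65535) is an
assumption, since the excerpt says both "up to 65K" and "less than 65K".

```c
#include <stdbool.h>
#include <stdint.h>

#define BLOCK_SIZE   512u     /* bytes per logical block, from the excerpt */
#define MAX_TRANSFER 65535u   /* assumption: "65K" read as a 16-bit byte count */

/* Return true if a transfer starting at byte_offset for length bytes
 * satisfies the three constraints stated in the excerpt. */
bool transfer_ok(uint32_t byte_offset, uint32_t length)
{
    return byte_offset % BLOCK_SIZE == 0   /* starts on a block boundary */
        && length % 4 == 0                 /* length is a multiple of four */
        && length <= MAX_TRANSFER;         /* fits the size ceiling */
}
```

By this reading, a 65532-byte transfer at offset 1024 is acceptable, while a
512-byte transfer at offset 100, or any 65536-byte transfer, is not.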
 +-----------------------------------------------------------------------------+
 | Still not convinced?  Well, DEC's definition of a file involves a 64K limit |
 | on the record size, as indicated below. Larger than that, RMS doesn't like. |
 +-----------------------------------------------------------------------------+
          3.0  Files

          Any data in a volume or volume set that is of  any  interest
          (i.e., all blocks not available for allocation) is contained
          in a file.  A file is an  ordered  set  of  virtual  blocks,
          where  a  virtual block is an array of 512 8-bit bytes.  The
          virtual blocks of a file are consecutively numbered  from  1
          to  n,  where  n is the highest numbered block that has been
          allocated to the file.  The number  assigned  to  a  virtual
          block  is  called  (obviously)  its virtual block number, or
          VBN.  Virtual blocks are mapped to unique logical blocks  in
          the volume set by Files-11.  Virtual blocks may be processed
          in the same manner as logical blocks.  Any  array  of  bytes
          less  than  65K  in  length may be read or written, provided
          that the transfer starts on a  virtual  block  boundary  and
          that its length is a multiple of four.

          For most files, all VBN's less than or equal to the  highest
          VBN allocated map to some LBN in the volume set.  Such files
          are said to be dense.  Files which are sparse contain virtu-
          al blocks which have not been allocated logical blocks.
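
Since virtual blocks are numbered from 1 rather than 0, converting between a
byte offset in a file and a VBN is an easy place for an off-by-one error. A
short sketch of the arithmetic, with hypothetical helper names:

```c
#include <stdint.h>

#define BLOCK_SIZE 512u   /* bytes per virtual block, from the excerpt */

/* VBNs are 1-based: byte 0 of the file lives in VBN 1. */
uint32_t vbn_of_offset(uint32_t byte_offset)
{
    return byte_offset / BLOCK_SIZE + 1;
}

/* Byte offset at which a given VBN starts (vbn must be >= 1). */
uint32_t offset_of_vbn(uint32_t vbn)
{
    return (vbn - 1) * BLOCK_SIZE;
}
```

So bytes 0 through 511 fall in VBN 1, byte 512 begins VBN 2, and VBN 3 begins
at byte 1024.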
 +-----------------------------------------------------------------------------+
 | Well, then maybe you can explain how a record length bigger than 65535 fits |
 | in a two-byte field?  Of course this doesn't apply to STREAM and UDF files. |
 +-----------------------------------------------------------------------------+
          7.2.3  F$RSIZ 2 Bytes - Record Size

                    In files containing fixed  length  format  records
                    this  word  contains  the  size  of the records in
                    bytes.  In Sequential files containing variable or
                    variable with fixed control formatted records this
                    field contains the size in bytes  of  the  longest
                    record  in  the file.  This field is undefined for
                    Relative and Indexed files containing variable  or
                    variable with fixed control format records.
 +-----------------------------------------------------------------------------+
 | As a matter of fact, with variable-record-length files, RMS is unhappy with |
 | anything over 32767 bytes long.   Such records are declared invalid by RMS. |
 +-----------------------------------------------------------------------------+
          7.2.9  F$MRS 2 Bytes - Maximum Record Size

                    This field contains a user specified  maximum  re-
                    cord size limit in bytes, to be enforced on output
                    operations.  Files containing Fixed length  format
                    records  have  F$MRS set equal to F$RSIZ.  For all
                    other record formats F$MRS  is  set  to  the  user
                    specified  value  given when the file was created.
                    A value of 0 is interpreted as no  maximum  record
                    size limit specified.

dp@JASPER.PALLADIAN.COM.UUCP (06/08/87)

    Date: 7 Jun 87 15:05:45 GMT
    From: unc!mcguffey@mcnc.org  (Michael McGuffey)

    In article <12308535119.47.AWALKER@RED.RUTGERS.EDU> AWalker@RED.RUTGERS.EDU (*Hobbit*) writes:
    >Did I miss it back whenever, or is there really a bug in VAXCRTL that silently
    >limits the size of a read() to 65535 bytes?  I found this the hard way, while
    >-------

    I have recently been having trouble doing reads of around 1/2M at a time.
    I got around it by reading in sections of 1K chunks.  Since
    I only needed the program once, I didn't fine-tune it to see the
    largest read I could perform.  I like DEC equipment and software but it seems
    funny that they would claim in the VAX C ref manual that
    they want to maintain compatibility with unix c but
    have such a glaring bug.

    Does anybody have a patch or fix?

    --mike
    mcguffey@dopey.cs.unc.edu
    mcguffey@csnet

This is a holdover from PDP-11 days. The controllers could only transfer 64KB, so the
restriction carried over to the next layer, since not all devices could be supported
uniformly. (Of course, VMS was originally envisioned as fitting into 64KB, and the
original 11/780 memory controller would only support 1MB, with a limit of two
controllers per system. It really was originally intended as a slightly bigger PDP-11,
and VMS was RSX-11A with some help from the hardware.)

I cannot tell you what the business programming language in use in the year 2000 will
look like, but it will be called COBOL. Programmers never bury their dead.

<dp>

jimp@cognos.uucp (Jim Patterson) (06/09/87)

In article <12308535119.47.AWALKER@RED.RUTGERS.EDU> AWalker@RED.RUTGERS.EDU.UUCP writes:
>Did I miss it back whenever, or is there really a bug in VAXCRTL that silently
>limits the size of a read() to 65535 bytes?

I don't think that it's fair to call this a bug; "limitation" is
more appropriate.  I'm sure there are UNIX implementations (e.g.
ones with 16-bit ints) that have comparable limits. Yes, it should
be documented in the manual somewhere (I'm not sure if it is or
not).  

In fact, 65535 is a limitation for RMS records, not just for C.
To overcome the limitation the C RTL would have to be prepared to
issue multiple RMS gets/puts for a single C read/write, which
could be done, but wouldn't be exactly equivalent.  I suppose the
implementors didn't think that anyone would think of reading
more than 65535 bytes with a single read.
-- 

Jim Patterson          decvax!utzoo!dciem!nrcaer!cognos!jimp
Cognos Incorporated

stevesu@copper.UUCP (06/10/87)

Inevitably, one quickly finds that many C RTL questions cross
over from language issues to operating system issues. Several
people have correctly pointed out that various parts of VMS have
inherent 65,535 or 32,767 byte record limitations. Unfortunately,
these arguments have nothing to do with C functions named read()
or write(). Most vendors, DEC included, provide a C run-time
library which "just happens" to look a lot like Unix. Presumably
this is to make porting programs from Unix to (in this case) VMS
easy. Therefore, the C RTL should do a reasonable amount of work
to hide filesystem or operating system peculiarities, especially
those not present in Unix.

Unix has _absolutely no record structure_. People who are used to
traditional record-based operating systems find this concept
about as alarming as Westerners do when encountering aborigines
running around without clothes, but Unix programmers love this
freedom, and having to deal with RMS is one of the biggest shocks
when moving from Unix to VMS.

VMS I/O is nominally device-independent; Unix I/O is much more so.
The read() and write() calls, even though they are low-level I/O
routines, should not unnecessarily reflect underlying device
characteristics, particularly on disk files.

It could be argued that a program that is trying to do reads of
more than 32,767 bytes would be more portably written in terms of fopen
and fread, so that the stdio package could provide another level
of chunking/buffering. It could also be argued that such a
program is inherently unportable, because huge numbers like these
would not work as the third argument to read() on a 16-bit
machine. Dragging record length considerations in, however,
begins to compromise the semantics of read().

I must concede that, in the end, it is impossible to ignore
record-length considerations when doing C I/O on VMS. Even if
huge reads were supported, there are several other alignment
problems you have to worry about. (lseeks to record boundaries,
reads and writes of multiples of the record size, lines not
changing size when updating variable-length record files in
place, etc.) There are a whole bunch of these limitations, and
they could probably be better documented. In particular,
Table A-1 should not state that read() and write() have
"equivalent functionality." (Several of the "not equivalent"
entries in that table document restrictions much less significant
than those of read and write.)

Steve Summit
stevesu@copper.tek.com

P.S. Some of you are saying "but to remove the restrictions, the
read() and write() emulations would have to do extra buffering,
and that would be inefficient." Yes, buffering underneath read()
and write() is necessary if you want to remove as many RMS-
related restrictions as possible, but no, it doesn't have to be
inefficient. It is possible to write these routines so that,
when callers perform the "approved," aligned operations, no
buffering (and hence no significant overhead) is required.
When unaligned calls are made, the buffering is no less efficient
than the buffering that would inevitably be introduced in the
calling program to work around a restricted call.
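
A toy sketch of that idea, assuming a restricted low-level call that accepts
only record-aligned offsets and lengths. Everything here is hypothetical
(struct dev, raw_read, flex_read, the 512-byte REC, and the in-memory "device"
standing in for a file); it is not the Tektronix or DEC code. The
staged_bytes counter exists only to show when the staging buffer was used.

```c
#include <stddef.h>
#include <string.h>

#define REC 512                      /* hypothetical record/block size */

struct dev { const char *data; size_t size; };   /* in-memory stand-in */

/* Restricted low-level read: off and len must both be multiples of REC.
 * Returns bytes read, 0 at end of data, or -1 if the call is unaligned. */
long raw_read(const struct dev *d, size_t off, char *buf, size_t len)
{
    if (off % REC != 0 || len % REC != 0)
        return -1;
    if (off >= d->size)
        return 0;
    if (len > d->size - off)
        len = d->size - off;
    memcpy(buf, d->data + off, len);
    return (long)len;
}

long staged_bytes = 0;               /* bytes copied via the staging buffer */

/* Unrestricted read built on raw_read. */
long flex_read(const struct dev *d, size_t off, char *buf, size_t len)
{
    /* Fast path: the caller already obeys the restrictions, so the data
     * goes straight into the caller's buffer with no extra copy. */
    if (off % REC == 0 && len % REC == 0)
        return raw_read(d, off, buf, len);

    /* Slow path: fetch each covering record and copy out the wanted slice. */
    char stage[REC];
    size_t done = 0;
    while (done < len) {
        size_t pos   = off + done;
        size_t start = pos - pos % REC;      /* record containing pos */
        long got = raw_read(d, start, stage, REC);
        if (got < 0)
            return done > 0 ? (long)done : -1;
        size_t skip = pos - start;           /* bytes before pos in record */
        if ((size_t)got <= skip)
            break;                           /* pos is at or past end of data */
        size_t take = (size_t)got - skip;
        if (take > len - done)
            take = len - done;
        memcpy(buf + done, stage + skip, take);
        staged_bytes += (long)take;
        done += take;
    }
    return (long)done;
}
```

The slow path here stages every record for simplicity; a more careful version
would pass the aligned middle of a large unaligned request straight through
and stage only the ragged ends.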

P.P.S. I'm knowledgeable about C RTL issues, and could talk about
tradeoffs longer than you probably feel like listening and I feel
like typing, because I wrote one here at Tektronix, partly
because of licensing restrictions with the version 1 DEC C RTL,
and partly because we didn't feel like rewriting large numbers of
applications to work around the DEC C RTL limitations. Our
library is proprietary, of course, so I can't offer you a copy.

Disclaimers: Since I don't use DEC's C RTL, for the above-
mentioned reason, I can't be sure about its limitations and
restrictions. Many of them were removed between versions 1 and 2,
including perhaps some of the ones I mentioned in this article.

It is not strictly true that Unix has _no_ record structure;
when dealing with "raw" devices like terminals and tape drives,
the line/record structure becomes apparent, as indeed it must for
programs to work correctly when using these devices nonabstractly.
Disk files, however, are completely unstructured, unless you
count the st_blksize field in the stat structure in 4.2bsd and
Ultrix. (If Berkeley hates VMS as much as they would have us
believe, why do they keep putting so many VMSisms in 4bsd?)
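
For reference, the st_blksize field can be inspected with an ordinary stat()
call. A minimal sketch, assuming a system whose struct stat provides the
field (4.2bsd and its descendants do); the helper name is invented.

```c
#include <sys/types.h>
#include <sys/stat.h>

/* Return the preferred I/O block size the system reports for path,
 * or -1 if stat() fails. */
long preferred_blksize(const char *path)
{
    struct stat st;
    if (stat(path, &st) != 0)
        return -1;
    return (long)st.st_blksize;
}
```

The value is only a hint about efficient transfer sizes; nothing forces reads
or writes to honor it.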

Oh, and I hope this doesn't trigger another "Unix vs. VMS"
debate. I use 'em both; I'm not even gonna mention which one I
prefer :-).