[comp.lang.c] File size in bytes

scs@adam.pika.mit.edu (Steve Summit) (05/04/89)

In article <14301@bfmny0.UUCP> tneff@bfmny0.UUCP (Tom Neff) writes:
>MS-DOS has exact filesizes in bytes, and a standard OS call to retrieve
>a file's size in bytes.

So sorry; I was imprecise.  Both VMS and MS-DOS let you find out
how many bytes the operating system thinks the file contains.
The trouble is that this number is not equal to the number of
characters that you will read from the file if you do the usual C
text read with single \n's as line terminators.

VMS text files (well, the VMS file format normally used for text
files; VMS has many file formats) have no explicit line termination
(neither CR nor LF); instead, each line is preceded by a 16-bit
(two-byte) record length, which is stored in the file and counted
against the total file size.  MS-DOS uses the two-character sequence
CR-LF as a line terminator; any reasonable C run-time library
translates each CR-LF to a single \n when "text mode" reads are
performed.  Either way, every line carries two bytes of overhead on
disk where a C program sees only the single \n, so in either case*,
the relation

	size = chars + lines

holds, where size is the OS-reported size in bytes, chars is the
number of characters a text-mode C program would read, and lines
is the number of lines (the number of \n's read by a C program)**.
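To make the relation concrete for the MS-DOS case, here is a minimal
sketch (not part of any real library) that simulates a text-mode read
over an in-memory image of a file's bytes and counts what a C program
would see.  The names count_dos_text and struct text_counts are
hypothetical.  It assumes lines end in CR-LF, passes a lone CR through
unchanged, and ignores the ^Z end-of-file convention that some MS-DOS
libraries honor in text mode.

```c
#include <stddef.h>

/* Hypothetical result type: what a text-mode C read would deliver. */
struct text_counts {
    size_t chars;   /* characters delivered, each CR-LF counted as one '\n' */
    size_t lines;   /* number of '\n' characters delivered */
};

/* Simulates a text-mode read of an MS-DOS file image of `size` bytes.
   Each CR-LF pair is collapsed to a single '\n'; a lone CR is passed
   through as an ordinary character.  ^Z is not treated specially. */
struct text_counts count_dos_text(const char *buf, size_t size)
{
    struct text_counts c = {0, 0};
    size_t i;

    for (i = 0; i < size; i++) {
        if (buf[i] == '\r' && i + 1 < size && buf[i + 1] == '\n') {
            i++;            /* skip the LF; the pair becomes one '\n' */
            c.chars++;
            c.lines++;
        } else {
            c.chars++;
            if (buf[i] == '\n')
                c.lines++;  /* a bare LF is still a '\n' to the program */
        }
    }
    return c;
}
```

For the 9-byte image "ab\r\ncde\r\n" it reports 7 characters and 2
lines, and 7 + 2 = 9, as the relation predicts.  A bare LF breaks the
relation, since it costs only one byte on disk.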

This discrepancy has implications for the tar file example I
mentioned: tar format uses single \n's as line terminators, so the
file size in the tar header must be the character count a text-mode
read produces, not the OS-reported size.
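One straightforward approach for tar is simply to count what a
text-mode read delivers and record that number.  This is a sketch under
stated assumptions: text_length and tar_size_field are hypothetical
helpers, the FILE is assumed to have been opened with fopen(name, "r"),
and the size field is written in the POSIX ustar layout, a 12-byte
field holding the value as 11 octal digits plus a terminating NUL.

```c
#include <stdio.h>

/* Counts the characters a text-mode read of fp delivers, i.e. the file's
   length with single '\n' line terminators, which is what a tar header
   must record.  fp is assumed to be opened in text mode ("r").
   Returns -1 if a read error occurs. */
long text_length(FILE *fp)
{
    long n = 0;
    int ch;

    while ((ch = getc(fp)) != EOF)
        n++;
    return ferror(fp) ? -1 : n;
}

/* Writes n into a 12-byte tar size field as 11 zero-padded octal digits
   followed by a NUL.  snprintf truncates rather than overflowing if n
   needs more than 11 octal digits (n >= 8^11). */
void tar_size_field(char field[12], unsigned long n)
{
    snprintf(field, 12, "%011lo", n);
}
```

This costs an extra pass over the file, but it does not depend on
knowing how a particular OS stores lines, so it also sidesteps the VMS
rounding caveat in the first footnote.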

                                            Steve Summit
                                            scs@adam.pika.mit.edu

* Actually, on VMS, the OS-reported size for variable-length,
  carriage-control (i.e., standard text) files might be rounded up
  to the next multiple of the block size.  It's been a while
  since I used VMS.

** Modulo files without final newlines, which are rare on MS-DOS
   and impossible on VMS.