scs@adam.pika.mit.edu (Steve Summit) (05/04/89)
In article <14301@bfmny0.UUCP> tneff@bfmny0.UUCP (Tom Neff) writes:
>MS-DOS has exact filesizes in bytes, and a standard OS call to retrieve
>a file's size in bytes.

So sorry; I was imprecise. Both VMS and MS-DOS let you find out how many bytes the operating system thinks the file contains. The trouble is that this number is not equal to the number of characters that you will read from the file if you do the usual C text read with single \n's as line terminators.

VMS text files (well, the VMS file format normally used for text files; VMS has many file formats) have no explicit line termination (neither CR nor LF); however, attached to each line is a 16-bit record length, stored in the file and counted against the total file size. MS-DOS uses the two-character sequence CR-LF as a line terminator; any reasonable C run-time library translates each CRLF to a single \n when "text mode" reads are performed.

In either case*, the relation

	size = chars + lines

holds, where size is the OS-reported size in bytes, chars is the number of characters a text-mode C program would read, and lines is the number of lines (the number of \n's read by a C program)**.

This discrepancy has implications for the tar file example I mentioned, since tar format uses (and the file size in the tar header must therefore reflect) single \n's as line terminators.

					Steve Summit
					scs@adam.pika.mit.edu

* Actually, on VMS, the OS-reported size for variable-length, carriage control (i.e. standard text) files might be rounded up to the next multiple of the block size. It's been a while since I used VMS.

** Modulo files without final newlines, which are rare on MS-DOS and impossible on VMS.
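For the MS-DOS case, the relation size = chars + lines can be checked with a small sketch along the following lines. The function names text_mode_chars and crlf_lines are made up for illustration, and the code works on a buffer already read in binary mode rather than on a file. It assumes plain CR-LF line terminators and ignores anything else a particular run-time library might do in text mode (Ctrl-Z handling, for instance), so treat it as an approximation of what a text-mode read would deliver, not as a description of any particular library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: number of characters a text-mode read would
 * deliver for an MS-DOS style buffer.  Each CR-LF pair counts as a
 * single '\n'; every other byte (including a lone CR) counts once. */
size_t text_mode_chars(const char *buf, size_t size)
{
    size_t i, chars = 0;

    assert(buf != NULL || size == 0);
    for (i = 0; i < size; i++) {
        if (buf[i] == '\r' && i + 1 < size && buf[i + 1] == '\n')
            continue;   /* drop the CR; the LF is counted on the next pass */
        chars++;
    }
    return chars;
}

/* Hypothetical helper: number of CR-LF terminated lines in the buffer,
 * i.e. the number of '\n' characters a text-mode read would return. */
size_t crlf_lines(const char *buf, size_t size)
{
    size_t i, lines = 0;

    assert(buf != NULL || size == 0);
    for (i = 0; i + 1 < size; i++)
        if (buf[i] == '\r' && buf[i + 1] == '\n')
            lines++;
    return lines;
}
```

Given the 8-byte buffer "ab\r\ncd\r\n", text_mode_chars returns 6 and crlf_lines returns 2, so 8 = 6 + 2. Under these assumptions, 6 is the size that would belong in the tar header for that file, not the 8 the operating system reports.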