[comp.lang.c] VMS Specific question about binary reads using fgetc

bagpiper@mcosm.uucp (06/02/90)

I've been trying to use GNU C v1.36 on VMS (with the VMS C RTL libraries, so
this probably isn't GNU-specific).  I had a piece of code under MS-DOS
which reads a file in binary mode (fopen("filename","rb")).  This piece of code
does not work under VMS because all of the CRs get cooked out of the file.
The file's record attribute is "Carriage Return Carriage Control".  I am using
fgetc to read data out of the file.  Any hints, clues, etc.?  I am fairly
new to VMS, so I don't know all the insides of RMS.  Is there any way to do
this using portable C?  How about some RMS call?  (Oh, I am using VMS 5.1-1...
that shouldn't make much of a difference, should it???)

				Thankx for any help,
					Michael
 
-------------------------------------------------------------------------------
+           Michael Hunter  {backbone}!hacgate!trwind!mcosm!bagpiper          +
+                                 BIX:bagpiper                                +
+               NOTHING like a spacecraft with a bad attitude!!!              +
-------------------------------------------------------------------------------

a204@mindlink.UUCP (Alexander Stockdale) (06/11/90)

Regarding reads from RMS files, I've played with this at various times.
It always works better if the file characteristic is Stream-LF (the
UNIX format).  There are utilities which will convert files to this format for
you (check some of the DECUS tapes).  If you don't want to convert, you'll
probably have to modify the code so it knows about the implicit carriage return
information in the file.  How much of a hassle that is depends on what the code
is supposed to do.  For example, if you're trying to strip CR/LF, no problem.
If, on the other hand, you're trying to convert them to something else, this
can be a real hassle.
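A minimal portable sketch of the conversion route, assuming the C RTL's text mode turns each record of a carriage-control file into one newline-terminated line: read the source with fopen(..., "r") and write the bytes unchanged to a new file opened with "wb". The name copy_as_stream is hypothetical. Whether the new file really ends up with Stream-LF attributes on VMS depends on the RTL's defaults for newly created files (Paul reports below that it does); DIRECTORY/FULL will show what you actually got. On Unix the function is just a byte copy.

```c
#include <stdio.h>

/* Hypothetical helper: copy in_path to out_path, reading in text mode
 * (so the RTL supplies one '\n' per line/record) and writing in binary
 * mode (so those bytes are stored unchanged).  Assumes the output file
 * does not already exist with record attributes of its own.
 * Returns the number of newlines written, or -1 on any error. */
long copy_as_stream(const char *in_path, const char *out_path)
{
    FILE *in, *out;
    long newlines = 0;
    int c;

    in = fopen(in_path, "r");
    if (in == NULL)
        return -1;
    out = fopen(out_path, "wb");
    if (out == NULL) {
        fclose(in);
        return -1;
    }

    while ((c = fgetc(in)) != EOF) {
        if (fputc(c, out) == EOF) {
            newlines = -1;
            break;
        }
        if (c == '\n')
            newlines++;
    }
    if (ferror(in))
        newlines = -1;
    fclose(in);
    if (fclose(out) == EOF)
        newlines = -1;
    return newlines;
}
```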
--
------------------------------------------------------------------------
Alexander Stockdale  |   I'm not getting older -- I'm getting bitter.
Vancouver, BC, Canada|                      - me (as far as I know)

pauls@lion.inmos.co.uk (Paul Sidnell) (06/13/90)

>which reads a file in binary mode (fopen("filename","rb")).  This piece of code
>does not work under VMS because all of the cr's get cooked out of the file.
>The file's record attribute is "Carriage Return Carriage Control".  I am using
>fgetc to read data out of the file.  Any hints, clues, etc.  I am fairly

So you love the VAX too :-)

If your program had survived a little longer, its dying comment might have been
   "I CAN'T FIND EOF EITHER"!

My understanding (arrived at with much pain and frustration) is that if a file
already exists, the mode that you open the file in is ignored completely.  If
you delete the file and THEN fopen("filename","rb"), then a 'STREAM-LF' file
will be created and everything will be happy again.

Similarly you will find many 'departures' from ANSI C using ftell and fseek on
non-'STREAM-LF' files.

Generally, the I/O is at its sanest ONLY on these types of files.

Please excuse any froth around my mouth while I discuss this subject.

-------------------------------------------------------------------------------
|                     Disclaimer:      IT'S ALL MY FAULT                      |
-------------------------------------------------------------------------------

daniels@hanoi.enet.dec.com (Bradford R. Daniels) (06/15/90)

The VAX C RTL tries to "do the right thing" with record files.
If the file has a carriage control attribute (e.g. carriage
return carriage control in this case), it will (by default)
append a newline to each record as it is read in.  Your fopen()
statement, however, specifies "rb" as the file open mode, or
"read only in binary mode".  This specification that the file
is binary overrides C's default carriage control interpretation,
and you get the data unadorned (as it were) by newlines.  This
is fine if you're truly reading a binary file which just happens
to be in variable record format, but in the case of the file in
question, the record format is significant, and the file is not,
in fact, simply binary data.
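To see this on a given system, one can count the newlines fgetc() delivers from the same file in text mode and in binary mode. This is a sketch; count_newlines and modes_disagree are hypothetical helper names. On Unix, or on a Stream-LF file, the two counts should agree. On a variable-length file with carriage-return carriage control, the description above suggests the "r" count will include one newline per record while the "rb" count will not; treat that as an expectation, not a measured result.

```c
#include <stdio.h>

/* Count how many '\n' characters fgetc() delivers when path is opened
 * with the given mode ("r" or "rb").  Returns -1 if the open fails. */
long count_newlines(const char *path, const char *mode)
{
    FILE *fp;
    long n = 0;
    int c;

    fp = fopen(path, mode);
    if (fp == NULL)
        return -1;
    while ((c = fgetc(fp)) != EOF)
        if (c == '\n')
            n++;
    fclose(fp);
    return n;
}

/* Nonzero if text mode and binary mode disagree about the number of
 * line breaks, i.e. the RTL is adding (or hiding) record newlines. */
int modes_disagree(const char *path)
{
    return count_newlines(path, "r") != count_newlines(path, "rb");
}
```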

Hope this helps.

- Brad

-----------------------------------------------------------------
Brad Daniels			|  Digital Equipment Corp. almost
DEC Software Devo		|  definitely wouldn't approve of
"VAX C RTL Whipping Boy"	|  anything I say here...

daniels@hanoi.enet.dec.com (Bradford R. Daniels) (06/15/90)

In article <2081@mindlink.UUCP>, a204@mindlink.UUCP (Alexander
Stockdale) writes:
> Regarding reads from RMS files, I've played with this at various times.
> It always works better if the file characteristic is Stream-LF (the
> UNIX format).  There are utilities which will convert files to this format for
> you (check some of the DECUS tapes).  If you don't want to convert, you'll
> probably have to modify the code so it knows about the implicit carriage return
> information in the file.  This can be a real hassle, depending on what the code
> is supposed to do.  For example, if you're trying to strip CR/LF, no problem.
> If, on the other hand, you're trying to convert them to something else, this
> can be a real hassle.

The reason Stream-LF works so well is that we access Stream-LF files
in block mode by default (i.e., we have RMS do direct QIOs to the
file).  Since we pay no attention to the record format of the file
in block mode, we can provide a much higher degree of Unix emulation,
particularly as regards file positioning.  Unfortunately, though,
block mode I/O is much slower than record mode I/O because it cannot
use such beneficial features as multibuffering, read-ahead, and
write-behind.

The important thing to remember about the VAX C/VMS RTL is that it
always tries to make the data it reads in look like a stream, even
if there is record structuring information in the file.  That means 
that things like record boundaries and carriage control have an effect
on the data you receive.  If you accept the default behavior, you will
usually get the right behavior.  If you randomly throw RMS options at
the open statement, specify binary data when you don't mean it, or
give a file carriage control attributes it shouldn't have, however,
you won't get the right behavior.

I'm not saying the carriage control attributes on a file are always
correct, but in general, these are good rules of thumb:

	1.  If the file contains text, it should either be Stream-LF
	    or some form of variable record length file with an
	    appropriate carriage control attribute.

	2.  If the file contains binary data, it should either be
	    Stream-LF format (possibly with carriage return carriage
	    control) or some other format with no carriage control.
	    The latter is preferable, since a legal Stream-LF file
	    is supposed to end with a newline, and real binary files
	    rarely do.  Of course, if you never access the file in
	    record mode, it doesn't really matter.

	3.  Don't use "b" mode in fopen unless the file really is
	    just binary data.  You will usually lose any line
	    formatting information there may be in the file (e.g.
	    newlines if the file uses carriage return carriage
	    control).

I had some others in mind when I started, but I can't remember them
right now...  I'm sure a future question will bring them to mind,
though...

- Brad

daniels@hanoi.enet.dec.com (Bradford R. Daniels) (06/15/90)

In article <7459@ganymede.inmos.co.uk>, pauls@lion.inmos.co.uk (Paul
Sidnell) writes:
> My understanding (arrived at with much pain and frustration) is that if a file
> already exists, the mode that you open the file in is ignored completely.  If
> you delete the file and THEN fopen("filename","rb"); then a 'STREAM-LF' file
> will be created and everything will be happy again.

Huh? "rb" opens the file read only.  Of course the file will still have
whatever attributes it had before it was opened.

VAX C tries to balance Unix compatibility and VMS integration wherever
possible.  On Unix, opening an existing file whether for input or output
changes nothing about the file except the date and possibly its contents
(if you're truncating the file).  Other attributes (ownership, permissions,
etc.) remain unchanged.  The first implementors of the VAX C RTL decided
(quite reasonably, I feel) to extend that concept to the much larger set
of attributes files may have under VMS.  Thus, if you supersede an existing
file, it should (were it not for some bugs in certain code paths in the
RTL) have the same protection, format, carriage control, and whatever else
as the existing version of the file.

If you did not explicitly specify any file format or carriage control
options, the C RTL assumes you don't want to change anything.

> Similarly you will find many 'departures' from ANSI C using ftell and fseek on
> non-'STREAM-LF' files.

Yeah, it kinda bugs me, too...  It actually would have been possible to get
better emulation for files smaller than 4MB if a different encoding had been
chosen from day 1, but the algorithm would have been completely broken for
larger files, and the values returned by ftell() would not have been simple
integer offsets from the beginning of the file.  As it is, the current
algorithm is the best you can do given 32-bit integers and RMS' requirements.

> Generally, the I/O is at its sanest ONLY on these types of files.

Actually, it's pretty sane on most file types if you know what it's supposed to
do.  The problem is a lack of documentation as to what it's supposed to do...

> Please excuse any froth around my mouth while I discuss this subject.

We've been looking over ways to improve the VAX C I/O system in a major way,
perhaps even rewriting the whole thing.  Constructive frothing is always
appreciated.

- Brad

martin@mwtech.UUCP (Martin Weitzel) (06/26/90)

In article <1951@cirrusl.UUCP> dhesi%cirrusl@oliveb.ATC.olivetti.com (Rahul Dhesi) writes:
[about 32-bit values being too small to represent file position offsets]
>A strategy that DEC could have used, but did not, would have been to
>make ftell() return a magic cookie.  The magic cookie would exactly
>encode record number and offset for small files.  For larger files, the
>magic cookie would be the index of a seek value in an internal table.
>This table would grow as needed.

The "magic cookie" approach is exactly what ANSI C supports through
the (new) "fgetpos/fsetpos" functions. IMHO it could not be generally
applied to "ftell/fseek", because it is existing practice to do some
arithmetic with the return value (e.g., to advance some bytes in either
direction).
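Rahul's scheme can be sketched as follows. Everything here (the recpos struct, the 16-bit offset field, the table growth policy, make_cookie and read_cookie) is a hypothetical illustration, not how VAX C or any other RTL encodes positions: small positions are packed directly into a 32-bit value, and anything larger is stored in a growing table, with the cookie holding a flagged index into it.

```c
#include <stdlib.h>

/* Hypothetical record-oriented position: which record, and which byte
 * inside it.  The struct and the bit split are assumptions made only
 * to illustrate the "magic cookie" idea. */
struct recpos {
    unsigned long record;   /* record number */
    unsigned long offset;   /* byte offset within the record */
};

#define OFFSET_BITS 16UL          /* assumed: records shorter than 64K bytes */
#define SMALL_LIMIT 0x7FFFUL      /* assumed: record numbers up to 32K encode directly */
#define TABLE_FLAG  0x80000000UL  /* high bit set: cookie is a table index */

static struct recpos *table;      /* positions too large to encode directly */
static unsigned long table_len, table_cap;

/* Turn a position into a 32-bit cookie; returns 0xFFFFFFFF if the
 * table cannot grow. */
unsigned long make_cookie(struct recpos p)
{
    if (p.record <= SMALL_LIMIT && p.offset < (1UL << OFFSET_BITS))
        return (p.record << OFFSET_BITS) | p.offset;   /* exact encoding */

    if (table_len == table_cap) {                       /* grow as needed */
        unsigned long cap = table_cap ? table_cap * 2 : 16;
        struct recpos *t = realloc(table, cap * sizeof *t);
        if (t == NULL)
            return 0xFFFFFFFFUL;
        table = t;
        table_cap = cap;
    }
    table[table_len] = p;
    return TABLE_FLAG | table_len++;                    /* flagged index */
}

/* Recover the position from a cookie produced by make_cookie(). */
struct recpos read_cookie(unsigned long cookie)
{
    struct recpos p;

    if (cookie & TABLE_FLAG) {
        p = table[cookie & ~TABLE_FLAG];
    } else {
        p.record = cookie >> OFFSET_BITS;
        p.offset = cookie & ((1UL << OFFSET_BITS) - 1);
    }
    return p;
}
```

The sketch also shows Martin's objection: adding 1 to a directly encoded cookie only moves within the current record (and past the record's end it yields a meaningless offset), while adding 1 to a table cookie names a different table slot entirely. Code that does arithmetic on ftell() values would break, which is why an opaque fpos_t is the natural home for such a cookie.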

This also suggests a guideline for *when* to use *which* of the
two similar pairs of functions for positioning in files.

1) Use "fgetpos/fsetpos" regardless of the file type (binary or
   text), if you only want to "mark" some places for a later "restore".

2) Use "ftell/fseek" only if you are sure that the following
   restrictions will be met:
   a) For a "text" file, you only fseek to the start (offset 0
      with SEEK_SET), the current position (offset 0 with SEEK_CUR,
      somewhat pointless but allowed) or to the end (offset 0
      with SEEK_END).
      Otherwise you use *exactly* the value that is returned from
      "ftell" (no arithmetic!) as offset with SEEK_SET.
   b) For a "binary"-file you may "fseek" to any position, even
      one you calculate by doing some arithmetic with the return
      value of "ftell", but be sure that the maximum file size
      will fit into a long on the target hardware.
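The three rules above can be sketched with standard C positioning calls only. The function names (reread_with_fpos, reread_text_with_ftell, peek_binary_ahead) are hypothetical; each one follows exactly one of rules 1, 2a and 2b.

```c
#include <stdio.h>

/* Rule 1: fgetpos/fsetpos to mark a spot and come back, any file type.
 * Reads one character, returns to the mark, and reads it again.
 * Returns 1 if both reads agree, 0 if not, -1 on error. */
int reread_with_fpos(FILE *fp)
{
    fpos_t mark;
    int first, second;

    if (fgetpos(fp, &mark) != 0)
        return -1;
    first = fgetc(fp);
    if (fsetpos(fp, &mark) != 0)
        return -1;
    second = fgetc(fp);
    return first == second;
}

/* Rule 2a: the only non-trivial ftell/fseek use allowed on a text
 * stream is passing back exactly the value ftell() returned. */
int reread_text_with_ftell(FILE *fp)
{
    long mark = ftell(fp);
    int first, second;

    if (mark == -1L)
        return -1;
    first = fgetc(fp);
    if (fseek(fp, mark, SEEK_SET) != 0)   /* no arithmetic on mark */
        return -1;
    second = fgetc(fp);
    return first == second;
}

/* Rule 2b: on a binary stream, computed offsets are allowed.
 * Returns the byte delta bytes past the current position (and then
 * restores the position), or EOF on error. */
int peek_binary_ahead(FILE *fp, long delta)
{
    long here = ftell(fp);
    int c;

    if (here == -1L || fseek(fp, here + delta, SEEK_SET) != 0)
        return EOF;
    c = fgetc(fp);
    fseek(fp, here, SEEK_SET);            /* back to where we were */
    return c;
}
```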

Note that 2a) opens the door for the "magic cookie" approach,
but it can only be applied to "text" files. Obviously, ANSI C
fails to specify a *portable* method to seek around within
binary files with sizes that exceed the range of values for
a long. IMHO, this is not a serious drawback (though I'm sure
that there are some readers on the net who have exactly
this requirement :-)). But consider that many operating systems
currently do not support files of this size, that even if they
do, files of this size are not very frequently found, and that
nobody forbids a C implementation to support additional library
functions that address the problem.
-- 
Martin Weitzel, email: martin@mwtech.UUCP, voice: 49-(0)6151-6 56 83

peter@ficc.ferranti.com (Peter da Silva) (06/27/90)

In article <12865@shlump.nac.dec.com> daniels@hanoi.enet.dec.com (Bradford R. Daniels) writes:
> It's provided as an alternative to the normal
> mechanism, has slightly different semantics, and is a bit more
> efficient than the alternative.

The existing syntax, "#include <stdio.h>" was already perfectly well
suited for those semantics.
-- 
Peter da Silva.   `-_-'
+1 713 274 5180.
<peter@ficc.ferranti.com>

djh@osc.edu (David Heisterberg) (06/28/90)

In article <.X94JD6@xds13.ferranti.com>, peter@ficc.ferranti.com (Peter da Silva) writes:
> The existing syntax, "#include <stdio.h>" was already perfectly well
> suited for those semantics.

Then use it.  It works just fine.  I don't understand what you're complaining
about.  VMS often has include-like files in text libraries, and the VAX C
extension to #include takes advantage of that.  If you don't want to use it,
then don't.  Who's putting a gun to your head?
-- 
David J. Heisterberg		djh@osc.edu		And you all know
The Ohio Supercomputer Center	djh@ohstpy.bitnet	security Is mortals'
Columbus, Ohio			ohstpy::djh		chiefest enemy.

peter@ficc.ferranti.com (Peter da Silva) (06/28/90)

The subject is: "#include stdio" in VAX C. I claim:
> The existing syntax, "#include <stdio.h>" was already perfectly well
> suited for those semantics.

In article <686@illini.osc.edu> djh@osc.edu (David Heisterberg) writes:
> VMS often has include-like files in text libraries, and the VAX C
> extension to #include takes advantage of that.

They didn't need to create a new syntax for it. What I'm saying is that they
should have made "#include <stdio.h>" extract those files from those text
libraries, thus making the new syntax redundant.

> If you don't want to use it,
> then don't.  Who's putting a gun to your head?

The people writing the code that I had to support.

I don't know about you, but I don't have the advantage of programming in a
vacuum.

henry@zoo.toronto.edu (Henry Spencer) (06/29/90)

In article <686@illini.osc.edu> djh@osc.edu (David Heisterberg) writes:
>> The existing syntax, "#include <stdio.h>" was already perfectly well
>> suited for those semantics.
>
>Then use it.  It works just fine.  I don't understand what you're complaining
>about.  VMS often has include-like files in text libraries, and the VAX C
>extension to #include takes advantage of that.  If you don't want to use it,
>then don't.  Who's putting a gun to your head?

Does the word "portability" mean anything to you?  How about "many wasted
man-months trying to port software written by clever people who think it's
cute to use VMS-specific language features"?  We don't want *you* to use
it either, because someday we may have to port or maintain your code.
-- 
"Either NFS must be scrapped or NFS    | Henry Spencer at U of Toronto Zoology
must be changed."      -John Osterhout |  henry@zoo.toronto.edu   utzoo!henry

karl@haddock.ima.isc.com (Karl Heuer) (06/29/90)

In article <686@illini.osc.edu> djh@osc.edu (David Heisterberg) writes:
>In article <.X94JD6@xds13.ferranti.com>, peter@ficc.ferranti.com (Peter da Silva) writes:
>> The existing syntax, "#include <stdio.h>" was already perfectly well
>> suited for those semantics.
>
>Then use it.  It works just fine.  I don't understand what you're complaining
>about.  VMS often has include-like files in text libraries, and the VAX C
>extension to #include takes advantage of that.

The point is, there was no need to invent a syntax extension.  DEC should have
simply asserted that `#include <stdio.h>' searches a text library in addition
to a directory.  That way, we all get the benefit of the speed advantage, and
DEC wouldn't now be in the embarrassing position of having a feature that is
in direct conflict with the ANSI C `#include MACRONAME' feature.

(It was a botch to use angle brackets as well as quotes in the first place,
but it's way too late to correct that.)
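The conflict Karl mentions, concretely: ANSI C lets the operand of #include be a macro that expands to a header name, so the sketch below pulls in <stdio.h> through a macro. STDIO_HEADER and header_came_through are hypothetical names. Under the VAX C extension, a bare name after #include is taken as a text library module instead; how VAX C resolves a name that is also a defined macro is not covered in this thread.

```c
/* ANSI C: the tokens after #include are macro-expanded when they are
 * neither <...> nor "...", so this includes <stdio.h>. */
#define STDIO_HEADER <stdio.h>
#include STDIO_HEADER

/* EOF and BUFSIZ are defined by <stdio.h>; this function compiles only
 * if the macro-named include actually pulled the header in. */
int header_came_through(void)
{
    return EOF < 0 && BUFSIZ > 0;
}
```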

Karl W. Z. Heuer (karl@kelp.ima.isc.com or ima!kelp!karl), The Walking Lint