[comp.os.vms] Records; VMS vs. UNIX file system

chris@mimsy.UUCP (Chris Torek) (10/01/88)

>>In article <1127@fredonia.UUCP> mazumdar@fredonia.UUCP (Jin Mazumdar) writes:
>>>Although UNIX does not have fixed length records...

>In article <4136@bsu-cs.UUCP> dhesi@bsu-cs.UUCP (Rahul Dhesi) answers:
>>It certainly does.  Look at the structure of /etc/utmp and /usr/adm/wtmp
>>or equivalent files on your system.

In article <7296@rpp386.Dallas.TX.US> jfh@rpp386.Dallas.TX.US
(The Beach Bum) replies:
>not in the typical sense.  there is no file-system level support for
>fixed length records.

As others have pointed out, when you get right down to the `file system',
the same is true of VMS.  The odious (oops :-) ), er, ODS II file system
is actually a series of (512 byte?) blocks.  In a sense, the records are
`all in your head'.

The true underlying difference in the *file* *system* is this:  ODS II
files are a file header plus a lump of blocks, while Unix files are a
file header (`inode') plus a lump of bytes.  This distinction is
surprisingly important:  By making the interesting part of a file `a
lump of bytes', Unix can represent as files odd monsters like
terminals, pipes, and inter-machine IPC streams.  ODS II cannot,
because terminals, pipes, and IPC streams are not lumps of blocks.  In
particular, input from, or output to, these monsters comes in varying
and often unpredictable sizes.

Of course, these funny file monsters are not *exactly* like files,
and even on Unix, programs can tell them apart.  (In particular, the
inode has a type field, and there are special-purpose operations,
such as setting terminal characteristics, that have no meaning
when applied to ordinary files.)  The key is that applications
are not *required* to tell them apart, and in general, well-written
applications make no attempt to do so, and thus work just as well
with these odd `files' as with real files.
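
A minimal sketch of that point, assuming a POSIX system.  The names
fd_kind and copy_bytes are hypothetical, made up for illustration.
The first asks fstat for the type recorded in the inode; the second
copies a lump of bytes and never asks what is behind either descriptor.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: name the kind of object behind fd, using the
 * type bits that fstat() reports in st_mode. */
const char *fd_kind(int fd)
{
    struct stat st;

    if (fstat(fd, &st) == -1)
        return "unknown";
    if (S_ISREG(st.st_mode))  return "regular file";
    if (S_ISCHR(st.st_mode))  return "character device";
    if (S_ISFIFO(st.st_mode)) return "pipe";
    if (S_ISSOCK(st.st_mode)) return "socket";
    if (S_ISDIR(st.st_mode))  return "directory";
    return "other";
}

/* Hypothetical helper: copy bytes from in to out until end of file.
 * It never checks what kind of object either descriptor refers to.
 * Returns the total number of bytes copied, or -1 on error. */
long copy_bytes(int in, int out)
{
    char buf[4096];
    long total = 0;
    ssize_t n;

    assert(in >= 0 && out >= 0);
    while ((n = read(in, buf, sizeof buf)) > 0) {
        ssize_t off = 0;

        while (off < n) {               /* write() may be partial */
            ssize_t w = write(out, buf + off, (size_t)(n - off));

            if (w == -1)
                return -1;
            off += w;
        }
        total += n;
    }
    return n == -1 ? -1 : total;
}
```

Calling copy_bytes(0, 1) gives a bare-bones cat that should behave the
same whether standard input is a disk file, a pipe, or a terminal;
only a program that cares needs to call something like fd_kind.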

Moving up a level (to RMS on VMS and to stdio and other libraries on
Unix), the systems diverge further.  RMS manages record information in
the `lump of blocks' files; if you use RMS routines, they will deal
with the conversion from one kind of record to another as appropriate
(apparently, usually by outlawing it entirely, an approach I find
rather dubious).  The records are `in RMS's head', so to speak.  Unix
maintains much of its distance: stdio provides no automatic
conversions, makes nothing illegal, and allows programs to have records
in *their* heads (via, e.g., fread and fwrite, or newline as a
delimiter).  It does also provide what might be called `variable length
newline delimited records': lines of text, collected by gets and fgets
and printed by puts and fputs, and passed around in C's `string' style
(which further disallows ASCII NUL).  Some application programs believe
in the `lines of text' model, and misbehave when fed arbitrary binary
files.  Other applications have no record models and work equally well
on any kind of file.  Unix libraries other than stdio are far less
universal, so there is not much to be said about them other than that
they exist, and they may or may not provide record models, keyed
access, and so forth.
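
To make the two stdio styles concrete, here is a rough sketch in C.
The struct rec layout and the names put_rec, get_rec, and count_lines
are assumptions invented for this example; only fread, fwrite, fgets,
and strchr are real library calls.  The fixed-length records exist
only in the program's head: the file holds raw bytes and nothing more.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical fixed-length record.  The layout lives only in this
 * program; nothing in the file says where one record ends. */
struct rec {
    int  id;
    char name[12];
};

/* Append one fixed-length record; returns 1 on success, 0 on failure.
 * Writing raw struct images is not portable between machines. */
int put_rec(FILE *fp, const struct rec *r)
{
    return fwrite(r, sizeof *r, 1, fp) == 1;
}

/* Read the next fixed-length record; returns 1 on success, 0 at end
 * of file or on error. */
int get_rec(FILE *fp, struct rec *r)
{
    return fread(r, sizeof *r, 1, fp) == 1;
}

/* Count newline-terminated lines, the `variable length newline
 * delimited records' model.  A final line with no newline is not
 * counted, and an embedded NUL byte would hide the rest of a line
 * from strchr, which is why this model misbehaves on binary files. */
long count_lines(FILE *fp)
{
    char line[256];
    long n = 0;

    assert(fp != NULL);
    while (fgets(line, sizeof line, fp) != NULL)
        if (strchr(line, '\n') != NULL)
            n++;
    return n;
}
```

Note that nothing stops a program from running count_lines over a file
written with put_rec; stdio makes nothing illegal, it simply returns
whatever bytes are there.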

Now, records are not inherently `wrong' or `evil', and there are
applications in which specific formats make sense.  The Unix file
system does not prevent writing such applications, but neither does it
provide much assistance.  The essentials are there: random access via
lseek, and reading and writing via read and write.  It is up to the
application (or a vendor's library) to use these in a clean and
efficient manner.  Stdio's fread and fwrite handle fixed-length records
in a relatively clean and efficient manner, but stdio does not provide
counted records or delimited records.  VMS's RMS *does* provide the
assistance, although your model must fit one of its models.  If your
model differs sufficiently (if your `records in your head' are not
like any of RMS's `records in its head'), you have to `tinker with RMS's
head', or else bypass RMS entirely.  Of course, RMS provides all the
conventional formats; it only gets in the way if you intend to be
nonconventional.  Unix never gets in the way, but rarely makes it easy.
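
As an illustration of `the essentials are there', the following is one
possible sketch of what an application has to write for itself on a
POSIX system.  All four function names and the two-byte big-endian
length prefix are assumptions chosen for this example, not any
standard format.  The code assumes a regular disk file, where a short
read means end of file.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Open (creating and truncating) a record file for reading and writing. */
int open_records(const char *path)
{
    return open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
}

/* Random access: fetch fixed-length record number idx (0-based).
 * Returns 0 on success, -1 on error or if the record is not there. */
int read_fixed(int fd, long idx, void *buf, size_t reclen)
{
    assert(idx >= 0 && reclen > 0);
    if (lseek(fd, (off_t)idx * (off_t)reclen, SEEK_SET) == (off_t)-1)
        return -1;
    return read(fd, buf, reclen) == (ssize_t)reclen ? 0 : -1;
}

/* Counted record: a 2-byte big-endian length, then that many bytes.
 * Returns 0 on success, -1 on failure. */
int write_counted(int fd, const void *data, uint16_t len)
{
    unsigned char hdr[2];

    hdr[0] = (unsigned char)(len >> 8);
    hdr[1] = (unsigned char)(len & 0xff);
    if (write(fd, hdr, 2) != 2)
        return -1;
    return write(fd, data, len) == (ssize_t)len ? 0 : -1;
}

/* Read the next counted record into buf (capacity cap).  Returns the
 * record length, or -1 at end of file, on error, or if it will not fit. */
long read_counted(int fd, void *buf, size_t cap)
{
    unsigned char hdr[2];
    size_t len;

    if (read(fd, hdr, 2) != 2)
        return -1;
    len = ((size_t)hdr[0] << 8) | hdr[1];
    if (len > cap)
        return -1;
    if (len > 0 && read(fd, buf, len) != (ssize_t)len)
        return -1;
    return (long)len;
}
```

The kernel knows nothing about either format.  Reading record 2 of a
file of 4-byte records is just read_fixed(fd, 2, buf, 4), and the same
descriptor could be reread as counted records if the program chose to
believe in them instead.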

There is a `flip side' to this, though.  Since Unix provides few record
models (the only one commonly available is the newline-delimited text
line), programmers are not tempted to invent new formats that other
applications cannot deal with.  The classic example is RMS `print file
format'.  It sounds reasonable enough:  A print file is intended to be
printed, and the program to print files can make sure that the file is
a print file.  Alas, when it comes time to make a quick tweak, one
discovers that the editor (EDT) cannot edit print files.  The report
must be out in ten minutes, but you will have to go back and change the
original and re-run it, and of course that takes 30 minutes....
(Certainly the above happens rarely; if the programmers using the
system maintain some self-discipline, they will not go off and invent
new file formats when an existing one does the job.  Someone should
have told that to the author of RUNOFF:  Its output should be a *text*
file.)

In summary: the major difference is that Unix `records' are in the eye
of the beholder, and not (as in VMS) supplied as any part of the
system.  They are there when you truly need them; they are not there
when you do not want them.  In VMS you must bypass or fool RMS if you
do not want them.  (Apparently fooling RMS is easy, and might better
be called `asking RMS nicely'.)
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

sommar@enea.se (Erland Sommarskog) (10/02/88)

Chris Torek (chris@mimsy.UUCP) writes:
>The classic example is RMS `print file
>format'.  It sounds reasonable enough:  A print file is intended to be
>printed, and the program to print files can make sure that the file is
>a print file.  Alas, when it comes time to make a quick tweak, one
>discovers that the editor (EDT) cannot edit print files.  
>...
>Someone should
>have told that to the author of RUNOFF:  Its output should be a *text*
>file.)

I just tried RUNOFF.  Apart from the CR-LF at the end of each line,
it gave a perfectly normal text file.  I guess RUNOFF has been modified
since Chris played with VMS.
  But there are other facilities that use weird formats.  The report
generator in VAX-Cobol produces VFC files (I think).  The editor
(TPU these days, *not* EDT) doesn't mind them, but I haven't tried to
write the file back to disk, which I suspect would result in a "file is
converted to a supported format" message.
-- 
Erland Sommarskog            
ENEA Data, Stockholm         
sommar@enea.UUCP

gwyn@smoke.ARPA (Doug Gwyn) (10/03/88)

In article <13800@mimsy.UUCP> chris@mimsy.UUCP (Chris Torek) writes:
>In summary: the major difference is that Unix `records' are in the eye
>of the beholder, and not (as in VMS) supplied as any part of the
>system.

Chris's summary was pretty good, but in case anyone wasn't aware, he
was describing *disk files*.  Some of the more general notions of
"file" in UNIX really do have records, magtape and terminal input
being the obvious ones.  Of course the record structure is forced by
the nature of these devices, so it isn't a design botch, but it is
something one should be aware of.  I recall not long ago finding out
much to my annoyance that at least one version of "cat" was losing
record size information due to having been converted to stdio instead
of read/write.  And yes, it DID matter.