[comp.os.vms] VMS vs. UNIX file system

samperi@marob.MASA.COM (Dominick Samperi) (09/13/88)

Can people who have had experience working with both VMS files (at the
FDL level) and UNIX files (at the inode level, say) comment on the
advantages and disadvantages of the file systems used by these operating
systems? My experience is mostly with the UNIX file system, so I was a
little surprised when I discovered recently that VMS text files, object
code files, and executable files all have different record structures.
What does the added complexity of having to deal with RMS, FDL, CONVERT,
etc., buy?
-- 
Dominick Samperi, NYC
    samperi@acf8.NYU.EDU	samperi@marob.MASA.COM
    cmcl2!phri!marob        	uunet!hombre!samperi
      (^ ell)

dave@arnold.UUCP (Dave Arnold) (09/13/88)

samperi@marob.MASA.COM (Dominick Samperi) writes:
> [...] stuff deleted
> ...comment on the
> advantages and disadvantages of the file systems used by these operating
> systems? My experience is mostly with the UNIX file system, so I was a
> little surprised when I discovered recently that VMS text files, object
> code files, and executable files all have different record structures.
> What does the added complexity of having to deal with RMS, FDL, CONVERT,
> etc., buy?

The VMS file system doesn't buy you anything, unless your application
requires ISAM---However, how often do you need ISAM?

I think the VMS filesystem is overly complicated, and one of the major
downfalls of VMS (but can be tolerated).  If the original DEC designers
had it to do over again, I suspect they would have stuck with a
Stream-only based filesystem (Like UNIX), and provided ISAM libraries.
The FORTRAN record format, FIXED SIZE RECORDS, VARIABLE LENGTH,
CARRIAGE RETURN CARRIAGE CONTROL... Oh, don't forget the VFC record
format...  These are all completely archaic, and date the VMS
operating system.

I feel very strongly about this.  Anyone disagree?

VMS's strengths?

AST's, Timer queues, condition handling, exit handling, message
facility.

With regard to the above, VMS was way ahead of its time circa 1978,
and life would be difficult without these features.

Other VMS pitfalls?

The resource quota system!!!!!!!

How often have you written a program, and got the famous:

%SYSTEM-F-EXCEEDED QUOTA

message?  Isn't it fun trying to figure out which bloody quota
was exceeded?!  Stupid!
-- 
Dave Arnold
dave@arnold.UUCP	{cci632|uunet}!ccicpg!arnold!dave

bzs@encore.UUCP (Barry Shein) (09/13/88)

>What does the added complexity of having to deal with RMS, FDL, CONVERT,
>etc., buy?
>-- 
>Dominick Samperi, NYC

There are pluses and minuses in both approaches. The intention of
formalizing a bunch of file access methods is to put the code
wherever the vendor (designer) believes it will do the most good. For
example, if the system knows you have promised to access some file only
sequentially, the file can be stored in a manner optimal for that usage.
Similarly, an indexed file can have its read methods set up, perhaps
maintaining two separate caches (indices, data), for optimal access.

It also means that you go through some standard set of routines with a
standard set of assumptions. For example, I can open an ISAM file knowing
only a few things about it, without asking for details about how it's
stored. If one builds their own ISAM file inside a bag-of-bytes file, it
may not be at all obvious how to read it without access to the original
program which wrote it.

The downside is that these access methods tend to get used.

What I mean is, used unnecessarily where bag-of-bytes files would do
just fine and cause much less confusion.

For example, on an earlier release (probably 1.6) of VMS I wanted to
edit a file produced by RUNOFF (to do a few global changes so
underlining or some such would print properly on my printer.) Not as
easy as it sounded, EDT refused to load this print file for editing,
complained about an illegal file type.

One could point the finger at EDT and say it was deficient in not
handling enough file formats. I tend to think that, barring super-human
effort, the problem was inherent in the design environment: it would be
hard to properly edit every file type that was allowed (last I checked,
CONVERT still couldn't perform some reasonable-looking conversions). I
believe TECO did the job fine, but I was pretty shocked at not being able
to edit this fairly plain-looking text file.

It wasn't the *data* which was preventing loading this into EDT (as
with, say, trying to load an a.out into VI, which wouldn't work too
well either, but for a different reason). It was merely a bit
somewhere identifying this as a print file or some such nonsense, and
thus EDT kicked it out without trying. Such problems were ubiquitous
(at least it always seemed like someone was coming to me trying to
work around a similar problem where utilities wouldn't cooperate).

Under IBM systems with a similar record oriented philosophy I remember
real panic if we couldn't find the original parameters under which a
file was created. It basically couldn't be opened anymore unless you
could produce the right magic numbers it was created with (blocking
factors etc.) I'm sure some wizardly types could have solved that
directly but it sure wasn't obvious to us, other than guessing numbers
and paying real money to watch perhaps dozens of tries go down the
drain and feeling kind of foolish and seriously out of control.

The problem with the Unix "unstructured" approach is that either you
use some of the (very few) library routines (dbm is a major one, so
are the object deck readers in SYSV) or you roll your own, each
application will have its own way of storing data (compare termcap
with passwd with inittab with crontab with ...) often not terribly
well documented or efficient (agreed, often efficiency is a poor
excuse for obscurity.)

It's all a balancing act. In my ideal world there would be a variety
of standardized access methods and you would avoid using them like the
plague, especially in general system utilities, simple byte-stream
files should account for most input and output (a la Unix), but for
those occasional, carefully justified problems, access methods could
be resorted to. Also, the operating system would know as little about
them as possible (eg. opening any file as a byte-stream would do
something reasonable, *never* return an error.)

	-Barry Shein, ||Encore||

dave@arnold.UUCP (Dave Arnold) (09/14/88)

In article <3597@encore.UUCP>, bzs@encore.UUCP (Barry Shein) writes:
> 
> What I mean is, used unnecessarily where bag-of-bytes files would do
> just fine and cause much less confusion.

Exactly.

> For example, on an earlier release (probably 1.6) of VMS I wanted to
> edit a file produced by RUNOFF (to do a few global changes so
> underlining or some such would print properly on my printer.) Not as
> easy as it sounded, EDT refused to load this print file for editing,

EDT still gives a warning about files created with VAXC.  Dumb!

> The problem with the Unix "unstructured" approach is that either you
> use some of the (very few) library routines (dbm is a major one, so
> are the object deck readers in SYSV) or you roll your own, each
> application will have its own way of storing data (compare termcap
> with passwd with inittab with crontab with ...) often not terribly
> well documented or efficient (agreed, often efficiency is a poor
> excuse for obscurity.)

This is not a problem.  It's not often that your application requires
you to "Roll your own".  And you get a very simple filesystem.
When you try to design a filesystem that will attempt to please
everyone under all circumstances, you over build---A real mess.

Anyone try tuning an RMS ISAM file?  Some pretty spiffy analysis
tools :-,

> It's all a balancing act.

Tightrope.

> 	-Barry Shein, ||Encore||

I appreciate your points, Barry, but don't agree.
-- 
Dave Arnold
dave@arnold.UUCP	{cci632|uunet}!ccicpg!arnold!dave

jbw@bucsb.UUCP (Joe Wells) (09/15/88)

In article <178@arnold.UUCP> dave@arnold.UUCP (Dave Arnold) writes:
>VMS's strengths?
>AST's, Timer queues, condition handling, exit handling, message
 ^^^^^                ^^^^^^^^^^^^^^^^^^
>facility.

ASTs are for me VMS's greatest advantage.  The ability to have
multiple system calls outstanding at one time is a godsend for
realtime control systems.  Running a separate process for each
blocking task and using IPC just doesn't cut it.

Stack unwinding is really nice too.  The ability in LISP to abort
instruction sequences with "throw" and have everything clean itself up
as the stack is unwound is *very* powerful.  In addition, you can post
your own unwinding cleanup instructions.  Under VMS, you can do this
in *any* language.  The operating system and the VMS procedure
calling standard provide for generic stack unwinding.

Directory links under VMS are not necessary for a file to exist.
Under UNIX, when all the links to a file disappear, and all processes
close the file, the file is deleted.  In VMS, a file can exist without
a name.  It can be accessed by its unique file identifier.  In
addition, the problem of dangling directory links to deleted files
does not exist.  When the VMS equivalent of the UNIX inode is reused,
a counter in the index table slot is incremented.  Thus any dangling
pointers to the previous file that used the same slot won't have any
effect.
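
A minimal sketch of the reuse counter described above, assuming a tiny
in-memory index table. The names (struct slot, struct file_id,
create_file, delete_file, lookup) are hypothetical and are not the VMS
data structures or calls; the point is only that an ID carrying the
slot's sequence number goes stale once the slot is reused.

```c
#include <assert.h>
#include <stddef.h>

#define NSLOTS 4                /* hypothetical, tiny index table */

struct slot {                   /* one index-table entry (rough inode analogue) */
    unsigned short seq;         /* bumped every time the slot is reused */
    int in_use;
};

struct file_id {                /* what a directory entry or a program holds */
    unsigned short num;         /* slot number */
    unsigned short seq;         /* sequence number at creation time */
};

static struct slot table[NSLOTS];

/* Claim a free slot and return an ID carrying its new sequence number.
 * Returns an ID with seq == 0 if the table is full; valid IDs always
 * have seq >= 1 (counter wraparound is ignored in this sketch). */
struct file_id create_file(void)
{
    struct file_id id = {0, 0};
    for (unsigned short i = 0; i < NSLOTS; i++) {
        if (!table[i].in_use) {
            table[i].in_use = 1;
            table[i].seq++;
            id.num = i;
            id.seq = table[i].seq;
            break;
        }
    }
    return id;
}

/* Return the slot for a still-valid ID, or NULL for a stale one. */
struct slot *lookup(struct file_id id)
{
    assert(id.num < NSLOTS);
    struct slot *s = &table[id.num];
    return (s->in_use && s->seq == id.seq) ? s : NULL;
}

/* Free the slot, but only if the ID still refers to the current file. */
void delete_file(struct file_id id)
{
    if (lookup(id) != NULL)
        table[id.num].in_use = 0;
}
```

With this layout, an ID saved before a delete fails lookup() even after
a new file lands in the same slot, because the stored sequence number
no longer matches.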

I would be much happier with the UNIX environment if it supported
these features, but then if money grew on trees, I probably couldn't
climb them.  I'm also not trying to imply that VMS doesn't have more
than its own share of ridiculously stupid features.

>Other VMS pitfalls?
>The resource quota system!!!!!!!
>How often have you written a program, and got the famous:
>%SYSTEM-F-EXCEEDED QUOTA
>message?  Isn't it fun trying to figure out which bloody quota
>was exceeded?!  Stupid!

Good lord!  Don't remind me of this!  What a royal pain in the *ss!

>-- 
>Dave Arnold
>dave@arnold.UUCP	{cci632|uunet}!ccicpg!arnold!dave

Joe Wells
UUCP: ...!harvard!bu-cs!bucsf!jbw
INTERNET: jbw@bucsf.bu.edu

bzs@encore.UUCP (Barry Shein) (09/16/88)

Last things first...

>> It's all a balancing act.
>
>Tightrope.
>
>> 	-Barry Shein, ||Encore||
>
>I appreciate your points, Barry, but don't agree.
>-- 
>Dave Arnold

Not sure what you don't agree with, I assume it's the following:

>> The problem with the Unix "unstructured" approach is that either you
>> use some of the (very few) library routines (dbm is a major one, so
>> are the object deck readers in SYSV) or you roll your own, each
>> application will have its own way of storing data (compare termcap
>> with passwd with inittab with crontab with ...) often not terribly
>> well documented or efficient (agreed, often efficiency is a poor
>> excuse for obscurity.)
>
>This is not a problem.  It's not often that your application requires
>you to "Roll your own".  And you get a very simple filesystem.
>When you try to design a filesystem that will attempt to please
>everyone under all circumstances, you over build---A real mess.

It's a problem if you have the problem.

"It's not often" might be true in your world, I doubt you could
convince the people I know trying to store their library catalogues
(eg) that efficient keyed storage and lookup is an uncommon problem.
Or business types trying to keep payroll or customer lists etc.

I agree it's hard to design a general filing system which pleases
everyone.  I'm not sure it's a law of nature that one cannot. In fact,
Unix might be quite close, just missing some application level
standards in regards to file storage libraries (from which, perhaps,
interested people could investigate tuning the system a little, the
buffer cache probably does most of what they want anyhow.)

	-Barry Shein, ||Encore||

dave@arnold.UUCP (Dave Arnold) (09/17/88)

In article <3613@encore.UUCP>, bzs@encore.UUCP (Barry Shein) writes:
> 
> Last things first...
> 
> >> It's all a balancing act.
> >
> >Tightrope.

I should have added a bunch of :-) to my original followup.  It seems
there is a bitter feeling towards my posting.  I didn't intend to cause
such feelings.  I will be more careful in the future.

My apologies.

Signed,

Dave (egg on my face) Arnold
-- 
Dave Arnold
dave@arnold.UUCP	{cci632|uunet}!ccicpg!arnold!dave

jeh@crash.cts.com (Jamie Hanrahan) (09/18/88)

In article <3597@encore.UUCP> bzs@encore.UUCP (Barry Shein) writes:
	[much good stuff...]
>
>It's all a balancing act. In my ideal world there would be a variety
>of standardized access methods and you would avoid using them like the
>plague, especially in general system utilities, simple byte-stream
>files should account for most input and output (a la Unix), ...

I disagree.  I much prefer VMS's variable-length-record text file format
to Unix's byte-stream.  Why?  Because the Unix byte stream uses perfectly
legitimate data as a record separator.  To make matters worse, the standard
C method for dealing with strings uses a *different* character as a string
terminator!  Unix has a lot of GREAT ideas in it, but this isn't one of them.
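
To make the in-band versus out-of-band distinction concrete, here is a
rough sketch of count-prefixed records in a memory buffer. It assumes a
2-byte little-endian length in front of each record; put_record and
get_record are hypothetical names, and this is not a claim about the
exact RMS on-disk layout.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Append one record at offset off: a 2-byte little-endian length,
 * then the raw bytes. The caller must make sure buf is large enough.
 * Returns the offset just past the record. */
size_t put_record(unsigned char *buf, size_t off, const void *data, size_t len)
{
    assert(len <= 0xFFFF);      /* must fit in the 16-bit length field */
    buf[off]     = (unsigned char)(len & 0xFF);
    buf[off + 1] = (unsigned char)(len >> 8);
    memcpy(buf + off + 2, data, len);
    return off + 2 + len;
}

/* Read the record at offset off. Stores a pointer to its data and its
 * length, and returns the offset of the next record. */
size_t get_record(const unsigned char *buf, size_t off,
                  const unsigned char **data, size_t *len)
{
    *len  = (size_t)buf[off] | ((size_t)buf[off + 1] << 8);
    *data = buf + off + 2;
    return off + 2 + *len;
}
```

Because the length travels outside the data, a record such as the five
bytes 'a', newline, 'b', NUL, 'c' round-trips unchanged; no byte value
is reserved as a separator or terminator.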

Barry goes on to say that you should be able to open any file as a byte
stream and not get an error.  Well, you can do the equivalent under VMS--
you can open any file, sequential, relative, or indexed, for sequential
access, and RMS will happily hand you the records in order (in order by
primary key if it's an indexed file).  And if you prefer a byte-stream
rather than a record-oriented interface (and, yes, the byte-stream i/f
has GREAT advantages from a program style standpoint; non-believers,
particularly those who have never looked inside Unix utilities, should
take a look at Kernighan and Plauger's _Software Tools_ or _Software
Tools in Pascal_ to see what I mean), you or the system can provide a
set of byte-stream interface routines to do that with a record-oriented
file system.  (That DEC's VAX C RTL does this, shall we say, imperfectly,
is a problem in the implementation, not the concept.)

(Incidentally, Barry's problem with EDT stems from Runoff's former use of
print-format files, wherein carriage control information for each record
is stored in a fixed-length field preceding the text information.  A
program that expects to read an ordinary text file can read such a file,
but it won't see the fixed-length field, so if it's an editor it can't
reconstruct the field on output.  The print-format file is one use of
"Variable with Fixed Control" record format, and I'm very happy to report
that very few VMS programs generate such files these days; it's one record
format that VMS could have done without.)

To give you an idea of the generality of VAX RMS, the system runs happily
using just a few of the available file formats.  Text is stored in variable-
length-record, sequential files.  So is object code (possible even though
you can have null bytes, line feeds, etc., etc., in object records... because
RMS doesn't use in-band data for record terminators!).  Images and
library files go in fixed-length-record files, essentially with their own
internal format implemented by the programs that deal with them.  There
are a few indexed files like the user authorization file.  And that's about
it.  

For me, the bottom line is that it works, that RMS with all its fabled 
"inefficiencies" runs rings around most folks who try to bypass it (whoa,
now!  I said "most".  This because most people don't do the good job with
read-ahead and write-behind buffering that RMS does.  Sure, if you do that,
AND implement the record handling yourself, you can beat RMS, barely.  My
point is that you don't have to bother to get good performance), and that
I've dealt with VMS's file system for years without feeling I was doing 
battle with it.  No doubt if I was moved to a Unix environment I would 
gripe a lot for a few weeks about "those stupid byte-stream files", but 
I'd like to think that I'd adapt and figure out how to do things the Unix way
and work with the system instead of fighting it.  I'd like to think that most
Unix folks who come to VMS would do the converse.  I'm probably wrong on
both counts... :-)

jeh@crash.cts.com (Jamie Hanrahan) (09/18/88)

In article <178@arnold.UUCP> dave@arnold.UUCP (Dave Arnold) writes:
>How often have you written a program, and got the famous:
>
>%SYSTEM-F-EXCEEDED QUOTA
>
>message?  Isn't it fun trying to figure out which bloody quota
>was exceeded?!  Stupid!

Apologies for the cross-followup to the unix group; I don't know if
Dave reads the VMS group.  

The VMS quota system has two good reasons for being.  First, it prevents
a runaway program from using up all of something that might be in short
supply, like nonpaged pool or process slots.  Without this you could sit
in a loop doing $QIO with no wait to an offline device, and you'd bother
everybody on the system.  With quotas in effect you only bother yourself.
You can always enable process resource wait mode, which will cause your
process to go into MWAIT state (usually seen, for this purpose, as RWAST)
until the needed quota is returned, presumably by the completion of a 
previously-requested operation.  (Process resource wait mode is enabled
by default.)

You can also get EXQUOTA if you try to do a buffered I/O operation that's
larger than the size permitted by the SYSGEN parameter MAXBUF.  This is a
common pitfall.  MAXBUF is only middling-sized by default (somewhere near
1K if I recall correctly).  Many sites routinely set this up to 8K or so,
especially those that have megabytes of pool available.  

The other purpose of the quota system is to make sure that everything 
you've started is finished before your image is allowed to run down.  
Say you start a direct I/O operation to a flaky device driver; the system
charges your DIOLM by one.  You wait and decide to ^Y out, but the driver's
cancel I/O code doesn't work right so the I/O doesn't get aborted.  The
system notes that the original DIOLM is different from the current value
and won't permit your image to run down until they're the same.  This is
one of the great banes of both system managers and driver writers, but it's
necessary much of the time; if that I/O op is a read, and it decides to 
complete AFTER your image has run down and the physical pages you used 
to own (and to which the DMA will be performed) get assigned to somebody
else, watch out!  It's impossible for the system to distinguish where this
is necessary and where it isn't, so it's done all of the time.  
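
The charge-and-return bookkeeping described above can be sketched
roughly as follows. This is only an illustration of the idea: struct
io_quota, io_start, io_done, and can_run_down are hypothetical names,
not VMS system services, and a single counter stands in for a limit
such as DIOLM.

```c
#include <assert.h>

struct io_quota {       /* hypothetical stand-in for a limit such as DIOLM */
    int limit;          /* value the process started with */
    int avail;          /* value right now */
};

/* Charge one unit when an I/O is started. Returns 1 on success, or 0
 * if the quota is exhausted (the caller would then get a quota error
 * or wait, depending on resource wait mode). */
int io_start(struct io_quota *q)
{
    assert(q->avail >= 0 && q->avail <= q->limit);
    if (q->avail == 0)
        return 0;
    q->avail--;
    return 1;
}

/* Return the unit when the I/O completes or is really cancelled. */
void io_done(struct io_quota *q)
{
    assert(q->avail < q->limit);
    q->avail++;
}

/* Rundown is allowed only when every charged unit has come back. */
int can_run_down(const struct io_quota *q)
{
    return q->avail == q->limit;
}
```

A driver whose cancel path never calls the equivalent of io_done()
leaves avail below limit, which is exactly the case where rundown is
held up.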

jeh@crash.cts.com (Jamie Hanrahan) (09/18/88)

In article <178@arnold.UUCP> dave@arnold.UUCP (Dave Arnold) writes:
>The VMS file system doesn't buy you anything, unless your application
>requires ISAM---However, how often do you need ISAM?
>
>I think the VMS filesystem is overly complicated, and one of the major
>downfalls of VMS (but can be tolerated).  If the original DEC designers
>had it to do over again, I suspect they would have stuck with a
>Stream-only based filesystem (Like UNIX), and provided ISAM libraries.
>The FORTRAN record format, FIXED SIZE RECORDS, VARIABLE LENGTH,
>CARRIAGE RETURN CARRIAGE CONTROL... Oh, don't forget the VFC record
>format...  These are all completely archaic, and date the VMS
>operating system.

I strongly disagree.  I answered this in another note, but there are a
few other points here... 

How often do you need ISAM?  Well, if you have to implement it yourself,
probably you'll do without.  But if it's there it gets used, for good and
sufficient reasons.  There are MANY great applications for indexed files...
Netnews, for instance.  Some folks at BYU did a netnews workalike for VMS,
relying heavily upon indexed files to keep track of the newsgroup contents,
but storing the articles in individual files just as Unix netnews does.  It's
a VERY clean design, and they can process a batch of received news MUCH 
faster than Unix can running on the same hardware.  (To be as exact as dim
memory allows, I think they said ten times faster or so, and that the Unix
folks at the site were both amazed and jealous.)

Someone will likely complain that "all that RMS code" costs a lot in terms
of efficiency.  I offer this challenge:  Take a simple Unix filter like
DETAB running on some Unix system on a VAX (Ultrix, BSD, AT&T, whatever).
Rewrite it to use record-oriented I/O under VMS.  Boot VMS on the same 
hardware (or the equivalent).  We've done this and the VMS/RMS versions
run *at least* twice as fast, sometimes five or six times.  (The much greater
improvement in BYU News comes from a redesign to take advantage of indexed
files, not just conversion from stream- to record-oriented I/O.)

I know, I know -- for many applications stream I/O makes for much cleaner
program design.  But for others, it doesn't, at least not when you have
good alternatives available.  

I don't think that fixed vs. variable length records, implied
carriage control, etc., are archaic at all.  Variable with fixed control,
on the other hand, is right down there with punched cards and paper tape!

dhesi@bsu-cs.UUCP (Rahul Dhesi) (09/18/88)

In article <3438@crash.cts.com> jeh@crash.CTS.COM (Jamie Hanrahan) writes:
>I much prefer VMS's variable-length-record text file format
>to Unix's byte-stream.  Why?  Because the Unix byte stream uses perfectly
>legitimate data as a record separator.

UNIX files have no records, so there is no record separator.

But if you consider lines of text to be records and the newline
character to be a record separator (the concept is in your mind, not in
the filesystem), then VMS has a similar problem:  The low-level I/O
routines use perfectly legitimate data for administrative information!
Only at the RMS level is the overhead data made out-of-band.  And even
under UNIX, it is perfectly possible for an ISAM library to maintain
out-of-band administrative data.
-- 
Rahul Dhesi         UUCP:  <backbones>!{iuvax,pur-ee,uunet}!bsu-cs!dhesi

chris@mimsy.UUCP (Chris Torek) (09/18/88)

In article <3438@crash.cts.com> jeh@crash.cts.com (Jamie Hanrahan) writes:
>... the Unix byte stream uses perfectly legitimate data as a record
>separator.

Do you know what a `byte stream' is?  Byte streams do not have records;
they can hardly have record separators.  If you want records in a Unix
file system file, you must define them yourself.  This is what Barry
Shein was talking about.

>Barry goes on to say that you should be able to open any file as a byte
>stream and not get an error.  Well, you can do the equivalent under VMS--
>you can open any file, sequential, relative, or indexed, for sequential
>access, and RMS will happily hand you the records in order (in order by
>primary key if it's an indexed file).  And if you prefer a byte-stream
>... you or the system can provide a set of byte-stream interface routines
>to do that with a record-oriented file system.

Simulating a byte stream on top of records is considerably more
difficult than simulating records on top of a byte stream.  I have been
led to believe that, under VMS, each different kind of record-oriented
file must be read with a different primitive.  (You must also provide a
buffer that is as large as the largest record.)  Hence to simulate a
byte stream, you must know about every possible record format.

On the other hand, to simulate a record format, you must know about
every possible byte stream.  Fortunately, there is only one possible
byte stream, by the definition of `byte stream'. . . .
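
As a rough illustration of how little it takes to layer records on a
byte stream, here is a sketch that treats a buffer of bytes as
newline-terminated records. The function name next_record and the
choice of newline as the separator are assumptions made for the
example; an application could define its records any other way.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Treat buf[0..len) as a byte stream and hand back the next
 * newline-terminated record starting at *pos. Returns 1 and advances
 * *pos if a record was found, 0 at end of data. A final run of bytes
 * with no trailing newline is returned as the last record. */
int next_record(const char *buf, size_t len, size_t *pos,
                const char **rec, size_t *reclen)
{
    assert(*pos <= len);
    if (*pos == len)
        return 0;

    const char *start = buf + *pos;
    const char *nl = memchr(start, '\n', len - *pos);

    *rec = start;
    if (nl != NULL) {
        *reclen = (size_t)(nl - start);
        *pos += *reclen + 1;    /* step over the separator */
    } else {
        *reclen = len - *pos;
        *pos = len;
    }
    return 1;
}
```

The record convention lives entirely in this one routine; the bytes
underneath are the same ones read() would return.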
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

chris@mimsy.UUCP (Chris Torek) (09/18/88)

In article <3442@crash.cts.com> jeh@crash.cts.com (Jamie Hanrahan) writes:
>How often do you need ISAM?  Well, if you have to implement it yourself,
>probably you'll do without.  But if it's there it gets used, for good and
>sufficient reasons.

This was Barry Shein's point.  But he might, and I will, go a bit
further:  sometimes it also gets used for bad, insufficient reasons.
(That does not mean it should not be there; but maybe it should not
be *too* easy to use.)

>... Some folks at BYU did a netnews workalike for VMS [using indexed
>files] ....  It's a VERY clean design, and they can process a batch of
>received news MUCH faster than Unix can running on the same hardware.
>(To be as exact as dim memory allows, I think they said ten times
>faster or so, and that the Unix folks at the site were both amazed
>and jealous.)

You are comparing incomparable things here.  The reason their news
unbatcher is that much faster than the one in B news is almost certainly
because `it's a very clean design' and not because it uses any
particular storage format.  The B news unbatcher is a model of
inefficiency, clumsy patches, and re-re-re-re-worked code.  For
instance, an uncompressed batch file is read by forking a separate
process for each article in the file.  B news's only saving grace is
that it works, and it works on everything from PDP-11s to Convexes.

(Henry Spencer and Geoff Collyer rewrote the B news software and got
a similar order of magnitude performance increase, without changing
the file formats at all.)

>...  I offer this challenge:

Oh dear.

>Take a simple Unix filter like DETAB running on some Unix system on a
>VAX (Ultrix, BSD, AT&T, whatever).  Rewrite it to use record-oriented
>I/O under VMS.  Boot VMS on the same hardware (or the equivalent).
>We've done this and the VMS/RMS versions run *at least* twice as fast,
>sometimes five or six times.

If you pick your benchmarks carefully, you can prove anything.  Many
real programs spend a fair bit of time doing I/O, and VMS RMS I/O is
indeed quite efficient when properly used.  But so is Unix I/O.  VMS
currently has an implementation edge if the application reads large
blocks, since it does this by playing games with the MMU.  On the other
hand, Mach can do the same trick.

>I don't think that fixed vs. variable length records, implied
>carriage control, etc., are archaic at all.

I like the way Ken Thompson put it:

    These concepts fill a much-needed gap in other operating systems.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

bzs@encore.UUCP (Barry Shein) (09/18/88)

>I should have added a bunch of :-) to my original followup.  It seems
>there is a bitter feeling towards my posting.  I didn't intend to cause
>such feelings.  I will be more careful in the future.
>
>My apologies.
>
>Signed,
>
>Dave (egg on my face) Arnold

No bitter feeling or any such thing, I was just trying to draw out
exactly what in my note you were objecting to, I might have been
wrong, as inconceivable as that may be. Such disagreements are
oftentimes explained by nothing more (or less) than differing
perceptions of priorities, in this case the importance/frequency of a
need for efficient keyed (&c) storage access methods.

But do wipe the egg off your face; it's just left over from breakfast
and is irrelevant to the conversation. It's making us sick, Dave :-)

	-Barry Shein, ||Encore||

bzs@encore.UUCP (Barry Shein) (09/19/88)

In the first place, it's not obviously an either/or situation. I
suspect that VMS's RMS could be implemented on top of Unix with little
or no change to the O/S (although performance tuning would have to
trade off asynch read-ahead/write-behind and Unix's buffer cache which
accomplishes much the same basic thing [ie. the block you want next is
highly likely to be off the disk and in memory by the time you need
it], albeit in a different manner with different considerations.)

I wouldn't be at all shocked to see DEC announce (essentially) RMS
under Ultrix (and I'll bet a dollar someone is working on this.) Fine
idea, as long as it's not in the OS.

One problem with structured files that's easy to see is whether
information stored in the file to represent the structure is part of
the file or not.

For example, if in a variable length, blocked format you store the
length of each record as a preceding field of 16-bits, is the size of
the file the size of all its data + NRECORDS*2 (2 bytes)? Or just the
size of the file (that is, what does a file status query return?)

That doesn't seem terribly important at first (who cares, choose
a solution and stick to it) until one wants to access the thing
as a raw file (something always trivial to do in Unix's scheme.)

Now, is the 16-bit field counted in a file position seek? Can I safely
take two positions, POS1 and POS2 (byte offsets into the file, a la
ftell or lseek) and subtract them, perhaps then allocating and copying
the data? Or might the result be larger (OS adds in the 16-bit fields)
or even incorrect (POS2 should have been incremented by NRECORDS*2,
but I can't really calculate that number NRECORDS very easily, in
advance.)
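
The arithmetic can be sketched as follows, assuming the 2-byte
length field per record from the example above and nothing else in
the file. data_bytes and raw_offset are hypothetical helper names.

```c
#include <assert.h>
#include <stddef.h>

#define LEN_FIELD 2     /* bytes of length prefix per record (16 bits) */

/* Total payload bytes in the first n records. */
size_t data_bytes(const size_t *reclen, size_t n)
{
    assert(reclen != NULL || n == 0);
    size_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += reclen[i];
    return total;
}

/* Raw byte offset at which record n starts, counting the length
 * fields of all earlier records as part of the file. */
size_t raw_offset(const size_t *reclen, size_t n)
{
    return data_bytes(reclen, n) + n * LEN_FIELD;
}
```

For three records of 10, 0, and 30 bytes, the payload is 40 bytes but
the raw file is 46. Subtracting the raw positions of the start of
record 1 and the end of record 2 gives 34, while only 30 bytes of data
lie between them, so the answer depends on which kind of position the
system hands back.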

I'm not sure I'm claiming that Unix solves any of this other than
laying things out so very barebones and w/o OS interpretation that
it's totally up to the user, no hand-to-hand combat with a record
management system required.

Anyhow, I may not be expressing myself very well, but I have used VMS
and IBM record access methods enough over the years to know that
sometimes they can drive you to tears (usually because the OS feels it
has a better idea of what you are doing than the programmer does,
and modifies or otherwise "corrects" your requests.)

What's far more important, in my experience, is to have an orderly set
of access methods and to use them only where they are truly justified
(ie. simply because it's faster is not a good enough excuse if 99% of
the actual applications will perform faster than human response time
with either method, naive or sophisticated.)

I remember, for example, when the VMS HELP files went from a very
simple, textual format to their current library format and it made
working with them in new and creative ways nearly impossible (I had
written a full-screen access to the VMS help files in TECO, no
kidding, which was nearly impossible to salvage, I never bothered.)
I'm not sure the changeover was really much of an improvement, sped up
something which was fast enough already and added a lot of complexity
where it was unappreciated, adding a new help topic became more
complicated etc.

Not a flame, just trying to emphasize my point: it's good to
have access methods, but their availability tends to lead people astray
into using them just to avoid scanning a file, when scanning would perform
fine and would greatly simplify later maintenance (typically, the
file can be manipulated with a text editor).

	-Barry Shein, ||Encore||

schwartz@shire (Scott Schwartz) (09/19/88)

In article <3442@crash.cts.com> jeh@crash.CTS.COM (Jamie Hanrahan) writes:
|I offer this challenge:  Take a simple Unix filter like
|DETAB running on some Unix system on a VAX (Ultrix, BSD, AT&T, whatever).
|Rewrite it to use record-oriented I/O under VMS.  ...
|We've done this and the VMS/RMS versions run *at least* twice as fast, 
|sometimes five or six times.

I've seen unix programs (things like a grep replacement) that got
similar speedups by replacing stdio calls with read/write and a large
buffer.  I wonder how much of that 2-6x is from overhead in stdio,
rather than in the filesystem.  

Here is some sample data:

/* f1.c: */

#include <stdio.h>

int main(void)
{
	int c;

	/* Copy stdin to stdout one character at a time through stdio. */
	while ((c = getchar()) != EOF)
		putchar(c);
	return 0;
}

/* f3.c: */

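#include <stdio.h>	/* BUFSIZ */
#include <unistd.h>	/* read, write */

int main(void)
{
	ssize_t len;
	char buffer[BUFSIZ*10];

	/* Copy stdin to stdout in large blocks, bypassing stdio.
	   Stop on end of file (0) or on a read error (-1). */
	while ((len = read(0, buffer, sizeof(buffer))) > 0)
		write(1, buffer, (size_t)len);
	return 0;
}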

/* test file */

shire% wc test
26388  100728 1292508 test

/* results */

shire% time f1 <test >foo
5.8u 0.7s 0:06 100% 0+224k 2+163io 0pf+0w

shire% time f3 <test >foo
0.0u 1.0s 0:01 63% 0+248k 0+160io 0pf+0w

Shire is a Sun 4 running SunOS 4.0.  I got similar results on a Vax 780
running 4.3 BSD (except that it took 10 times longer to run.)

-- Scott Schwartz     schwartz@gondor.cs.psu.edu

Your array may be without head or tail, yet it will be proof against defeat.  
   Sun Tzu, "The Art of War"

james@bigtex.uucp (James Van Artsdalen) (09/19/88)

In article <3438@crash.cts.com>, jeh@crash.CTS.COM (Jamie Hanrahan) wrote:

> I disagree.  I much prefer VMS's variable-length-record text file format
> to Unix's byte-stream.  Why?  Because the Unix byte stream uses perfectly
> legitimate data as a record separator.

In reading the write(2) man page, I somehow completely missed the
discussion of file record separators in unix.
-- 
James R. Van Artsdalen    ...!uunet!utastro!bigtex!james     "Live Free or Die"
Home: 512-346-2444 Work: 328-0282; 110 Wild Basin Rd. Ste #230, Austin TX 78746

gwyn@smoke.ARPA (Doug Gwyn ) (09/19/88)

In article <3951@psuvax1.cs.psu.edu> schwartz@shire.cs.psu.edu (Scott Schwartz) writes:
>In article <3442@crash.cts.com> jeh@crash.CTS.COM (Jamie Hanrahan) writes:
>|I offer this challenge:  Take a simple Unix filter like
>|DETAB running on some Unix system on a VAX (Ultrix, BSD, AT&T, whatever).
>|Rewrite it to use record-oriented I/O under VMS.  ...
>|We've done this and the VMS/RMS versions run *at least* twice as fast, 
>|sometimes five or six times.
>I've seen unix programs (things like a grep replacement) that got
>similar speedups by replacing stdio calls with read/write and a large
>buffer.  I wonder how much of that 2-6x is from overhead in stdio,
>rather than in the filesystem.  
>Here is some sample data:

The point is valid, although your two examples were not functionally
identical, since in one case you were inspecting EVERY character in
a file and in the other you never inspected ANY character.  User-mode
overhead from stdio tends to be comparable to system overhead for
typical applications, assuming a fairly good implementation of stdio.
Certainly it is a mistake to use stdio to implement "cat", for example
(for several reasons), but for most applications the additional
services provided by stdio (buffering, etc.) are useful, as is the
fact that the stdio functions are available on all systems whereas
open()/read()/etc. may not be (and when they are, their semantics are
not as well defined).
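
To make the two test programs comparable, a read()/write() version would
also have to look at every byte. A minimal sketch, assuming POSIX read()
and write(); the function name and the choice of counting newlines as the
"inspection" are made up for illustration:

```c
#include <unistd.h>

/* Copy everything from fd `in` to fd `out` using large read()/write()
 * calls, but still look at every byte (here: count newlines).
 * Returns the newline count, or -1 on an I/O error. */
long copy_counting_lines(int in, int out)
{
	static char buffer[64 * 1024];
	long lines = 0;
	ssize_t len;

	while ((len = read(in, buffer, sizeof buffer)) > 0) {
		ssize_t i, off = 0;

		for (i = 0; i < len; i++)	/* inspect EVERY character */
			if (buffer[i] == '\n')
				lines++;
		while (off < len) {		/* cope with short writes */
			ssize_t n = write(out, buffer + off, (size_t)(len - off));
			if (n < 0)
				return -1;
			off += n;
		}
	}
	return len < 0 ? -1 : lines;
}
```

Timing this against the getchar()/putchar() loop would separate the cost
of stdio's per-character calls from the cost of touching each byte at all.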

The analogous UNIX "challenge" would be:  Take a simple UNIX filter
(I have no idea where he gets "detab", which is not standard on UNIX)
and rewrite it to use direct system calls on UNIX...

Personally I think I have better things to do than crank out system-
specific code.

guy@gorodish.Sun.COM (Guy Harris) (09/19/88)

> I wouldn't be at all shocked to see DEC announce (essentially) RMS
> under Ultrix (and I'll bet a dollar someone is working on this.) Fine
> idea, as long as it's not in the OS.

Or, more precisely, not in a more-privileged mode than user mode; I consider
the OS to be more than just the kernel - for instance, I consider UNIX standard
I/O to be part of the OS.

Under RSX-11, if I remember correctly, RMS is just a library that runs in user
mode; VMS decided to fill another much-needed gap by running it in executive
mode.  Neither of them stuffed it into the kernel, at least....

guy@gorodish.Sun.COM (Guy Harris) (09/19/88)

> I disagree.  I much prefer VMS's variable-length-record text file format
> to Unix's byte-stream.  Why?  Because the Unix byte stream uses perfectly
> legitimate data as a record separator.  To make matters worse, the standard
> C method for dealing with strings uses a *different* character as a string
> terminator!  Unix has a lot of GREAT ideas in it, but this isn't one of them.

Umm, as others have already pointed out, UNIX doesn't use '\n' as a record
separator; it uses it as a *line* separator.  UNIX - like VMS - ultimately (at
the kernel level) implements files as a sequence of bytes (RMS sits on top of
QIOs that read virtual blocks of the file, *n'est ce pas?*).

One file format UNIX happens to implement atop this abstraction is the "text
file"; "text files" consist of "lines", which are sequences of bytes (not
containing '\0' - some applications can't handle them, since it's the C string
terminator) ending with '\n'.

Other file formats exist, such as executable images and archives, which are,
respectively, the UNIX equivalents of images (and object files - object files
and images use the same format) and library files.

However, UNIX doesn't come standard with any libraries that implement "record"
files.  Such libraries are available from third-party vendors (e.g., C-ISAM),
and I very much doubt that they use '\n' or any other particular byte value as
a record separator.

Some of the real differences between UNIX and VMS here are that:

	1) As already stated, VMS comes with libraries that implement "record"
	   files, while UNIX doesn't;

	2) Many UNIX utilities (e.g., "cp") deal with files at the byte-stream
	   level, so they don't care *what* format the file is in;

	3) Many more UNIX facilities use text files, rather than record files,
	   as their underlying file format; while one reason for this may be
	   the absence of a "record file" library, another reason is that you
	   can use the standard UNIX text file tools to manipulate those files.

mike@turing.unm.edu (Michael I. Bushnell) (09/19/88)

In article <68850@sun.uucp>, guy@gorodish (Guy Harris) writes:
>> I wouldn't be at all shocked to see DEC announce (essentially) RMS
>> under Ultrix (and I'll bet a dollar someone is working on this.) Fine
>> idea, as long as it's not in the OS.
>
>Or, more precisely, not in a more-privileged mode than user mode; I consider
>the OS to be more than just the kernel - for instance, I consider UNIX standard
>I/O to be part of the OS.

But standard I/O runs in user mode, not in a more-privileged mode.  What
you consider the OS to be is not what it in fact is.  A good working 
description of OS is that part of the system which the arbitrary user
cannot rewrite and use in lieu of the distributed code.  You can rewrite
stdio, and then not use the distributed one.  This definition is
*very* closely linked to what privilege mode the code runs in...if it
runs in user mode, the user could replace it.

>Under RSX-11, if I remember correctly, RMS is just a library that runs in user
>mode; VMS decided to fill another much-needed gap by running it in executive
>mode.  Neither of them stuffed it into the kernel, at least....

But...the user can't necessarily replace RMS without getting to write
his own CHME dispatch table, something the kernel is not likely to let
him do.
-- 
                N u m q u a m   G l o r i a   D e o 

       \                Michael I. Bushnell
        \               HASA - "A" division
        /\              mike@turing.unm.edu
       /  \ {ucbvax,gatech}!unmvax!turing.unm.edu!mike

mike@turing.unm.edu (Michael I. Bushnell) (09/19/88)

In article <68855@sun.uucp>, guy@gorodish (Guy Harris) writes:
>> I disagree.  I much prefer VMS's variable-length-record text file format
>> to Unix's byte-stream.  Why?  Because the Unix byte stream uses perfectly
>> legitimate data as a record separator.  To make matters worse, the standard
>> C method for dealing with strings uses a *different* character as a string
>> terminator!  Unix has a lot of GREAT ideas in it, but this isn't one of them.
>
>Umm, as others have already pointed out, UNIX doesn't use '\n' as a record
>separator; it uses it as a *line* separator.  UNIX - like VMS - ultimately (at
>the kernel level) implements files as a sequence of bytes (RMS sits on top of
>QIOs that read virtual blocks of the file, *n'est ce pas?*).
>
>One file format UNIX happens to implement atop this abstraction is the "text
>file"; "text files" consist of "lines", which are sequences of bytes (not
>containing '\0' - some applications can't handle them, since it's the C string
>terminator) ending with '\n'.
>
>Other file formats exist, such as executable images and archives, which are,
>respectively, the UNIX equivalents of images (and object files - object files
>and images use the same format) and library files.

But a very important thing to remember is this:  The designers of UNIX
didn't expect to see people edit binaries, but they stuck with the
byte-stream abstraction.  Programs that are willing to stick to it
(like GNU emacs, and unlike ed, ex, and vi) can benefit tremendously.
I can and do edit binaries using emacs.  It didn't take *any*
modification of the operating system to do this, and emacs didn't
require *any* special modifications to do so...all it needed was to
learn how *not* to use separators. 

The point is that while you might not see the value in it now, you
might later, when it is too late.  Try using your favorite VMS editor
to edit a binary and change a string constant!  Not too likely, I'm afraid.
-- 
                N u m q u a m   G l o r i a   D e o 

       \                Michael I. Bushnell
        \               HASA - "A" division
        /\              mike@turing.unm.edu
       /  \ {ucbvax,gatech}!unmvax!turing.unm.edu!mike

sommar@enea.se (Erland Sommarskog) (09/20/88)

Jamie Hanrahan (jeh@crash.CTS.COM) writes:
>I know, I know -- for many applications stream I/O makes for much cleaner
>program design.  But for others, it doesn't, at least not when you have
>good alternatives available.  

I don't think one should over-emphasize the importance of which I/O
concept the OS uses. If I program in a high-level language it is
rather the I/O concept of that language which is of interest, at
least if I/O is well-defined. In many modern languages, I/O is not
part of the language, but rather a library which could be more or
less standardized. What is left is of course the question of efficiency.

So if a language like C only has stream I/O (I assume it is so,
I don't speak C, so I could be wrong) then we don't benefit from
a complex file system when all we want is simple streams.

Ada, on the other hand, has text files, and record files both for
sequential and direct access. For the compiler-writer it may be
of interest if the file system supports the appropriate formats,
for me as a programmer it does not. Whether it's in the file system
or the RTL doesn't matter.
  Jamie Hanrahan complained that stream I/O meant that in-band data
were used as a terminator. In practice this means that writing an LF in
the middle of a text line is impossible in Unix, while it is quite OK
in VMS. (Which on the other hand imposes a maximum length on the line.)
  So what about Ada? If I write an LF character, will the result be
different on VMS and Unix? Non-portable? Yes, but the manual also
clearly says that I/O of non-printable characters is not defined
by the language.
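
The trade-off can be sketched in C with counted records on top of a plain
byte stream. This is only an illustration: the 2-byte, low-byte-first
length prefix and the names `put_record`/`get_record` are assumptions for
this example, not a description of RMS's actual on-disk layout.

```c
#include <stdio.h>

/* Write one record as a 2-byte length (low byte first) followed by the
 * data.  Any byte value, including '\n', may appear in the data.
 * Returns 0 on success, -1 on error or if len exceeds 65535. */
int put_record(FILE *fp, const void *data, size_t len)
{
	if (len > 0xFFFF)
		return -1;
	if (putc((int)(len & 0xFF), fp) == EOF || putc((int)(len >> 8), fp) == EOF)
		return -1;
	return fwrite(data, 1, len, fp) == len ? 0 : -1;
}

/* Read the next record into buf.  Returns its length, or -1 at end of
 * file, on error, or if the record does not fit in bufsz bytes. */
long get_record(FILE *fp, void *buf, size_t bufsz)
{
	int lo = getc(fp), hi;
	size_t len;

	if (lo == EOF || (hi = getc(fp)) == EOF)
		return -1;
	len = (size_t)lo | ((size_t)hi << 8);
	if (len > bufsz || fread(buf, 1, len, fp) != len)
		return -1;
	return (long)len;
}
```

The length prefix is what allows an embedded LF, and it is also what
imposes the maximum record length mentioned above (65535 bytes here).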

-- 
Erland Sommarskog            ! "Hon ligger med min b{ste v{n, 
ENEA Data, Stockholm         !  jag v}gar inte sova l{ngre", Orup
sommar@enea.UUCP             ! ("She's making love with best friend,
                             !   I dare not to sleep anymore")

jeh@crash.cts.com (Jamie Hanrahan) (09/20/88)

No, this isn't a followup rebuttal, even though I've been beat up pretty 
badly re. my statement about "record separators" (okay, okay, "line 
separators") in Unix files.  I said my piece already, right?  

But I was annoyed to see someone say "Please, don't start another
Unix vs. VMS war".  I don't think this is a "war" at all.  I think I've
learned a bit about the right way to think about Unix files, knowledge
which will no doubt come in handy some day, probably sooner than I think it
will (if past experience is any guide).  Maybe some other folks have learned
something about VMS files too.  Isn't this what the net is about?  (But if
someone says "Please don't let this get out of hand", I'll second.)

jfh@rpp386.Dallas.TX.US (The Beach Bum) (09/20/88)

In article <68855@sun.uucp> guy@gorodish.Sun.COM (Guy Harris) writes:
>Umm, as others have already pointed out, UNIX doesn't use '\n' as a record
>separator; it uses it as a *line* separator.  UNIX - like VMS - ultimately (at
>the kernel level) implements files as a sequence of bytes (RMS sits on top of
>QIOs that read virtual blocks of the file, *n'est ce pas?*).

vms has file attributes directly associated with the file.  qio does
read virtual blocks - but you can't easily convince rms to read a file
in some mode other than the mode the file was created with.  if you
have an isam file you want to read as a 80 character fixed length record
file, it's qio or nothing [ but grief ]
-- 
John F. Haugh II (jfh@rpp386.Dallas.TX.US)                   HASA, "S" Division

    "If the code and the comments disagree, then both are probably wrong."
                -- Norm Schryer

eric@snark.UUCP (Eric S. Raymond) (09/20/88)

In article <13608@mimsy.uucp>, chris@mimsy.UUCP (Chris Torek) writes:
> (Henry Spencer and Geoff Collyer rewrote the B news software and got
> a similar order of magnitude performance increase, without changing
> the file formats at all.)

And I did likewise, with similar results, for B3.0. Chris is, as usual, quite
correct; the fault lies not in our file formats, but in our code. The major
win was just eliminating the fork-per-article overhead in the unbatcher.

The principle exemplified here bears repeating yet again:

	A CLEAN DESIGN IS THE ROYAL ROAD TO SPEEDY CODE

and fiddling with flat-vs-ISAM files, clever code hacks or other 'micro-level'
optimizations is usually a recipe for lots of pain with very little gain.

-- 
      Eric S. Raymond                     (the mad mastermind of TMN-Netnews)
      UUCP: ...!{uunet,att,rutgers}!snark!eric = eric@snark.UUCP
      Post: 22 S. Warren Avenue, Malvern, PA 19355      Phone: (215)-296-5718

guy@gorodish.Sun.COM (Guy Harris) (09/21/88)

> vms has file attributes directly associated with the file.  qio does
> read virtual blocks - but you can't easily convince rms to read a file
> in some mode other than the mode the file was created with.

As I remember, the VMS file attributes are maintained, but not really used, by
the code that I would refer to as the VMS file system (the ACPs or "extended
QIO processors" or whatever they call the new stuff they added in recent
versions).  I think there are QIOs (perhaps undocumented) that RMS uses to
fetch and store those attributes.

aperez@cvbnet2.UUCP (Arturo Perez Ext.) (09/21/88)

From article <68855@sun.uucp>, by guy@gorodish.Sun.COM (Guy Harris):
>> I disagree.  I much prefer VMS's variable-length-record text file format
>> to Unix's byte-stream.  Why?  Because the Unix byte stream uses perfectly
>> legitimate data as a record separator.  To make matters worse, the standard
>> C method for dealing with strings uses a *different* character as a string
>> terminator!  Unix has a lot of GREAT ideas in it, but this isn't one of them.
> 
> One file format UNIX happens to implement atop this abstraction is the "text
> file"; "text files" consist of "lines", which are sequences of bytes (not
> containing '\0' - some applications can't handle them, since it's the C string
> terminator) ending with '\n'.
> 
> Other file formats exist, such as executable images and archives, which are,
> respectively, the UNIX equivalents of images (and object files - object files
> and images use the same format) and library files.
> 
> However, UNIX doesn't come standard with any libraries that implement "record"
> files.  Such libraries are available from third-party vendors (e.g., C-ISAM),
> and I very much doubt that they use '\n' or any other particular byte value as
> a record separator.
> 

I'm curious.  I understand VMS's supposed need for the various file formats.
And although I disagree, that's DEC's decision; let them live with it.  They just
want application designers to use the tools that DEC designed.  Maybe because
it makes their software easier to support.  I don't really know.  And I don't
really work with VMS often enough to really care.

But I do know from experience that the Unix file system is so straightforward
that ANYBODY can use it without having to worry about the millions of 
descriptors that are needed to set up an I/O request on RMS. 


What I'm curious about is the fact that I've never heard of any record
access libraries for Unix.  I know that I've written simpleminded record
access applications.  I'm sure other people have as well.  Is there anyone
actually selling record access libraries for the Unix community?  If not
why isn't anyone doing it?


Arturo Perez
ComputerVision, a division of Prime
primerd!cvbnet!aperez
The difference between genius and idiocy is that genius has its limits.

gwyn@smoke.ARPA (Doug Gwyn ) (09/21/88)

In article <3954@enea.se> sommar@enea.se (Erland Sommarskog) writes:
>In practice this means that writing an LF in the middle of a text line is
>impossible in Unix, while it is quite OK in VMS.

On the other hand, what is a "text line" that occupies portions of
multiple lines on a display device?  Change "text line" to "text
record" and the concept makes more sense, but then why is text
necessarily organized into records, and why do these records look
like they do instead of something like

x T aps
x res 723 1 1
x init
x font 1 R
x font 2 I
x font 3 B
x font 4 H
x font 5 CW
x font 6 S
x font 7 S1
x font 8 GR
V0
p1
s10
f1
H696
V480
h2075c-
35 33152 33-n120 0
H696
V960
cT
67h54i28sw71i28sw71a50nw86e45x50a50m82p52l28e45.n120 0
x trailer
V7953
x stop

jeremy@chook.ua.oz (Jeremy Webber) (09/21/88)

In all this discussion I have not seen mention of the fact that you can open a
VMS file for block i/o and then treat it as a stream of blocks.  This can be
useful for just moving data around.  It can also be dangerous, but no more so
than treating a file as a stream of bytes.

One thing that I think DEC stuffed up badly though is that they did not define
a standard for text files.  Instead, you have variable-length-carriage-control,
Fortran carriage control, List carriage control, stream-LF, stream-CR and
probably half a dozen others that I have not thought about.  This makes writing
text file manipulation programs, such as text editors, a real pain.  It also
makes manipulation of text by programs written in different languages
hazardous.  I believe that DEC should modify the run time libraries of all
languages to convert internal text to and from a standard text form when
reading and writing files.

I can see the performance advantages of letting the file system "know" about
RMS.  Particularly with regard to record locking and other commercial uses.

In short, there are advantages and disadvantages in the VMS as against the UNIX
method of treating files, and you'll probably choose the one best for your
application.

-Jeremy Webber (jeremy@chook.ua.oz.au)
Computer Science, Adelaide University, Australia

"One of these days I'll get around to writing a .signature file"

meo@stiatl.UUCP (Miles O'Neal) (09/21/88)

In article <68855@sun.uucp>, guy@gorodish.Sun.COM (Guy Harris) writes:
> 	3) Many more UNIX facilities use text files, rather than record files,
> 	   as their underlying file format; while one reason for this may be
> 	   the absence of a "record file" library, another reason is that you
> 	   can use the standard UNIX text file tools to manipulate those files.

If you even have your data files as text files, debugging
becomes much easier. For instance, would you rather debug

98764389437034gh307ytfhr398f39

or

12/22/88 01:30 10790 100 100 382 -1

?
These are not real data, but examples of what data files I've dealt
with looked like. The processing to do all this is cheap nowadays,
so why not use text files if there is no OVERWHELMING reason not to?

Another thing this buys you is that, in my experience, it's easier
to change file formats if you use text files. It requires a little
planning, but in general is a lot less work than doing the same
thing with any other type of data.
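
As a sketch of how cheap that processing can be, the text example above
can be read with a single sscanf() call in C. The struct and its field
names are invented for illustration; the post does not say what the
numbers mean.

```c
#include <stdio.h>

/* Hypothetical layout for a line such as
 *   12/22/88 01:30 10790 100 100 382 -1
 * The field names are assumptions made for this example. */
struct sample {
	int month, day, year, hour, minute;
	long v[5];
};

/* Returns 1 if all ten fields were parsed, 0 otherwise. */
int parse_sample(const char *line, struct sample *s)
{
	return sscanf(line, "%d/%d/%d %d:%d %ld %ld %ld %ld %ld",
	              &s->month, &s->day, &s->year, &s->hour, &s->minute,
	              &s->v[0], &s->v[1], &s->v[2], &s->v[3], &s->v[4]) == 10;
}
```

Adding a field later means extending the format string and the struct;
old files can still be inspected and patched by hand.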

Strangely enough, you can do similar things with VMS, OS/32, or
even CP/M...

dave@arnold.UUCP (Dave Arnold) (09/22/88)

eric@snark.UUCP (Eric S. Raymond) writes:
> In article <13608@mimsy.uucp>, chris@mimsy.UUCP (Chris Torek) writes:
> > (Henry Spencer and Geoff Collyer rewrote the B news software and got
> > a similar order of magnitude performance increase, without changing
> > the file formats at all.)
> 
> [...]
> 
> The principle exemplified here bears repeating yet again:
> 
> 	A CLEAN DESIGN IS THE ROYAL ROAD TO SPEEDY CODE
> 

I couldn't agree more.  People I work with seem to get bogged down
in the "How big of a QIO can I do" syndrome during early program
design and development.  I really protest this (especially when they
encourage me to do the same).  One of the reasons why I am a *GREAT*
:-) programmer...is...because...: I much prefer to view things in the
most simple way.  I actually go to great effort rewriting things
(with my boss's glare $$$) just to achieve a simpler program design.
Sometimes the rewrite achieves better performance (not intentionally).
And if not, it facilitates easier performance enhancements---But I save
those for last.

This is the thing that I love about UNIX so much that I wish VMS
shared: SIMPLICITY.  Everything is so damn simple, it goes right
over some people's heads.  Now if UNIX only had AST's, timer queues,
exception handling, and a better "SHELL"---I would be in heaven.

Remember the days when we would bring monolithic
straight-line code to bed with us, and make marks on the listing?

I even remember back in the late 1970's my boss teaching me the
cons of structured programming by explaining to me that a function
call just turns into a JMP instruction :-)  This is the 80's!!!
Soon to be 90's!! Let's not get stuck in the dark ages!
-- 
Dave Arnold
dave@arnold.UUCP	{cci632|uunet}!ccicpg!arnold!dave

eric@snark.UUCP (Eric S. Raymond) (09/22/88)

In article <3453@crash.cts.com>, jeh@crash.CTS.COM (Jamie Hanrahan) writes:
>                 I don't think this is a "war" at all.  I think I've
> learned a bit about the right way to think about Unix files, knowledge
> which will no doubt come in handy some day, probably sooner than I think it
> will (if past experience is any guide).  Maybe some other folks have learned
> something about VMS files too.  Isn't this what the net is about?

Yup. Me, I learned a lot about VMS from your postings. Not that I'd ever use
it without you put a gun to my head, but I learned a lot. Thank you for your
lucid descriptions of how RMS works.

BTW, cultural differences are funny; I kept wanting to parse that acronym RMS
as "Richard M. Stallman", an entity even more complex and obscure (but much
less brain-damaged :-)) than VMS file I/O.
-- 
      Eric S. Raymond                     (the mad mastermind of TMN-Netnews)
      UUCP: ...!{uunet,att,rutgers}!snark!eric = eric@snark.UUCP
      Post: 22 S. Warren Avenue, Malvern, PA 19355      Phone: (215)-296-5718

allbery@ncoast.UUCP (Brandon S. Allbery) (09/23/88)

As quoted from <179@arnold.UUCP> by dave@arnold.UUCP (Dave Arnold):
+---------------
| In article <3597@encore.UUCP>, bzs@encore.UUCP (Barry Shein) writes:
| > The problem with the Unix "unstructured" approach is that either you
| > use some of the (very few) library routines (dbm is a major one, so
| > are the object deck readers in SYSV) or you roll your own, each
| > application will have its own way of storing data (compare termcap
| > with passwd with inittab with crontab with ...) often not terribly
| > well documented or efficient (agreed, often efficiency is a poor
| > excuse for obscurity.)
| 
| This is not a problem.  It's not often that your application requires
| you to "Roll your own".  And you get a very simple filesystem.
+---------------

This all ties together with the terminfo-vs.-termcap discussion.  Actually, I
have written an interpreted terminfo (as part of the "tgraph" compatibility
package for SVR2 curses); it is slow, but that's mainly because of laziness.
It should be quite possible to make it fast, keeping terminfo's
longer capability names *but* remaining *extensible*, unlike terminfo.

Just as byte-stream file systems are more general and more useful than typed
file systems, simple, general, FAST "access method" routines on top of the
stream file systems are better than either typed file systems or roll-your-
own access methods.  (Example:  COFF, or the new format perhaps, could
easily be generalized to make a "resource library file" similar to Macintosh
resource forks.  Which would make "ld" a general utility rather than just an
object relocation editor.)

Termcap's obscurity and outright bugs (skip a backslash or expand a tab to
spaces and the whole file goes to pot) make it a rather bad access method;
while fixed versions (such as the Gnu version) handle the bugs, it's still
harder to understand those two-character capnames than terminfo capnames.
The interpretive terminfo-style reader is a step in the right direction.  I
also have a terminfo-like routine (currently implemented via yacc, so it's
REALLY slow) which supports typed arrays.

On the other hand, termcap/info doesn't solve all problems; it's senseless
to complain about termcap and passwd not having the same format, they're
keyed and used differently.  Passwd uses yet another SIMPLE, GENERAL format,
which is easily manipulated even at the shell level.  Crontab is actually a
simple variant of that format, and perhaps should be merged, but the
existing tools can very easily deal with both.  (After all, there's really a
difference only in that a colon is used as passwd's field separator, while
crontab uses a tab.  Interpretation of fields varies, but that's going to
happen anyway in a real-world database situation.)
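
A minimal sketch of that "simple, general" format in C: one routine
splits a line on a given separator, so the same code serves a
colon-separated and a tab-separated file. The function name and the
sample lines in the usage note are illustrative assumptions, not taken
from any real system.

```c
#include <stddef.h>

/* Split line in place on the separator character sep, storing up to max
 * field pointers in fields[].  Empty fields are kept, since passwd-style
 * files may contain them.  A trailing '\n' is removed.  Once max fields
 * have been stored, any remaining separators stay in the last field.
 * Returns the number of fields stored. */
size_t split_fields(char *line, char sep, char **fields, size_t max)
{
	size_t n = 0;
	char *p = line;

	if (max == 0)
		return 0;
	fields[n++] = p;
	for (; *p != '\0'; p++) {
		if (*p == '\n') {		/* drop the line terminator */
			*p = '\0';
			break;
		}
		if (*p == sep && n < max) {
			*p = '\0';
			fields[n++] = p + 1;
		}
	}
	return n;
}
```

Calling it with ':' on a line such as "root::0:0:Operator:/:/bin/csh"
and with '\t' on a tab-separated line uses the same code path; only the
interpretation of the fields differs.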

++Brandon
-- 
Brandon S. Allbery, uunet!marque!ncoast!allbery			DELPHI: ALLBERY
	    For comp.sources.misc send mail to ncoast!sources-misc
"Don't discount flying pigs before you have good air defense." -- jvh@clinet.FI

bzs@xenna (Barry Shein) (09/25/88)

If I can be permitted to summarize this discussion:

VMS's RMS can be useful in many situations and amounts to an added
application library bundled in with VMS which Unix folks would have to
go out and purchase separately (I've seen similar libraries for Unix
advertised in trade mags, they do exist.) Presumably one can add a
similarly useful access methods library to Unix, the biggest question
being the desirability of true asynchronous I/O (it's possible that,
from a pure performance standpoint, Unix wouldn't benefit that much
from this due to its buffer cache although some would still like it.)

VMS's biggest drawback, with regard to RMS, is that the applications
designers did not show more discipline in using (preferably) one
access method for most applications, so that utilities could work
together more smoothly. Having one utility produce a text
file which cannot be read in and manipulated by another seems to
violate "the law of least astonishment" in a major way. Simply
handling all the permutations is not as reliable as agreeing on one
format except where carefully justified. This is particularly true
when changing between programming languages (at least one reader
claims this.)

I think it's safe to say this was a constructive discussion.

	-Barry Shein, ||Encore||

mazumdar@fredonia.UUCP (Jin Mazumdar) (09/29/88)

	
	I have just been browsing through this discussion and have not
read all the follow-ups. Although UNIX does not have fixed-length
records, can one not convert any file in UNIX to fixed-length records
using the dd utility?  On the other hand, on fixed-format systems the
best you could do is fake variable format with an end-of-record marker,
possibly wasting the rest of the record.
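
The conversion in question is roughly what `dd conv=block cbs=N` does:
each newline-terminated line becomes one N-byte record, padded with
spaces or truncated. A minimal sketch of the per-line step in C, with a
made-up function name:

```c
#include <string.h>

/* Turn one text line (without its '\n') into a fixed-length record of
 * reclen bytes: longer lines are truncated, shorter ones are padded
 * with spaces.  Returns the number of line bytes kept. */
size_t to_fixed_record(const char *line, size_t linelen, char *rec, size_t reclen)
{
	size_t keep = linelen < reclen ? linelen : reclen;

	memcpy(rec, line, keep);
	memset(rec + keep, ' ', reclen - keep);
	return keep;
}
```

The padding is the "wasted" space: the cost is simply paid in the other
direction compared with faking variable records on a fixed-format system.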

   Jin Mazumdar (uucp:) ...decvax!sunybcs!fredonia!mazumdar          
   >>>  The following are for historical interest only  <<<
   Dept. Of Math and C. S.     
   State University of New York College at Fredonia     
   Fredonia, N.Y. 14063         (716) 673 3459                               
 

dhesi@bsu-cs.UUCP (Rahul Dhesi) (09/29/88)

In article <1127@fredonia.UUCP> mazumdar@fredonia.UUCP (Jin Mazumdar) writes:
>Although UNIX does not have fixed length
>records...

It certainly does.  Look at the structure of /etc/utmp and /usr/adm/wtmp
or equivalent files on your system.
-- 
Rahul Dhesi         UUCP:  <backbones>!{iuvax,pur-ee}!bsu-cs!dhesi

jfh@rpp386.Dallas.TX.US (The Beach Bum) (09/30/88)

In article <4136@bsu-cs.UUCP> dhesi@bsu-cs.UUCP (Rahul Dhesi) writes:
>In article <1127@fredonia.UUCP> mazumdar@fredonia.UUCP (Jin Mazumdar) writes:
>>Although UNIX does not have fixed length
>>records...
>
>It certainly does.  Look at the structure of /etc/utmp and /usr/adm/wtmp
>or equivalent files on your system.

not in the typical sense.  there is no file-system level support for
fixed length records.  unix files are byte streams, meaning [ with the
exception of certain device files ] you can read 1 byte or, hardware
permitting, 1MB.

with other operating systems the size of the record is fixed at file
creation time and may not be changed without copying the contents of
the file using a file conversion utility of some type.  /etc/utmp may
be read one byte at a time, except that the "records" would not have
any meaning.


-- 
John F. Haugh II (jfh@rpp386.Dallas.TX.US)                   HASA, "S" Division

      "Why waste negative entropy on comments, when you could use the same
                   entropy to create bugs instead?" -- Steve Elias

allbery@ncoast.UUCP (Brandon S. Allbery) (10/07/88)

As quoted from <4136@bsu-cs.UUCP> by dhesi@bsu-cs.UUCP (Rahul Dhesi):
+---------------
| In article <1127@fredonia.UUCP> mazumdar@fredonia.UUCP (Jin Mazumdar) writes:
| >Although UNIX does not have fixed length
| >records...
| 
| It certainly does.  Look at the structure of /etc/utmp and /usr/adm/wtmp
| or equivalent files on your system.
+---------------

The programs that use those files use fixed-length "records"; the file system
itself does not enforce them, however.  The difference is that you don't have
to tell your favorite binary editor that it must open /etc/utmp with a record
size of (sizeof (struct utmp)) bytes.

++Brandon
-- 
Brandon S. Allbery, uunet!marque!ncoast!allbery			DELPHI: ALLBERY
	  For comp.sources.misc send mail to <backbone>!sources-misc
comp.sources.misc is moving off ncoast -- please do NOT send submissions direct
	  "So many articles, so little time...."  -- The Line-Eater