[net.unix-wizards] more on file "attributes"

jcampbell@mrfort.DEC (Jon Campbell) (08/01/85)

Well, my mailbox runneth over with mail telling me how I've struck
at the heart of UNIX by suggesting file attributes. I think perhaps
I have presented the problem (and its possible solution) in the wrong
light.
 
What many users have suggested is that I put a "file header" at the
beginning of each file. This seems like a reasonable approach, except
that existing FORTRANs do not put such cruft at the beginning of files
now. So we have a skew problem. What I was suggesting, though it might
have not been clear, is an "invisible" file header, one which you look
at in a slightly different way than the real data (the bytes in the file).
Perhaps this could be by using a negative byte address in the file, perhaps
some other way. I'm not particularly interested in the way it might be done,
except that it cannot be part of the actual data and it cannot be a separate
file.
 
There are many such operating systems (which have file information in
invisible or hidden headers) around, such as the ATEX text-processing
system used in many newspapers. Ordinary programs and utilities need
not ever look at the invisible header if they are interested in the
data only.
 
I suggested that it be part of the "file information block" (i.e.,
the filename, creation date, and size) because that is a convenient
way to have it copied transparently when you make a copy of the file
or rename the file.
 
I am not suggesting changing the way that the vast majority of UNIX
utilities and user programs currently look at files, nor suggesting
any changes to them. I am suggesting that we give a data-handle, if
you will, for those programs and utilities which care to use the
"attributes". There is no loss of performance, no restrictions placed
on file usage, and very small extra disk space used.
 
I think that you folks who are having a look at creating UNIX utilities
which can do serious data manipulation, read magtapes from "foreign"
operating systems and munge them (without having to read the ANSI
magtape header files by hand), or write utilities which can look at
different files without knowing a priori the file format, will
recognize the problem that I am trying to address. I am not trying
to "strike at the heart" of UNIX; I am letting you know that there
is a problem to be solved that cannot be solved easily.
 
Thanks for all of your feedback. I look forward to more.
 
					Thanks,
					Jon Campbell
   --------

levy@ttrdc.UUCP (Daniel R. Levy) (08/03/85)

jcampbell@mrfort.DEC (Jon Campbell) <3398@decwrl.UUCP>:

>What many users have suggested is that I put a "file header" at the
>beginning of each file. This seems like a reasonable approach, except
>that existing FORTRANs do not put such cruft at the beginning of files
>now. So we have a skew problem. What I was suggesting, though it might
>have not been clear, is an "invisible" file header, one which you look
>at in a slightly different way than the real data (the bytes in the file).
>Perhaps this could be by using a negative byte address in the file, perhaps
>some other way. I'm not particularly interested in the way it might be done,
>except that it cannot be part of the actual data and it cannot be a separate
>file.

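>There are many such operating systems (which have file information in
>invisible or hidden headers) around, such as the ATEX text-processing
>system used in many newspapers. Ordinary programs and utilities need
>not ever look at the invisible header if they are interested in the
>data only.
> 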
>I suggested that it be part of the "file information block" (i.e.,
>the filename, creation date, and size) because that is a convenient
>way to have it copied transparently when you make a copy of the file
>or rename the file.
> 
>I am not suggesting changing the way that the vast majority of UNIX
>utilities and user programs currently look at files, nor suggesting
>any changes to them. I am suggesting that we give a data-handle, if
>you will, for those programs and utilities which care to use the
>"attributes". There is no loss of performance, no restrictions placed
>on file usage, and very small extra disk space used.
> 

Some comments.  One of the beauties of Unix files (the ordinary kind) is that
you can deal with them without having to worry about anything EXCEPT the
"actual data" in them.  ANY scheme which uses files which contain information
which is outside the "actual data" is going to add new requirements to programs
which manipulate files.  This has been beaten to death on the net but is a very
valid point.  Special features not in the actual data would be lost the moment
the file was manipulated by a "dumb" command, like "cat" or "vi" or "ed" or
(name your favorite command).  All "cp" does now is read in the data of its
input files, character by character, and spit it out to the output files.
It would have to be more sophisticated, like VMS COPY, to know that it should
open a file with special attributes.  This would make it run slower.  Also, cp
does not now copy the time of creation of the first file to the second one,
so your assertion above (that the file creation time is transferred from the
original to the copied file) is mistaken.  For that matter, the file size is
not directly transferred either (it is determined by the data which
is actually copied, and under ordinary circumstances it is no surprise that
it is the same as for the original file).
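
To make the point about "dumb" commands concrete, here is a minimal sketch of
the kind of copy loop described above.  It is not taken from any real cp
source: the name copy_bytes is made up for this illustration, and it reads in
blocks rather than character by character.  The thing to notice is that the
only thing passed from input to output is the data; the size of the copy is
simply however many bytes got written.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: copy the data of src to dst the way a "dumb"
 * utility would.  Returns the number of bytes copied, or -1 on error.
 * No attribute of src is consulted or transferred: the size of dst is
 * just the count of bytes written, and its timestamps are whatever the
 * kernel assigns when dst is created and written.
 */
long copy_bytes(const char *src, const char *dst)
{
    char buf[4096];
    long total = 0;
    ssize_t n;
    int in, out;

    assert(src != NULL && dst != NULL);

    in = open(src, O_RDONLY);
    if (in < 0)
        return -1;
    out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (out < 0) {
        close(in);
        return -1;
    }

    while ((n = read(in, buf, sizeof buf)) > 0) {
        ssize_t off = 0;
        while (off < n) {               /* write() may be partial */
            ssize_t w = write(out, buf + off, (size_t)(n - off));
            if (w < 0) {
                close(in);
                close(out);
                return -1;
            }
            off += w;
        }
        total += n;
    }

    close(in);
    if (close(out) < 0 || n < 0)        /* n < 0 means read() failed */
        return -1;
    return total;
}
```

Anything kept outside the byte stream would have to be fetched and stored by
extra code that a loop like this does not contain.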

Skews against other FORTRANs?  While I admit that a header would generate a
skew against the present "f77", remember any attempt to transfer files with
attributes from one operating system to another has got to deal up front with
those attributes.  What about, for instance, trying to transfer VMS files with
special Fortran carriage control attributes to an IBM system?  Even
disregarding the ASCII-EBCDIC skew, the interface program is expected to know
about the special quirks of files on both systems.  The header in a Unix Fortran
output file would just be treated as another such quirk when transferred to
another operating system.

It probably would be a good idea, however, to only add headers to output files
which are intended to contain a special attribute, such as carriage control or
binary files.  And of course it would be desirable to determine what kind of
file is being written into (it would be silly to put a header out to a terminal
or a pipe, so detection of that kind of redirection on a logical unit would
cause carriage control attributes to be translated into the standard ASCII
control characters in that case).
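
The detection step mentioned above might look something like the following
sketch.  The function name header_allowed is hypothetical, and the policy
(header only for regular disk files) is just one reading of the paragraph
above, not something any existing Fortran library is known to do.

```c
#include <assert.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Hypothetical helper for an I/O library: decide whether it makes sense
 * to write a file header on descriptor fd.  Returns 1 only for regular
 * files; returns 0 for terminals, pipes, and anything else, in which
 * case the library would fall back to plain ASCII control characters.
 */
int header_allowed(int fd)
{
    struct stat st;

    assert(fd >= 0);

    if (isatty(fd))                 /* output is a terminal */
        return 0;
    if (fstat(fd, &st) < 0)         /* cannot tell: be conservative */
        return 0;
    return S_ISREG(st.st_mode) ? 1 : 0;
}
```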

There seems to be no easy answer.  I stand firm with those, however, who say
don't tamper with the Unix way of storing files.  At least 99.99% of all the
programs running on Unix machines will be "c" programs, which don't need the
overhead of having to account for the special "fortran" stuff whenever they
process files.  (Sorry, pascal and lisp programmers, I should lump you in
with the "c" people.)

Enough rambling for now.  I doubt if I've said anything anyone else hasn't.
-- 
 -------------------------------    Disclaimer:  The views contained herein are
|       dan levy | yvel nad      |  my own and are not at all those of my em-
|         an engihacker @        |  ployer, my pets, my plants, my boss, or the
| at&t computer systems division |  s.a. of any computer upon which I may hack.
|        skokie, illinois        |
|          "go for it"           |  Path: ..!ihnp4!ttrdc!levy
 --------------------------------     or: ..!ihnp4!iheds!ttbcad!levy

throopw@rtp47.UUCP (Wayne Throop) (08/04/85)

> [Using fortran-specific headers] seems like a reasonable approach,
> except that existing FORTRANs do not put such cruft at the beginning of
> files now. [...] What I was suggesting, though it might have not been
> clear, is an "invisible" file header, one which you look at in a
> slightly different way than the real data (the bytes in the file).

Now let me get this straight.  Putting formatting information in the
file itself is a good idea, but the lack of FORTRAN systems that do this
today causes you to suggest adding an independent data area to all files
to keep formatting information.

*Now* I see the light!  All those existing FORTRANs that use Unix
"invisible file headers"!  How silly of me.

(In case it isn't clear, I think adding an "invisible file header" to
 Unix is a Bad Thing.)
-- 
Wayne Throop at Data General, RTP, NC
<the-known-world>!mcnc!rti-sel!rtp47!throopw

sean@ukma.UUCP (Sean Casey) (08/05/85)

In article <3398@decwrl.UUCP> jcampbell@mrfort.DEC (Jon Campbell) writes:
>What many users have suggested is that I put a "file header" at the
>beginning of each file. This seems like a reasonable approach, except
>that existing FORTRANs do not put such cruft at the beginning of files
>now. So we have a skew problem. What I was suggesting, though it might
>have not been clear, is an "invisible" file header, one which you look
>at in a slightly different way than the real data (the bytes in the file).

Why not just rewrite part of the fortran I/O library, instead of
rewriting the whole file system, the backup programs, etc.?  Why would
there be problems with that?  Sounds to me like it would be easier, and
save a lot of bucks in man-hours and bug fixing.  Not only that, but if
you did implement your "invisible" file header, you'd have to rewrite part
of the I/O library anyway.  Geez, why go to all that trouble?


-- 

-  Sean Casey				UUCP:	sean@ukma.UUCP   or
-  Department of Mathematics			{cbosgd,anlams,hasmed}!ukma!sean
-  University of Kentucky		ARPA:	ukma!sean@ANL-MCS.ARPA	

hedrick@topaz.ARPA (Chuck Hedrick) (08/05/85)

In article <3398@decwrl.UUCP> jcampbell@mrfort.DEC (Jon Campbell) writes:
>Well, my mailbox runneth over with mail telling me how I've struck
>at the heart of UNIX by suggesting file attributes. I think perhaps
>I have presented the problem (and its possible solution) in the wrong
>light.
> 
>What many users have suggested is that I put a "file header" at the
>beginning of each file. This seems like a reasonable approach, except
>that existing FORTRANs do not put such cruft at the beginning of files
>now. So we have a skew problem. 

I would much rather have a skew problem between f77 and your Fortran
than between Unix and your new OS (whatever you choose to call it).
Even if you hid your attributes somewhere, it is unlikely that your
new Fortran would be compatible with the old.  One assumes that new
programs would no longer specify the file attributes in OPEN.  (I
mean, why bother adding all these attributes in the file if the user
is still going to have to specify them in his program.)  A program
that failed to specify the attributes could no longer work when
compiled under the old Fortran.  Furthermore, it is likely that your
new Fortran would support a wider variety of file and record
organizations than the old one.  So in fact the amount of
compatibility that would be possible is minimal.  For the cases where
it is possible, one could have a utility that strips or adds the
headers.  You could also have an OPEN parameter that said not to
create a header for new files, and your Fortran could be trained to
read files that did not have a header, as long as the user specified
enough of the attributes explicitly in OPEN.
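
A header-stripping utility of the sort mentioned above could be quite small.
The sketch below assumes a purely hypothetical header format: a single first
line beginning with the marker "%FATTR " and ending at the first newline.
Neither the marker nor the function name strip_header comes from any real
Fortran system.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define HDR_MAGIC "%FATTR "     /* hypothetical header marker */

/*
 * Copy in to out, dropping a leading header line if one is present.
 * Returns 1 if a header was removed, 0 if the file had none, and -1 on
 * an I/O error.  The marker is probed with fread() rather than fgets()
 * so that binary data containing NUL bytes passes through intact.
 */
int strip_header(FILE *in, FILE *out)
{
    char probe[sizeof HDR_MAGIC - 1];
    size_t got;
    int stripped = 0;
    int c;

    assert(in != NULL && out != NULL);

    got = fread(probe, 1, sizeof probe, in);
    if (got == sizeof probe && memcmp(probe, HDR_MAGIC, sizeof probe) == 0) {
        stripped = 1;
        while ((c = getc(in)) != EOF && c != '\n')
            ;                   /* discard the rest of the header line */
    } else {
        fwrite(probe, 1, got, out);     /* not a header: keep the bytes */
    }

    while ((c = getc(in)) != EOF)
        putc(c, out);

    return (ferror(in) || ferror(out)) ? -1 : stripped;
}
```

Since it works on ordinary streams, a filter like this runs equally well on
a disk file or in a pipeline.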

Let's look at some of the costs of making the change that you propose:

1) You are going to have to hide the information either in the inode,
or in some sort of negative file address.  Changing the file system to
allow this will make it incompatible with other Unix file systems.
Probably you will want to put it in a negative file address, or a
special block accessible only through some special system call.  Adding
it to the inode would cause the size of the inode to increase, thus
penalizing everyone (in disk space used) even if they don't use the
feature.

2) You will have to change dump, restore, and tar to save this
information on tapes and restore it.  I trust you want your attributes
to go onto backup tapes (dump and restore) and for it to be possible
to move this information with a file when you take the file to a
different system (tar).  A new switch will have to be added to tar
to allow you to suppress this information, in order to avoid confusing
systems that don't understand it.  Tapes produced on your system would
in general not be compatible with other systems unless this switch
is used.

3) You will have to change cp, mv, and various other utilities.  Users
are accustomed to the fact that when they copy a file, the copy is
the same as the original.  This would no longer be the case if copying
stripped off attribute information.

4) The network protocols used for FTP and rcp would have to be
extended to allow attribute information to be sent over the net.  Many
sites depend upon use of the network to keep various systems in sync
with each other.  They would not be able to tolerate having attribute
information disappear when the file is moved.

5) Programs that depended upon having attributes would not work with
I/O over pipes or other kinds of streams.  This could be a serious
restriction for some kinds of program.  Indeed you would be breaking
the device-independence of your I/O system, since it would be
impractical to have hidden attributes for any device other than disk
or its equivalent.

These are fairly serious costs, in implementation time, compatibility
with other Unix systems, and increased complexity of the file system.
It is very unlikely that many (any?) customers would want to pay these
costs.  I think these costs are clearly larger than any Fortran
version skew, particularly since there are ways to get around the
Fortran skew.

You might be interested that the same issues came up on the DEC-20
when RMS was implemented there.  They chose to put the RMS file
attribute information in a header at the beginning of the file.  I
certainly don't know why that choice was made, but the costs of
modifying TOPS-20 would have been similar to those of modifying Unix
to support attributes.

I urge you to talk to some of us in person before proceeding with
anything like what you have proposed to do.  The opposition to your
suggestion is not based on misunderstanding, nor does it indicate a
lack of sympathy with your goals.  Any change to the simplicity of the
file model *does* strike at the heart of Unix.

preece@ccvaxa.UUCP (08/06/85)

> There seems to be no easy answer.  I stand firm with those, however,
> who say don't tamper with the Unix way of storing files.  At least
> 99.99% of all the programs running on Unix machines will be "c"
> programs, which don't need the overhead of having to account for the
> special "fortran" stuff whenever they process files.  (Sorry, pascal
> and lisp programmers, I should lump you in with the "c" people.)
----------
If you really think that 99.99% of all the programs running on Unix
machines will be "c" programs, you haven't dealt with the real world.
There are many, many, MANY sites out there that use almost all their
CPU time running things like SPICE and SPSS and a myriad of other
existing engineering, statistical, and scientific applications written
in (gasp) FORTRAN.

As to where header information should go, I agree that there's no easy
answer.  Putting it outside the file seems preferable to me, because I
think files should be homogeneous and because I hate to think about
random access needing to work around the header and because I like
making it truly invisible to uninformed processes.  This doesn't
violate "the Unix way of storing files" any more than the other
out-of-band information already kept about files.  Certain basic
tools need to know about it (like dump/restore), others simply lose
the data (like sort).  If a vendor chose to add this feature (or
if the standards committee chose to add it), either the existing
commands would grow switches to recognize the header data or new
commands would be added.  No big deal.

I kind of like the idea of a LISP-ish property list (or you could
think of it as a file environment, if you like) in which any arbitrary
(name, value) pairs could be stuck.  Then a getfileproperty call
would get a particular value.
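
As a rough sketch of what such an interface might look like from a program's
point of view, consider the following.  Only the name getfileproperty comes
from the paragraph above; the argument list, the struct, and the in-memory
representation are all assumptions made for illustration, and where the list
would actually be stored is left open.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* One (name, value) pair in a file's property list (hypothetical). */
struct file_property {
    const char *name;
    const char *value;
};

/*
 * Hypothetical lookup: search a property list of n entries for name.
 * Returns the matching value, or NULL if the property is not set, so
 * a program that does not care about properties never has to ask.
 */
const char *getfileproperty(const struct file_property *plist, size_t n,
                            const char *name)
{
    size_t i;

    assert(name != NULL);
    assert(plist != NULL || n == 0);

    for (i = 0; i < n; i++)
        if (strcmp(plist[i].name, name) == 0)
            return plist[i].value;
    return NULL;
}
```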

The first rule, of course, has to be to not break existing Unix.

-- 
scott preece
gould/csd - urbana
ihnp4!uiucdcs!ccvaxa!preece

mjs@eagle.UUCP (M.J.Shannon) (08/08/85)

> As to where header information should go, I agree that there's no easy
> answer.  Putting it outside the file seems preferable to me, because I
> think files should be homogeneous and because I hate to think about
> random access needing to work around the header and because I like
> making it truly invisible to uninformed processes.

By and large, the only files in a UNIX system that are homogeneous are text
files.  Granted, these files probably account for the vast majority of files,
but what about object modules, libraries of same, and executables?  None of
these are homogeneous, but no utilities (like cat, cp, wc, ls, etc.) have any
trouble with them simply because they're not.

> This doesn't
> violate "the Unix way of storing files" any more than the other
> out-of-band information already kept about files.  Certain basic
> tools need to know about it (like dump/restore), others simply lose
> the data (like sort).

Lose the data?  You mean you'd prefer a copy of a file not to have any of the
attributes of the original?  That seems to me to forbid copying of executables
(the non-homogeneous header gets lost in the shuffle, preventing execution of
the copy).  Sorry, this *DOES* violate "the UNIX way of storing files".

> If a vendor chose to add this feature (or
> if the standards committee chose to add it), either the existing
> commands would grow switches to recognize the header data or new
> commands would be added.  No big deal.

No big deal?  Would you like to add major sections to *every* command that
manipulates files to figure out its attributes?  Or would you prefer to write N
versions of each of those commands?

> I kind of like the idea of a LISP-ish property list (or you could
> think of it as a file environment, if you like) in which any arbitrary
> (name, value) pairs could be stuck.  Then a getfileproperty call
> would get a particular value.

This sounds reasonable, but why specify that it isn't part of the file data
itself?  Most files that have headers on them have a fixed property list, with
the names (and types) defined in structures in /usr/include.  Before you
propose a new mechanism, you'll have to convince a whole host of people that
their software is broken.  Good luck.
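
For comparison, the in-the-data approach described above amounts to something
like the sketch below.  The struct fhdr, its fields, and the magic number are
invented for this example; they stand in for whatever header structure a
given file format declares in its include file.

```c
#include <assert.h>
#include <stdio.h>

#define FHDR_MAGIC 0x46485231L  /* hypothetical magic number */

/* Hypothetical fixed header kept at the start of the file's data. */
struct fhdr {
    long magic;                 /* identifies the file format */
    long reclen;                /* record length in bytes */
    long ccontrol;              /* nonzero: carriage control in column 1 */
};

/*
 * Read and check the header at the current position of fp.
 * Returns 0 on success, leaving fp positioned at the first byte after
 * the header; returns -1 if the file is too short or the magic number
 * does not match.
 */
int read_fhdr(FILE *fp, struct fhdr *h)
{
    assert(fp != NULL && h != NULL);

    if (fread(h, sizeof *h, 1, fp) != 1)
        return -1;
    if (h->magic != FHDR_MAGIC)
        return -1;
    return 0;
}
```

Because the header is ordinary data, cat, cp, and tar carry it along without
knowing it is there; only the programs that care need to call something like
read_fhdr.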

> The first rule, of course, has to be to not break existing Unix.

But that seems to be the first rule broken when you propose that these
attributes be stored anywhere but in the data of the file.

> scott preece
-- 
	Marty Shannon
UUCP:	ihnp4!eagle!mjs
Phone:	+1 201 522 6063

Warped people are throwbacks from the days of the United Federation of Planets.