[net.unix-wizards] unix file system

jcampbell@mrfort.DEC (Jon Campbell) (07/25/85)

 
 
 
                                From: Jon Campbell
                                      Digital Equipment Corp.
                                      Marlboro, MA
                                      617-467-6876
                                      DECnode:MRFORT::JCAMPBELL
 
To:  UNIX developers and users
 
Subject:  problems with the UNIX file system
 
Some of us at Digital think we have found a basic problem with the  UNIX
file  system  for FORTRAN.  The problem is that there is no place to put
various kinds of information about  the  contents  of  the  file.   More
specifically:
 
    1.  The FORTRAN language requires that one be able to  have  "random
    access"   files,  with  a  fixed  "recordsize".   The  obvious  UNIX
    implementation is one which uses a fixed number  of  bytes  (perhaps
    even with a <newline> at the end) for each "record".  However, there
    is no way on UNIX that one can open such a file  and  find  out  the
    size  of  each  record.  Thus it is impossible to write a utility to
    look at, modify, or extract data from such a file without  the  user
    having previous knowledge about the file.
 
    2.  As you probably know, most FORTRAN output data files reserve the
    1st  character  position of each output line for a "FORTRAN carriage
    control  character".   When  the  file  is  printed  (or,  in   some
    circumstances,  typed)  these  control characters are supposed to be
    translated into corresponding vertical motion  characters  (such  as
    one  or  more line-feeds, a form-feed, a vertical tab, etc.) and the
    <newline> character at the end of the "record" is removed.
 
    So FORTRAN output files  are  "different"  than  other  files,  even
    though  you  cannot  tell  that  by looking at them - they just have
    "funny numbers" in the 1st character position of  each  line.   UNIX
    provides   a  utility  for  piping  the  FORTRAN  output  through  a
    translator module, so that the  vertical  motion  characters  appear
    directly  in  the  output  file.   But  often  that  is  not what is
    desirable.  Often one wants  to  leave  the  file  in  its  original
    ("FORTRAN  data  file")  state, modify it many weeks later, and then
    print it.  Again, as in the case above, the user must know that  the
    file  was  produced  by  a  FORTRAN  program and pipe it to a filter
    program on the way out to the printer or terminal.
 
    3.  The ANSI Magnetic Tape Label Standard  defines  a  set  of  file
    attributes  in the file labels which must be filled in when the tape
    is written.   Among  them  are  record  size  and  carriage  control
    (referred to in the Standard as "Form Control").
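To make item 1 concrete: with fixed-length records, record n starts at byte (n - 1) x recordsize, so random access is a single seek. The Python sketch below is illustrative only. `write_record`, `read_record` and the blank padding of short records are assumptions, not the behaviour of any particular FORTRAN runtime. Note that the record size is a parameter: nothing in the bytes of the file supplies it.

```python
import io

def write_record(f, rec, recsize, data):
    """Write 1-based record `rec` (as in FORTRAN's REC=) to a fixed-length file."""
    if len(data) > recsize:
        raise ValueError("record longer than recsize")
    f.seek((rec - 1) * recsize)
    f.write(data.ljust(recsize, b" "))  # assumption: pad short records with blanks

def read_record(f, rec, recsize):
    """Read 1-based record `rec`; the caller must already know `recsize`."""
    f.seek((rec - 1) * recsize)
    data = f.read(recsize)
    if len(data) != recsize:
        raise EOFError(f"record {rec} is past end of file")
    return data

# Records 1 and 3 are written; record 2 is a gap that reads back as NUL bytes.
demo = io.BytesIO()
write_record(demo, 1, 8, b"alpha")
write_record(demo, 3, 8, b"gamma")
```

Reading with the wrong size (12 instead of 8) silently returns misaligned data: `read_record(demo, 2, 12)` yields four NUL bytes followed by the contents of record 3, with no error. That is the failure a generic utility without outside knowledge of the file would run into.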
 
I would like to propose that UNIX users and  developers  begin  thinking
about  which  "file  attributes" (knowledge about the file that would be
useful to know for  generalized  programs  which  cannot  have  previous
knowledge  about  each  file)  would  be useful to attach to UNIX files.
Keep in mind that these "attributes" would NOT in any way  detract  from
the  simplicity of UNIX - one would not have to use them;  they would be
there only for those users who wish to carry information about the files
along  with  the  files.   Nor would files with attribute information be
looked at by UNIX in any other way than they are looked at now - they just
have  some  more information about them that can be discovered when they
are opened.  No "file management layer"  is  implied  for  UNIX  by  the
creation of these "attributes".
 
We would not even have to make an "incompatible change" for the printing
of files with the "FORTRAN data file" attribute:  a new command could be
introduced to take the place of LPR for those users who wish the utility
to find out whether the attribute is set and print the file accordingly;
many people would probably continue to use LPR.
 
Below is a list of those "attributes" which I have found  useful  in  my
work  in  implementing  the  FORTRAN  runtime  library  for  TOPS-10 and
TOPS-20.  Many of them have been included  in  the  ANSI  Magnetic  Tape
Label Standard:
 
Carriage control
  FORTRAN - funny numbers in char position 1, translated on printing
  LIST - take just the contents of the "record", add a <newline>. This
        is for files which have no <newline> characters in them
  NONE - print the file as it appears (the default)
 
Character set (for those folks who want to have both EBCDIC and ASCII files)
 
Record format - (refer to the Tape Label Standard)
  Delimited - each record has a 4-character byte count in front of it
  Fixed - all records have the same length, with no terminators
  Undefined - the default - no implied record format
 
Record size (For "fixed" record format, the size of all records;
        for variable-length records, this is usually interpreted
        as the maximum record length - zero means "unknown"
        maximum record length)
 
File type (for "data management" programs...)
  Sequential (the default)
  Others (user-definable, for various flavors of other types
        of access, such as [ugh] indexed sequential, database, etc.)
 
Bytesize (for typesetting applications which use 16- or 32-bit
        character sets)
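A reader and writer for the "Delimited" record format in the list above, sketched in Python. Two details are assumptions to check against the Tape Label Standard rather than facts taken from this post: that the 4-character count is written in ASCII decimal, and that (as in the Standard's variable-length "D" format) it counts its own four bytes. The function names are hypothetical.

```python
def join_delimited(records):
    """Prefix each record with a 4-digit ASCII length (including the prefix)."""
    out = bytearray()
    for rec in records:
        length = len(rec) + 4
        if length > 9999:
            raise ValueError("record too long for a 4-digit count")
        out += b"%04d" % length + rec
    return bytes(out)

def split_delimited(data):
    """Inverse of join_delimited: return the list of record bodies."""
    records, pos = [], 0
    while pos < len(data):
        prefix = data[pos:pos + 4]
        if len(prefix) < 4 or not prefix.isdigit():
            raise ValueError(f"bad record count at offset {pos}")
        length = int(prefix)
        if length < 4 or pos + length > len(data):
            raise ValueError(f"bad record length {length} at offset {pos}")
        records.append(data[pos + 4:pos + length])
        pos += length
    return records
```

The reader still has to be told that the file is in this format; a text file that happens to begin with four digits would be misparsed, which is the gap the proposed attribute is meant to fill.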
 
I'm sure you'll all think of others that would be useful.  Since I  have
not  looked  at  the  UNIX  internal file system much, I do not know how
difficult it would be to  find  a  place  to  attach  this  large  (and,
potentially,  expanding) set of attributes, or what the FOPEN (or other)
interface would look like to set/get the attribute values.
 
                                        Thanks for your time,
                                        Jon Campbell
   --------

faustus@ucbcad.UUCP (Wayne A. Christopher) (07/26/85)

> Some of us at Digital think we have found a basic problem with the  UNIX
> file  system  for FORTRAN.  The problem is that there is no place to put
> various kinds of information about  the  contents  of  the  file.   More
> specifically:

< lots of stuff>

> I'm sure you'll all think of others that would be useful.  Since I  have
> not  looked  at  the  UNIX  internal file system much, I do not know how
> difficult it would be to  find  a  place  to  attach  this  large  (and,
> potentially,  expanding) set of attributes, or what the FOPEN (or other)
> interface would look like to set/get the attribute values.

I don't see what is wrong with letting fortran and its utility programs
do all of this themselves. The problem is that UNIX is not a "fortran"
operating system, and unlike systems like VMS, it doesn't have a lot of
stuff for the benefit of fortran programmers. There is really no reasonable
way to put this into the filesystem itself without a lot of re-writing,
and I doubt many people think it is worth the trouble. The fact is that
fortran is a dying language, and it would be silly to make unix more
friendly to fortran at the expense of more trouble for people who use
modern languages.

	Wayne

alb@alice.UUCP (Adam L. Buchsbaum) (07/26/85)

One's spine shivers at the thought of putting utility/program/etc
support into the kernel itself.

ark@alice.UUCP (Andrew Koenig) (07/26/85)

> Some of us at Digital think we have found a basic problem with the  UNIX
> file  system  for FORTRAN.  The problem is that there is no place to put
> various kinds of information about  the  contents  of  the  file.

The place to put information about the contents of the file
is in the file itself.

If you are unmoved by that philosophical argument, consider this:

If you expand Unix files to include additional information that is
not really part of the file, will that information be copied automatically
if you use "cp" to copy the file?  Any answer causes problems.

If the answer is "no," the information isn't really useful.
If it is "yes," then you must rewrite "cp."  You must also rewrite
"cat," because I can copy a file by saying      cat a >b .  You will
find that you must also rewrite dozens of other commands, as well
as writing many new ones.

rcj@burl.UUCP (Curtis Jackson) (07/26/85)

In article <3287@decwrl.UUCP> jcampbell@mrfort.DEC (Jon Campbell) writes:
>    1.  The FORTRAN language requires that one be able to  have  "random
>    access"   files,  with  a  fixed  "recordsize".   The  obvious  UNIX
>    implementation is one which uses a fixed number  of  bytes  (perhaps
>    even with a <newline> at the end) for each "record".  However, there
>    is no way on UNIX that one can open such a file  and  find  out  the
>    size  of  each  record.  Thus it is impossible to write a utility to
>    look at, modify, or extract data from such a file without  the  user
>    having previous knowledge about the file.
> 
So write a teeny-weeny little function that calls fopen first and then
looks at the first word (16 or 32 bits, I don't care) that will contain
the record size because the writing program put it there.
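Curtis's "teeny-weeny little function" might look like this in Python. The layout is an assumption for illustration: one unsigned 32-bit little-endian word holding the record size, with fixed-length records starting immediately after it. `open_fixed`, `read_record_size` and `get_record` are hypothetical names.

```python
import io
import struct

SIZE_WORD = struct.Struct("<I")  # assumption: little-endian unsigned 32-bit word

def read_record_size(f):
    """Read the record size that the writing program stored in the first word."""
    f.seek(0)
    word = f.read(SIZE_WORD.size)
    if len(word) != SIZE_WORD.size:
        raise ValueError("file too short to hold a record-size word")
    return SIZE_WORD.unpack(word)[0]

def open_fixed(path):
    """Open a file for reading and return (file, record_size)."""
    f = open(path, "rb")
    try:
        return f, read_record_size(f)
    except Exception:
        f.close()
        raise

def get_record(f, rec, size):
    """Return 1-based record `rec`; records begin right after the size word."""
    f.seek(SIZE_WORD.size + (rec - 1) * size)
    return f.read(size)

demo = io.BytesIO(SIZE_WORD.pack(6) + b"first_second")
```

The convention only works if every writer follows it: a file produced without the size word would have its first four data bytes read as a record size.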

>    2.  As you probably know, most FORTRAN output data files reserve the
>    1st  character  position of each output line for a "FORTRAN carriage
>    control  character".   When  the  file  is  printed  (or,  in   some
>    circumstances,  typed)  these  control characters are supposed to be
>    translated into corresponding vertical motion  characters  (such  as
>    one  or  more line-feeds, a form-feed, a vertical tab, etc.) and the
>    <newline> character at the end of the "record" is removed.
> 
>    So FORTRAN output files  are  "different"  than  other  files,  even
>    though  you  cannot  tell  that  by looking at them - they just have
>    "funny numbers" in the 1st character position of  each  line.   UNIX
>    provides   a  utility  for  piping  the  FORTRAN  output  through  a
>    translator module, so that the  vertical  motion  characters  appear
>    directly  in  the  output  file.   But  often  that  is  not what is
>    desirable.  Often one wants  to  leave  the  file  in  its  original
>    ("FORTRAN  data  file")  state, modify it many weeks later, and then
>    print it.  Again, as in the case above, the user must know that  the
>    file  was  produced  by  a  FORTRAN  program and pipe it to a filter
>    program on the way out to the printer or terminal.
> 
I don't just go around randomly printing files without knowing what they
are, do you?  I often use the 'file' command to see what type of file
I am dealing with -- nroff input, ascii data, C source, etc.  I can
see an addition to the 'file' command to recognize the types of FORTRAN
output files you are talking about, but nothing more.  "...the user must
know that the file was produced by a FORTRAN program..." -- the FORTRAN
user has to know that a file is random access before trying to open it
as such; what is the big deal?
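The kind of test an addition to `file` might apply, sketched in Python rather than as an entry in `file`'s own tables (the function name and the three-line minimum are hypothetical choices). It treats a text as FORTRAN print output when every non-empty line starts with one of the ANSI carriage-control characters (blank, `0`, `1`, `+`) and at least one line uses something other than a blank. This is a guess, not proof; text whose lines happen to start with those characters will fool it.

```python
CARRIAGE_CONTROL = set(" 01+")  # ANSI FORTRAN carriage-control characters

def looks_like_fortran_print_file(text, min_lines=3):
    """Heuristic: every non-empty line starts with a carriage-control
    character, and at least one of them is not a plain blank."""
    lines = [ln for ln in text.split("\n") if ln]
    if len(lines) < min_lines:
        return False
    if any(ln[0] not in CARRIAGE_CONTROL for ln in lines):
        return False
    return any(ln[0] != " " for ln in lines)
```

Requiring at least one non-blank control character keeps ordinary indented text from being classified as FORTRAN output.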

>    3.  The ANSI Magnetic Tape Label Standard  defines  a  set  of  file
>    attributes  in the file labels which must be filled in when the tape
>    is written.   Among  them  are  record  size  and  carriage  control
>    (referred to in the Standard as "Form Control").
> 
And if I get a tape in 'tar' format I don't run cpio on it to extract it;
so if I get a tape that I know has byte-for-byte data on it I don't try
to read and process record size or carriage control fields.

>File type (for "data management" programs...)
>  Sequential (the default)
>  Others (user-definable, for various flavors of other types
>        of access, such as [ugh] indexed sequential, database, etc.)
> 

I've pulled the above example out of the original posting as a good
example of what Jon wants to do.  If the 'others' are user-definable,
then why not let the users define them?  Why should they be part of the
Unix filesystem?

Don't mess with my Unix to support archaic languages/formats!!
-- 

The MAD Programmer -- 919-228-3313 (Cornet 291)
alias: Curtis Jackson	...![ ihnp4 ulysses cbosgd mgnetp ]!burl!rcj
			...![ ihnp4 cbosgd akgua masscomp ]!clyde!rcj

john@genrad.UUCP (John P. Nelson) (07/26/85)

>> Some of us at Digital think we have found a basic problem with the  UNIX
>> file  system  for FORTRAN.  The problem is that there is no place to put
>> various kinds of information about  the  contents  of  the  file.   More
>> specifically:

>                                         There is really no reasonable
>way to put this into the filesystem itself without a lot of re-writing,
>and I doubt many people think it is worth the trouble. The fact is that
>fortran is a dying language, and it would be silly to make unix more
>friendly to fortran at the expense of more trouble for people who use
>modern languages.
>
>	Wayne

Well, this attitude is a bit extreme, but I really don't see why any of
this is necessary.  Why not have the fortran format file have a header
describing the data contained within, and have the header started by a
four byte magic number.  Magic numbers are used now to indicate that a
file is a binary executable, why not have a new magic number that describes
the file as a fortran file?
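A sketch of John's suggestion in Python: a fixed header that starts with a four-byte magic number, followed here by a record size and a flag word. The layout, the value 0x0000F0F1 and the names are invented for illustration; they are not an existing UNIX magic number or format. `read_header` returns None for an ordinary file and rewinds it, which is the extra step John describes for the fortran library.

```python
import io
import struct

HEADER = struct.Struct(">III")  # magic, record size, flags (big-endian)
MAGIC = 0x0000F0F1              # invented value for illustration only
FLAG_FORTRAN_CC = 0x1           # column 1 of each record is carriage control

def write_header(f, record_size, fortran_cc):
    flags = FLAG_FORTRAN_CC if fortran_cc else 0
    f.write(HEADER.pack(MAGIC, record_size, flags))

def read_header(f):
    """Return the header fields, or None for an ordinary file.
    Either way, leave `f` positioned at the first byte of data."""
    start = f.tell()
    raw = f.read(HEADER.size)
    if len(raw) == HEADER.size:
        magic, record_size, flags = HEADER.unpack(raw)
        if magic == MAGIC:
            return {"record_size": record_size,
                    "fortran_cc": bool(flags & FLAG_FORTRAN_CC)}
    f.seek(start)  # no magic number: rewind and treat it as plain bytes
    return None

demo = io.BytesIO()
write_header(demo, 80, True)
demo.write(b"1HEADING")
demo.seek(0)
```

Because the header is part of the file, `cp` and `cat a >b` carry it along with no changes to those commands.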

The argument that most (non-fortran) programs do not need the proposed
extra filesystem information applies to information stored in a header as
well.  This would put the extra burden of responsibility on the fortran
library, which would have to recognize ordinary files, and parse them
differently from "funny" files.  This same extra step would have to
take place anyway, except that the information would come from the
filesystem, instead of from the file header.

What advantage is there to having this information be "out-of-band" (i.e. not
part of the file itself)?

John P. Nelson (decvax!genrad!john)

gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (07/27/85)

One does not fix Fortran- or DEC- created problems with files
by trying to force UNIX to adopt the same mistakes.

If you have to maintain "attribute" information in a file,
how about storing it as the first yay many bytes of the file
contents, to avoid breaking commands like "cp" which are done
right on UNIX and wrong on DEC systems.  You're going to need
special-purpose utilities to decode this "attribute"
information anyhow, so please limit the damage to just those
utilities.

Suggested reading:  The Bell System Technical Journal, Vol. 57,
No. 6, Part 2 (July/August 1978), pp. 1947-1969, "UNIX Time-
Sharing System:  A Retrospective" by D. M. Ritchie.

avolio@decuac.UUCP (Frederick M. Avolio) (07/27/85)

In article <3287@decwrl.UUCP>, jcampbell@mrfort.DEC (Jon Campbell) writes:
>  
> Some of us at Digital think we have found a basic problem with the  UNIX
> file  system  for FORTRAN.  The problem is that there is no place to put
> various kinds of information about  the  contents  of  the  file.   More
> specifically:
>  

Lots of us at Digital think the UNIX file system is just fine the way it
is... (FORTRAN??)

---
Fred @ DEC -- ULTRIX Applications Center

grogers@uiucdcsb.Uiuc.ARPA (07/27/85)

Wayne, you completely missed the point.  There should be some "standard"
way of attaching attributes to a file, thus freeing the user from remembering
this information.  The poster of the base note used fortran as an example.
The file could have contained processed ditroff, unix plot commands, image data
or whatever.
The current method of using commonly accepted suffixes to denote file
contents is probably good enough.  After all, cc could refuse to compile
any file that doesn't end with .c, likewise for f77 and .f, and troff and .tr.

Greg Rogers
University of Illinois at Urbana-Champaign
Department of Computer Science and Demos

grogers@uiucdcs

phil@amdcad.UUCP (Phil Ngai) (07/27/85)

In article <578@decuac.UUCP> avolio@decuac.UUCP (Frederick M. Avolio) writes:
>In article <3287@decwrl.UUCP>, jcampbell@mrfort.DEC (Jon Campbell) writes:
>>  
>> Some of us at Digital think we have found a basic problem with the  UNIX
>> file  system  for FORTRAN.
>
>Lots of us at Digital think the UNIX file system is just fine the way it
>is... (FORTRAN??)

I'm glad to hear that some at Digital are against this horrible idea to
"adalize" the Unix filesystem but I would also say I know a number of sites
willing to pay good money for a good FORTRAN for Unix. So, Jon, keep working
on it, just don't try to impose VMS ideas on Unix. Try to do things within
the Unix philosophy. If you are not sure what that is, there seem to be
many people, within DEC, even, like Fred, who will probably help you.

-- 
 There are two kinds of people, those who lump people in groups and
 those who don't.

 Phil Ngai (408) 749-5720
 UUCP: {ucbvax,decwrl,ihnp4,allegra}!amdcad!phil
 ARPA: amdcad!phil@decwrl.ARPA

hedrick@topaz.ARPA (Chuck Hedrick) (07/28/85)

Jon: I am very glad to see that DEC is interested in Fortran on Unix.
You would make many people very happy if you bring to Unix a Fortran
compiler of the quality of the DEC VMS (or TOPS-20) compiler.
However...

I think it is a bad idea to add attributes to the Unix file system.
You indicate that it would not cause any incompatibility.  There is a
sense in which this is true.  But you would have to change all the
utility programs that copy files, to copy the attributes.  You would
have to change the formats of backup tapes and tapes such as tar, to
include the attributes.  To the extent that the attributes are used,
you would have to modify language runtime systems and utilities to
take attributes into account when reading files that have them.  One
of my staff members has just written a network spooler for VMS.  It is
amazing how complex it is to read VMS files in their full generality,
at least from Modula 2.  (Perhaps this is a defect in the runtime
system.)  This complexity has nothing to do with whether there is an
extra layer of RMS between you and the file system.  Indeed that layer
may make things more liveable.  It has to do simply with the
complexity of the file system.  I am recommending that our Computer
Science Dept use Unix, partly because I want an O.S. that is simple.
I would like our students to be able to do some system programming.  I
would not like to face them with the complexities of an RMS file.  If
you add attributes to your Unix, I would regretfully have to rule it
out as a candidate for our department.

However the problem that you pose still remains.  I think you want to
distinguish between 2 kinds of files: those that are intended to be
human-readable, and binary files.  I believe you should do whatever
violence is necessary to keep human-readable files in a single, simple
format.  This is the clear difference between Unix/Tenex on the one
side and IBM/VMS on the other.  I believe Unix people have chosen
which side of the fence they want to be on, and you should respect
that decision.  Fortunately, I believe you do not have to do much
violence to Fortran to make this work.  The only structure you really
have to worry about in human-readable files is carriage control.  I
suggest that the runtime system should turn the carriage control into
carriage return, line feed, form feed, etc.  At first glance, this
appears to be a problem.  After all, you say, Fortran programs might
write a file using carriage control, and expect that when the file is
read back in, the carriage control is still there.  However, as I
understand it, Fortran 77 has deemphasized carriage control.  I
believe it is now used only in "print" files.  It seems reasonable to
believe that a print file is not normally going to be read back in as
data to another Fortran program.  Thus I believe you should do the
following:
   - by default, map carriage control into CR, LF, etc. when output
	is to a "print" file.  I suggest a convention that by default
	units 0 (stderr) and 6 (stdout) are print files.
   - supply an option to OPEN to override this.  
   - for programs that do not use these mechanisms properly (e.g. old
	Fortran 66 programs), the only damage is that the ANSI
	carriage control characters will show up in column 1.  There 
	can still be a filter to handle this explicitly for those
	exceptions.
I do not like the TOPS-20 idea of defaulting depending upon the actual
output device (/dev/tty and /dev/lpt being print, disk files
nonprint).  The program will not then know in advance whether the file
is a print file. That makes it unnecessarily hard to code.
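Hedrick's first bullet, mapping carriage control into vertical motion, could look like the following Python sketch; a print filter would perform the same translation. It uses the ANSI meanings (blank: one line, `0`: two lines, `1`: new page, `+`: overprint). Treating unknown characters as a blank and emitting the motion before each record are assumptions of this sketch, and the function name is hypothetical.

```python
def translate_carriage_control(records):
    """Consume column 1 of each FORTRAN print record as carriage control and
    return text with the equivalent vertical motion:
      ' ' advance one line     '0' advance two lines
      '1' start a new page     '+' no advance (overprint)"""
    out = []
    for i, rec in enumerate(records):
        cc, text = rec[:1], rec[1:]
        if cc == "1":
            motion = "\f"
        elif cc == "0":
            motion = "\n\n"
        elif cc == "+":
            motion = "\r"
        else:  # ' ', and (assumption) anything unrecognised
            motion = "\n"
        if i == 0:
            # Output starts at the top-left, so the first record needs one
            # line feed fewer and has nothing to overprint.
            motion = "" if motion in ("\n", "\r") else motion.replace("\n", "", 1)
        out.append(motion + text)
    return "".join(out) + ("\n" if out else "")
```

Done in the runtime for print units, this leaves an ordinary text file behind, so `cat` and `lpr` need no knowledge of FORTRAN.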

For binary files, I like the idea of a "magic number" that specifies
"This is a structured binary file".  In case you are not familiar with
the concept of magic number, all relocatable and executable binaries
have a certain number in their first 32 bits.  There is no danger of
confusing these files with text files, since the magic numbers are
small integers.  Thus the first 2 or 3 bytes are always 0, which is
unlikely in a text file.  You then need a way to specify the
attributes.  Experience with network protocols and other things
suggests a text format for this.  If you use bits, you will always run
out of bits.  There are several reasonable formats.  My favorite (you
are going to laugh, I'm sure) is Lisp format: a parenthesized list
with attribute-value pairs, e.g. ((RECORD-SIZE 200) (FORMAT VBA)).  This
is simple to parse using a higher-level language.  Xerox used it for
specifying file attributes in PUP FTP, and it is easier to handle than
the alternatives I have seen elsewhere.  A more "binary" format might
be pairs of null-terminated strings, ending with an extra null.  But I
think the Lisp format is better.  You would probably want a convention
that the actual data begins on the next 32-bit boundary after the end
of the attributes, since that might simplify processing for certain
situations.  (For paged files, such as B-trees, you would probably
want to skip to the next page boundary, but that would be an action
implied by certain attributes.)
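Hedrick's layout (a small-integer magic number, a Lisp-style attribute list, data starting on the next 32-bit boundary) can be sketched in Python as follows. The magic value 0xF7, big-endian byte order (so the first three bytes are zero, as he notes) and NUL padding are assumptions; only flat (NAME value) pairs are handled, with all-digit values converted to integers. The names are hypothetical.

```python
import re
import struct

MAGIC = struct.Struct(">I")  # big-endian, so a small value starts with zero bytes
MAGIC_VALUE = 0xF7           # hypothetical "structured binary file" number
PAIR = re.compile(r"\(([A-Z0-9-]+) ([^()\s]+)\)")

def build_header(attrs):
    """Magic number + ((NAME value) ...) list, NUL-padded to a 4-byte boundary."""
    text = "(" + " ".join(f"({k} {v})" for k, v in attrs.items()) + ")"
    body = MAGIC.pack(MAGIC_VALUE) + text.encode("ascii")
    return body + b"\0" * (-len(body) % 4)

def parse_header(data):
    """Return (attrs, data_offset), or (None, 0) if `data` has no header."""
    if len(data) < 5 or MAGIC.unpack(data[:4])[0] != MAGIC_VALUE:
        return None, 0
    if data[4:5] != b"(":
        raise ValueError("attribute list must start with '('")
    depth = 0
    for i in range(4, len(data)):  # find the ')' that closes the outer list
        if data[i:i + 1] == b"(":
            depth += 1
        elif data[i:i + 1] == b")":
            depth -= 1
            if depth == 0:
                break
    else:
        raise ValueError("unterminated attribute list")
    attrs = {}
    for key, value in PAIR.findall(data[4:i + 1].decode("ascii")):
        attrs[key] = int(value) if value.isdigit() else value
    return attrs, (i + 4) & ~3  # round i + 1 up to the next 32-bit boundary
```

For `{"RECORD-SIZE": 200, "FORMAT": "VBA"}` the header is 36 bytes and the data begins at offset 36, already on a 32-bit boundary.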

PS: in future messages, could you give a UUCP route?  I don't have
a routing to mrfort.DEC offhand.


Charles Hedrick
Rutgers University

uucp:   ...{harvard, seismo, ut-sally, sri-iu, ihnp4!packard}!topaz!hedrick
arpa:   HEDRICK@RUTGERS

ignatz@aicchi.UUCP (Ihnat) (07/28/85)

Jon Campbell of Digital Equipment Corp. recently posted a problem
statement/proposal concerning the Unix filesystem, particularly addressing
the problems encountered by such utilities as FORTRAN and ANSI tape label
requirements.  His conclusion was that there needs to be an extension to
the Unix filesystem scheme, allowing such information to be optionally
available to needful users.

The problems he quotes are, indeed, real; as real as the need for good
database support in Unix.  The issue I wish everyone to consider is that
*any* specialized support of this type must never go in the kernel!

I well remember struggling through the incredible source listings of the
Honeywell Level 6 Gcos operating system.  They, too, started out to support
what appeared to be a reasonable subset of accepted typed files--ISAM,
KIDA, etc.  In the end, the operating system grew to the point that a listing
set stood 3 feet high; much of that, support for various file types.  

Worse, even if an installation didn't ever intend to use these capabilities,
machine resources were dedicated in the 'kernel' to allow the filesystem
to recognize and process them, regardless.

One of the big plusses of Unix was moving items that weren't required out
of the kernel.  If it isn't involved with managing shared and/or critical
resources, it doesn't belong in the kernel.  Database file management should
be the realm of a separate, although standard, package--as should the
recognition and support of specialized file formats such as those expected
by FORTRAN or by programs that must read/write ANSI tapes.  Consider also
that, whether optionally used or not, such extensions must be validated by
'fsck' and its ilk, thus complicating system maintenance and increasing the
likelihood of filesystem corruption.

I most emphatically agree that some standardized means of providing such
information would be desirable;  possibly a set of library routines and
maintenance programs.  But don't clutter the kernel with this; we made
significant strides in isolating functionality and improving modularity
when the Unix 'toolchest' approach gained favor; let's PLEASE not backslide!


-- 
	Dave Ihnat
	Analysts International Corporation
	(312) 882-4673
	ihnp4!aicchi!ignatz

tim@cithep.UucP (Tim Smith ) (07/29/85)

This is not really relevant, but I have sometimes thought that instead of
offsets in a file starting at zero, they should start at some negative
number, possibly specified in the inode.  When you open the file you start
at zero.  The only way to get the data before zero would be an explicit seek.

This "negative region" could be used for things like the a.out header for
executable files, the #!/bin/sh for shell scripts ( note that there is no
need for the prog to recognize # as a comment character ), or information
on record sizes for files that were brought from another system or produced
by a record oriented language ( although it would still be up to user mode
code to actually interpret this; let's leave the kernel out of this. ).
-- 
					Tim Smith
				ihnp4!{wlbr!callan,cithep}!tim

tim@cithep.UucP (Tim Smith ) (07/29/85)

Re: the posting I just posted on this topic.

Of course, there are some problems with this also....
-- 
					Tim Smith
				ihnp4!{wlbr!callan,cithep}!tim

broman@noscvax.UUCP (Vincent P. Broman) (07/30/85)

jcampbell wrote:
>                               From: Jon Campbell
>                                     Digital Equipment Corp.
>                                     Marlboro, MA
>                                     617-467-6876
>                                     DECnode:MRFORT::JCAMPBELL
>			              decwrl!dec-rhea!dec-mrfort!jcampbell
>
> Some of us at Digital think we have found a basic problem with the  UNIX
> file  system  for FORTRAN.  The problem is that there is no place to put
> various kinds of information about  the  contents  of  the  file.   More
> specifically: ... [recordsizes, formcontrol, formats, char set, etc]

The obvious solution is the one used by the loader -- put that description
in a header in the file.  It might be advisable to make the header printable
(even legible).  Programs in the fortran system need just open the
file and read the first n bytes to get the full scoop on how the file is
organized. The new "LPR" command needed by fortran users is merely a one-line
shell script piping your carriage control filter to lpr. etc, etc.

Let's keep all that mumbo-jumbo out of the operating system!

UUCP: ucbvax!sdcsvax!noscvax!broman	Vincent Broman
ARPA: broman@nosc			Naval Ocean Systems Center, code 632
Phone: (619) 225-2365			San Diego, CA 92152

friesen@psivax.UUCP (Stanley Friesen) (07/30/85)

In article <4053@alice.UUCP> ark@alice.UUCP (Andrew Koenig) writes:
>> Some of us at Digital think we have found a basic problem with the  UNIX
>> file  system  for FORTRAN.  The problem is that there is no place to put
>> various kinds of information about  the  contents  of  the  file.
>
>The place to put information about the contents of the file
>is in the file itself.
>
	Absolutely! In fact the obvious solution is to place a header
at the beginning of the file containing such information as record
length, and write the Fortran I/O library so that it understands this
header. Something similar has *already* been incorporated into UNIX,
namely the executable (a.out) file format, which contains a header
specifying type (the "magic number") and the size of each portion
of the executable image. All that need be done is devise a variant
of this system for fixed-length record files!
-- 

				Sarima (Stanley Friesen)

{trwrb|allegra|cbosgd|hplabs|ihnp4|aero!uscvax!akgua}!sdcrdcf!psivax!friesen
or {ttdica|quad1|bellcore|scgvaxd}!psivax!friesen

z@rocksvax.UUCP (07/31/85)

It looks like people have been hitting the nail on the head w.r.t. where to
put file attributes.  File attributes belong in the file.  Now for a more
practical question.  What format shall these attributes have?  I would suggest
a binary magic number followed by the list-like attributes.  This is mainly
because I am hacking the SU-PUP file server to handle those "extra" attributes
from a VMS PUP FTP user.  Those interested in more details can tune into
PUP-LOVERS.

--
//Z\\
James M. Ziobro
Ziobro.Henr@Xerox.COM
{rochester,amd,sunybcs,allegra}!rocksvax!z

peter@kitty.UUCP (Peter DaSilva) (08/02/85)

> It looks like people have been hitting the nail on the head w.r.t. where to
> put file attributes.  File attributes belong in the file.

I once did a hack which put the file attributes on a file of the same name in
/usr/attr.

> What format shall these attributes have?  I would suggest
> a binary magic number followed by the list-like attributes.

I would suggest an environment like structure, since there is already a lot
of code to deal with this. That is "NAME=value\n...NAME=value\n\n". It should
also have some sort of size restriction (<1K?)
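Peter's environment-style block is simple to handle. A Python sketch, assuming ASCII text, the block sitting at the very start of the file, and his "<1K" read as at most 1024 bytes including the terminating blank line; the function names are hypothetical. As in the environment, values are plain strings, so binary values would need some encoding.

```python
MAX_ATTR_BYTES = 1024  # Peter's "<1K?", taken as a hard limit (assumption)

def format_attr_block(attrs):
    """Encode {NAME: value} as NAME=value lines ending in a blank line."""
    for name, value in attrs.items():
        if not name or "=" in name or "\n" in name or "\n" in str(value):
            raise ValueError(f"cannot encode attribute {name!r}")
    text = "".join(f"{k}={v}\n" for k, v in attrs.items()) + "\n"
    block = text.encode("ascii")
    if len(block) > MAX_ATTR_BYTES:
        raise ValueError("attribute block exceeds size limit")
    return block

def parse_attr_block(data):
    """Return (attrs, data_offset) for a block at the start of `data`."""
    if data.startswith(b"\n"):
        return {}, 1  # empty block: just the terminating blank line
    end = data.find(b"\n\n", 0, MAX_ATTR_BYTES)
    if end < 0:
        raise ValueError("no attribute block within the size limit")
    attrs = {}
    for line in data[:end].decode("ascii").split("\n"):
        name, sep, value = line.partition("=")
        if not sep or not name:
            raise ValueError(f"malformed attribute line {line!r}")
        attrs[name] = value
    return attrs, end + 2
```

Splitting on the first `=` only, as the environment convention does, lets values themselves contain `=`.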

jqj@cornell.UUCP (J Q Johnson) (08/04/85)

File attributes clearly don't fit well into the Unix philosophy (what are
the attributes of a pipe?), so this discussion should probably shift to
some other news group.  However, a couple of points:

Unix does have a very few file attributes, e.g. file owner, group,
modes, last access, number of links, etc.  A purist would argue that
none of this data should be maintained for a file.

If you're going to have attributes, it is desirable to make them user-
extensible, e.g. using the environment-var format suggested in a
previous posting (however, that suggestion makes binary data in
attributes a real pain!).  It is also desirable to allocate lots of
space to attributes.  The Xerox Star file servers have a file structure
which allows some ridiculous amount of attribute data (64K?).  As you
might guess, attributes in that OS are used to hold all sorts of
things, including file names!

Typical useful attributes:  (1) file type (text file, binary file,
directory, etc.), (2) file owner, (3) last read/write/create date, (4)
last writer, (5) EOF pointer (many OSes allow data on the last page
after the EOF pointer), (6) protection data, (7) version #, (8) backup
information such as the magtape id on which the file was last archived,
(9) documentation on the file (e.g. a change log), etc.

If I were implementing a file system with attributes, I'd store
attributes in a parallel data space to the "standard" file data space,
implemented by a parallel set of page tables to the standard ones but
associated with the same inode or directory entry (depending on where
the file system kept its page tables) -- thus, the amount of attribute
data would be expandable just as the size of the file was, and I'd be
able to share a lot of code.  But since I'd access attribute data with
a different set of system calls (aread, awrite, aseek would be adequate,
but a higher level set would be preferable) they wouldn't get in the
way of programs that wanted to treat the file as "just data".

Of course, if you're serious about generalized attributes, why limit
the attribute access function to simple names?  And why limit the
attribute values to simple strings (or static binary data)?  Some DBMS
systems record a change log containing every transaction affecting the
file; in such a system a reasonable extension of "attributes" would
be a time-valued attribute that returned statistics on the state of the
file at that time.

All in all, though, I guess generalized attributes aren't all that good
an idea.

jack@boring.UUCP (08/05/85)

In article <95@cithep.UucP> tim@cithep.UucP (Tim Smith ) writes:
>This is not really relevant, but I have sometimes thought that instead of
>offsets in a file starting at zero, they should start at some negative
>number, possibly specified in the inode.  When you open the file you start
>at zero.  The only way to get the data before zero would be an explicit seek.
>
>This "negative region" could be used for things like the a.out header for
>executable files, the #!/bin/sh for shell scripts ( note that there is no
>need for the prog to recognize # as a comment character ), or information
>on record sizes for files that were brought from another system or produced
>by a record oriented language ( although it would still be up to user mode
>code to actually interpret this; let's leave the kernel out of this. ).
>-- 
>					Tim Smith
>				ihnp4!{wlbr!callan,cithep}!tim


This looks better than a special 'file attribute', since you don't
need funny system calls, etc., but it still has the problem
that you would have to rewrite almost every UNIX utility in existence.

If I do 'cp a.out foobar', I would prefer the header to be copied
too........
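Here is a sketch of both halves of the exchange, with every name in it hypothetical and the file held in memory rather than on disk. The stored bytes are contiguous, but the first `neg` of them (a count Tim would keep in the inode) sit at logical offsets -neg through -1. Opening starts at offset 0, and only an explicit seek below zero reaches the hidden region. The naive copy at the end does what cp does today (open, read to EOF, write), and so it silently drops the header, which is exactly the problem raised above.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical model of a file with a "negative region". */
#define NMAX 1024

struct nfile {
    unsigned char bytes[NMAX];
    long neg;     /* size of the negative region (would live in the inode) */
    long total;   /* total bytes stored, negative region included */
};

struct nhandle {
    struct nfile *f;
    long off;     /* logical offset; may be negative */
};

/* Opening always starts at logical offset 0. */
struct nhandle nopen(struct nfile *f) { struct nhandle h = { f, 0 }; return h; }

/* An explicit seek is the only way below zero. */
int nseek(struct nhandle *h, long off)
{
    if (off < -h->f->neg) return -1;   /* nothing exists before the region */
    h->off = off;
    return 0;
}

size_t nread(struct nhandle *h, void *p, size_t n)
{
    long phys = h->off + h->f->neg;    /* logical offset -> stored position */
    if (phys >= h->f->total) return 0;
    if ((long)n > h->f->total - phys) n = (size_t)(h->f->total - phys);
    memcpy(p, h->f->bytes + phys, n);
    h->off += (long)n;
    return n;
}

/* A naive "cp": it never seeks below zero, so the negative region
 * (an a.out header, say) is not copied. */
void naive_cp(struct nfile *src, struct nfile *dst)
{
    struct nhandle h = nopen(src);
    dst->neg = 0;
    dst->total = (long)nread(&h, dst->bytes, NMAX);
}
```

Fixing this means teaching cp (and cat, tar, and friends) to seek to the most negative offset first. That is the "rewrite almost every utility" cost.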
-- 
	Jack Jansen, jack@mcvax.UUCP
	The shell is my oyster.

lasse@daab.UUCP (Lars Hammarstrand) (08/06/85)

In article <3287@decwrl.UUCP> jcampbell@mrfort.DEC (Jon Campbell) writes:
>To:  UNIX developers and users
> 
>Subject:  problems with the UNIX file system
> 
>Some of us at Digital think we have found a basic problem with the  UNIX
>file  system  for FORTRAN.  The problem is that there is no place to put
>various kinds of information about  the  contents  of  the  file.   More
>specifically:  ....................
-------------------------------------------------------------------------

Do you really believe all that rubbish yourself?

I mean, do you really think you can get a better world because you are
restricted to file headers, fixed record sizes, lead-in chars, etc., etc.,
that other people think are best for you??   (It smells of IBM.)
What exactly are you looking for???
NO.....

	LIFE --> UNIX ------->  F R E E D O M


PS..
	I'm not saying your idea is bad, but it could be better!


	My name: Lasse Hammarstrand.
	My company: Datorisering AB,  SWEDEN.

	UUCP:	{seismo,decvax,philabs}!mcvax,ukc,unido!enea!daab!lasse
	ARPA:	decvax!mcvax!enea!daab!lasse@berkley.arpa

cudcv@daisy.warwick.UUCP (Rob McMahon) (08/07/85)

>> basic problem with the  UNIX file  system  for FORTRAN.
>
>You must also rewrite "cat," because I can copy a file by saying cat a >b .

It's worse than that: the shell creates `b' and has no idea what sort of
file to create.  Maybe the shell should be changed to include a syntax
like `>[ASCII,RECORDSIZE=80,ANSICC]b' !
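The point can be seen in roughly what the shell and cat do between them for `cat a >b`. The sketch below folds both into one hypothetical function, `cat_to`. The shell creates `b' with open(2) (or creat(2)), and the only file-level information either call accepts is a set of flags and a permission mode. There is no argument in which "ASCII, RECORDSIZE=80, ANSICC" could travel, so whatever the copy produces is just bytes.

```c
#include <fcntl.h>
#include <unistd.h>

/* Roughly `cat src >dst`: create dst the way the shell does (flags plus a
 * 0666 mode, before umask, and nothing else), then copy bytes as cat does. */
int cat_to(const char *src, const char *dst)
{
    char buf[512];
    ssize_t n;
    int in = open(src, O_RDONLY);
    if (in < 0) return -1;
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if (out < 0) { close(in); return -1; }
    while ((n = read(in, buf, sizeof buf)) > 0) {
        if (write(out, buf, (size_t)n) != n) { n = -1; break; }
    }
    close(in);
    close(out);
    return n < 0 ? -1 : 0;
}
```

A FORTRAN output line such as "1HEADER" comes through byte for byte, but nothing tells a later reader that the leading `1' is a carriage-control character rather than data.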

guy@sun.uucp (Guy Harris) (08/08/85)

> The Xerox Star file servers have a file structure which allows some
> ridiculous amount of attribute data (64K?).  As you might guess, attributes
> in that OS are used to hold all sorts of things, including file names!

The nice thing about the Pilot OS that the Star application software ran
under is that the lowest level of the file system gave you the ability to
refer to a file by its unique ID.  It had no notion of file names
whatsoever, and only supported four file attributes:

	Pilot recognizes only four attributes: size, type, permanence, and
	immutability.

(From Redell, D. D.; Dalal, Y. K.; Horsley, T. R.; Lauer, H. C.; Lynch, W.
C.; McJones, P. R.; Murray, H. G.; Purcell, S. C., "Pilot: An Operating
System for a Personal Computer", CACM 23(2): 81-92; Feb. 1980 - a good paper
to read if you're interested in operating systems.  Also read the
retrospective "well, here's what we did right and here's what we did wrong"
paper, Lauer, H. C. "Observations on the Development of an Operating
System", Proceedings of the 8th Symposium on Operating Systems Principles,
in ACM Operating Systems Review 15(5): 30-36; Dec. 1981.)

Anything more was up to the next level up.  I'm loath to say "the
application" - Pilot runs on machines with no "protected mode", limited
memory protection (write-protected code pages only), and one address space
for all processes, so it's not entirely clear what's "operating system" and
what's "application".
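As an illustration of how little that lowest level needs, here is a sketch of a file table keyed purely by unique ID and carrying only the four attributes quoted above. The field types, units, table, and function names are illustrative guesses and are not Pilot's actual interface. Note that file creation takes no name at all; naming is left entirely to whatever sits above.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record: a unique ID plus Pilot's four attributes. */
struct pilot_attrs {
    uint64_t uid;     /* the only way to refer to the file */
    uint32_t size;    /* size (units left unspecified in this sketch) */
    uint16_t type;    /* type tag, interpreted only by higher levels */
    int permanent;    /* permanence */
    int immutable;    /* immutability */
};

#define NFILES 64
static struct pilot_attrs table[NFILES];
static size_t nfiles;
static uint64_t next_uid = 1;

/* Create a file: no name argument anywhere.  Returns 0 on failure. */
uint64_t file_create(uint32_t size, uint16_t type)
{
    if (nfiles == NFILES) return 0;
    struct pilot_attrs *a = &table[nfiles++];
    a->uid = next_uid++;
    a->size = size;
    a->type = type;
    a->permanent = 0;
    a->immutable = 0;
    return a->uid;
}

struct pilot_attrs *file_lookup(uint64_t uid)
{
    for (size_t i = 0; i < nfiles; i++)
        if (table[i].uid == uid) return &table[i];
    return NULL;
}

/* Changing the size is refused once the file is immutable. */
int file_set_size(uint64_t uid, uint32_t size)
{
    struct pilot_attrs *a = file_lookup(uid);
    if (a == NULL || a->immutable) return -1;
    a->size = size;
    return 0;
}
```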

One advantage of storing a file name with the file "attributes" is that if
the file gets lost the file system salvager can give it a better name than
its unique ID in BCD when reconnecting it to a directory.  (Files-11,
although its basic structure is essentially the same as UNIX - i-list,
called "index file", and directories which merely map names into indices -
stores the file name in the "inode", and can use this when restoring the
file into "lost+found", called [2,3] or something equally mnemonic.  Of
course, if two files with the same name are lost, so are you...)  I only
used a Star once; if its directory structure is anything like what I infer
the Lisa's is, it's not a problem.  My model for both systems is that you
have "folders" which are directories, but since files are opened by unique
ID (my guess for the Lisa; that's how I would have done it, anyway) they
don't have to have unique names within a directory.  The names are merely
for human convenience.  You can open a file by pointing at it and hitting
the appropriate keyboard/mouse button(s); the system knows the unique ID of
the file whose icon the mouse pointer is in, so again it doesn't need the
name.
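The model Guy infers can be sketched as follows, with the caveat that the structures and function names are hypothetical and are not the Star's or the Lisa's real design. A folder maps human-readable names to unique IDs and, unlike a UNIX directory, does not insist that names be unique, because files are opened by ID. The salvager routine shows the other point: when reconnecting a lost file it prefers the name stored with the file's own attributes, and it falls back to the ID written out in decimal (standing in for the "unique ID in BCD").

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical folder entry: a name for humans, an ID for the system. */
struct entry {
    const char *name;
    uint64_t uid;
};

struct folder {
    struct entry e[16];
    int n;
};

int folder_add(struct folder *f, const char *name, uint64_t uid)
{
    if (f->n == 16) return -1;
    f->e[f->n].name = name;   /* deliberately no duplicate-name check */
    f->e[f->n].uid = uid;
    f->n++;
    return 0;
}

/* How many entries carry this name; two or more is legal here. */
int folder_count(const struct folder *f, const char *name)
{
    int c = 0;
    for (int i = 0; i < f->n; i++)
        if (strcmp(f->e[i].name, name) == 0) c++;
    return c;
}

/* Salvager: use the name kept with the file's attributes if there is one,
 * otherwise fall back to the unique ID spelled out as digits. */
void salvage_name(const char *stored, uint64_t uid, char *out, size_t len)
{
    if (stored != NULL && stored[0] != '\0')
        snprintf(out, len, "%s", stored);
    else
        snprintf(out, len, "%llu", (unsigned long long)uid);
}
```

Two lost files that both remember the name "Memo" would come back under the same name, which is harmless here since the folder never required unique names in the first place.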

	Guy Harris