[net.unix] more on file "attributes"

jcampbell@mrfort.DEC (Jon Campbell) (08/01/85)

Well, my mailbox runneth over with mail telling me how I've struck
at the heart of UNIX by suggesting file attributes. I think perhaps
I have presented the problem (and its possible solution) in the wrong
light.
 
What many users have suggested is that I put a "file header" at the
beginning of each file. This seems like a reasonable approach, except
that existing FORTRANs do not put such cruft at the beginning of files
now. So we have a skew problem. What I was suggesting, though it might
not have been clear, is an "invisible" file header, one which you look
at in a slightly different way than the real data (the bytes in the file).
Perhaps this could be by using a negative byte address in the file, perhaps
some other way. I'm not particularly interested in the way it might be done,
except that it cannot be part of the actual data and it cannot be a separate
file.
 
There are many such operating systems (which have file information in
invisible or hidden headers) around, such as the ATEX text-processing
system used in many newspapers. Ordinary programs and utilities need
not ever look at the invisible header if they are interested in the
data only.
 
I suggested that it be part of the "file information block" (i.e.,
the filename, creation date, and size) because that is a convenient
way to have it copied transparently when you make a copy of the file
or rename the file.
 
I am not suggesting changing the way that the vast majority of UNIX
utilities and user programs currently look at files, nor suggesting
any changes to them. I am suggesting that we give a data-handle, if
you will, for those programs and utilities which care to use the
"attributes". There is no loss of performance, no restrictions placed
on file usage, and very small extra disk space used.
 
I think that you folks who are having a look at creating UNIX utilities
which can do serious data manipulation, read magtapes from "foreign"
operating systems and munge them (without having to read the ANSI
magtape header files by hand), or write utilities which can look at
different files without knowing a priori the file format, will
recognize the problem that I am trying to address. I am not trying
to "strike at the heart" of UNIX; I am letting you know that there
is a problem to be solved that cannot be solved easily.
 
Thanks for all of your feedback. I am looking forward to more.
 
					Thanks,
					Jon Campbell
   --------

--------------------

Please note that this mail message is likely to be incomplete.
The sender aborted the transmission.

	rhea::MAILER-DAEMON

--------------------

guy@sun.uucp (Guy Harris) (08/02/85)

> There are many such operating systems (which have file information in
> invisible or hidden headers) around, such as the ATEX text-processing
> system used in many newspapers. Ordinary programs and utilities need
> not ever look at the invisible header if they are interested in the
> data only.
>  
> I suggested that it be part of the "file information block" (i.e.,
> the filename, creation date, and size) because that is a convenient
> way to have it copied transparently when you make a copy of the file
> or rename the file.

1) There is no such "file information block", strictly speaking, on UNIX.
The file name is not stored anywhere with the file (the only place the name
resides is in a directory, unlike in Files-11, where one copy resides in the
"file header" and other copies in the directories that reference the file)
and the creation date/time isn't stored anywhere.

2) How does putting it in the "file information block" (whatever that may be
in UNIX - the inode, I presume) make it "copied transparently when you make
a copy of the file"?  The only way I can interpret "transparently" is that
any software that now copies files will automagically copy the header
information without any change to that software.  This is not the case.  If
the header isn't in the data of the file, it can't be set by doing a "write"
to the file; all the UNIX copy command ("cp") does is:

	open the "from" file for reading
	open the "to" file for writing, truncating it if it
		exists and creating it if it doesn't
	while (data remains to be read from the "from" file) {
		read data from the "from" file
		write it to the "to" file
	}

Nowhere in here is there anything which could set the file type, record
length, etc., etc., etc.  You'd have to hang it off the "open for writing"
operation (that's where the file permission modes are set now).  That would
require "cp" to change, and would require lots of other programs to change
as well.  Hardly transparent.
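
The loop described above can be sketched in C roughly as follows. This is a
minimal illustration, not the source of any real "cp": the function name
copy_bytes, the 4096-byte buffer, and the 0666 creation mode are assumptions
made for the example, and it uses the three-argument open() with O_CREAT
rather than whatever a given 1985 system provided. The point it shows is that
the only per-file information supplied anywhere is the mode passed when the
"to" file is opened; everything else is a stream of bytes.

```c
#include <fcntl.h>
#include <unistd.h>

/* Copy the bytes of `from` into `to`; returns 0 on success, -1 on error.
 * The name copy_bytes and the 0666 creation mode are illustrative choices. */
int copy_bytes(const char *from, const char *to)
{
    char buf[4096];
    ssize_t n;
    int in, out;

    in = open(from, O_RDONLY);
    if (in < 0)
        return -1;

    /* The mode argument is the only per-file metadata supplied here. */
    out = open(to, O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if (out < 0) {
        close(in);
        return -1;
    }

    while ((n = read(in, buf, sizeof buf)) > 0) {
        char *p = buf;
        while (n > 0) {                 /* write() may accept fewer bytes */
            ssize_t w = write(out, p, (size_t)n);
            if (w < 0) {
                close(in);
                close(out);
                return -1;
            }
            p += w;
            n -= w;
        }
    }

    close(in);
    if (close(out) < 0 || n < 0)        /* n < 0 means read() failed */
        return -1;
    return 0;
}
```

Any hidden attribute would need an extra call somewhere in this function,
which is the change to "cp" being argued about.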

> I am not suggesting changing the way that the vast majority of UNIX
> utilities and user programs currently look at files, nor suggesting
> any changes to them. I am suggesting that we give a data-handle, if
> you will, for those programs and utilities which care to use the
> "attributes". There is no loss of performance, no restrictions placed
> on file usage, and very small extra disk space used.

Wrong.  If, for example, you write files with "FORTRAN carriage control"
differently from UNIX text files (with embedded ASCII control characters for
carriage control), current UNIX utilities will not be able to read those
files, *unless you change them* - which you say you are not suggesting.

> I think that you folks who are having a look at creating UNIX utilities
> which can do serious data manipulation, 

Plenty of UNIX utilities can do that already.

> read magtapes from "foreign" operating systems and munge it (without
> having to read the ANSI magtape header files by hand),

Such programs exist for UNIX - yes, they have to read the magtape header by
hand, but so what?  Unless you modified "grep", you couldn't do

	grep mumble /dev/mt0/frobozz.c

(or however you'd have "grep" read file "frobozz.c" on a magtape) without
changing "grep" to understand the ANSI record format.  Even if you did
modify "grep" (and the operating system, so that you could treat a magtape
as a file-structured device using ANSI labels), you probably wouldn't want
to.  You'd probably want to extract the file first - using a program to
extract files from a tape; that program would be the only program on the
whole system which had to know anything about ANSI labels, etc.

> or write utilities which can look at different files without knowing a
> priori the file format,

Why would you want a utility that could work on text files and FORTRAN
binary files?  What operations on such files (other than copy, move, etc.)
would be common to both kinds of files?  You hardly want to print a FORTRAN
binary file the same way you print a text file, or scan through a FORTRAN
binary file with "grep", or...  Nor would you want to be able to feed a text
file to a FORTRAN program that expects binary files (I doubt you can do that
with VMS or any other operating system, either).

Most of the UNIX programs that use "simple" access methods (i.e., reading
byte streams) have no interest in reading anything but text files.  The
other programs read structured files through a user-mode I/O package; that
package would have no problem reading a file header placed at the beginning
of the file.  "cp", since it copies bytes, not records, would copy those
structured files or any other collection of bytes you want to put into a
file; the same holds true for "tar", "cpio", etc.  No program which
expects to read text files would be likely to want to read a structured file
like that.

As for FORTRAN vs. ASCII carriage control, seems to me I remember a DEC
operating system called RT-11 which used ASCII carriage control for all its
text files, and it seemed to support FORTRAN...

In short, lots of us who *are* familiar with FORTRAN files and ANSI tapes do
*not* recognize UNIX as having any of the problems you're talking about -
but all this has been said before; you've provided no new arguments in favor
of adding attributes like that to UNIX files.

	Guy Harris

jss@sjuvax.UUCP (J. Shapiro) (08/05/85)

Mr. Campbell has one point, at least, which should not be ignored. UNIX is
badly in need of some sort of semaphore structure for use between
processes which do not know about each other.  Without this facility,
it would be very difficult to write a library which could provide
reliable record locking, which is one of the facilities he needed
which is sorely lacking in current UNIX.  It seems to me, after admittedly
very little thought, that one of two things is needed:

	1) some semaphore facility which would have a namespace which
		would allow owner/group/world read/write privileges.
		This would actually be generally useful, and current file
		primitives do not provide this facility in a resource
		efficient or reliable manner.

	2) a block level lock on a file, either physical block or logical
		block, preferably physical.

The first facility, I believe, is to be greatly preferred.  Please,
arguments about using pseudo devices or files in the file system or
pipes/sockets/wombats-carrying-postcards don't wash.  These are
neither resource efficient nor portable.  The semaphore facilities
necessary are not hard to implement (I have done them myself on other
systems), and would help a great deal in solving many problems of
record access, which contrary to popular opinion in UNIX land
constitutes a great deal of what is done out in the real business world.

To my knowledge, all of the database systems providing for reliable
record access do this by circumnavigating UNIX, which seems to me to
be a bit of a waste.

I enjoy using UNIX as a development environment, and I believe that
99% of its ideas are in theory right, but it has a few shortcomings.
Others have noticed the process synchronization shortcomings.  Has
anyone done anything about them?

Jon Shapiro
Haverford College

mjs@eagle.UUCP (M.J.Shannon) (08/05/85)

> Mr. Campbell has one point, at least, which should not be ignored. UNIX is
> badly in need of some sort of semaphore structure for use between
> processes which do not know about each other.  Without this facility,
> it would be very difficult to write a library which could provide
> reliable record locking, which is one of the facilities he needed
> which is sorely lacking in current UNIX.  It seems to me, after admittedly
> very little thought, that one of two things is needed:
> 
> 	1) some semaphore facility which would have a namespace which
> 		would allow owner/group/world read/write privileges.
> 		This would actually be generally useful, and current file
> 		primitives do not provide this facility in a resource
> 		efficient or reliable manner.

System V has just such a semaphore facility.  It also has shared memory and
messages to allow processes to bind themselves to each other and cooperate
even more closely.

> 	2) a block level lock on a file, either physical block or logical
> 		block, preferably physical.

System Vr2 (I'm almost certain) has advisory file locking.  I don't have the
documentation handy, but it may allow the user to specify file addresses to be
locked.  While this is not mandatory locking (i.e., no processes will block on
reads or writes due to a lock), cooperating processes can prevent themselves
from stepping on each other's data with these locks.

> The first facility, I believe, is to be greatly preferred.  Please,
> arguments about using pseudo devices or files in the file system or
> pipes/sockets/wombats-carrying-postcards don't wash.  These are
> neither resource efficient nor portable.  The semaphore facilities
> necessary are not hard to implement (I have done them myself on other
> systems), and would help a great deal in solving many problems of
> record access, which contrary to popular opinion in UNIX land
> constitutes a great deal of what is done out in the real business world.
> 
> To my knowledge, all of the database systems providing for reliable
> record access do this by circumnavigating UNIX, which seems to me to
> be a bit of a waste.
> 
> I enjoy using UNIX as a development environment, and I believe that
> 99% of its ideas are in theory right, but it has a few shortcomings.
> Others have noticed the process synchronization shortcomings.  Has
> anyone done anything about them?
> 
> Jon Shapiro
> Haverford College

Flame on (medium-well):

What?  That famous university-developed system doesn't support any IPC?  No
locks?  No semaphores?  No shared memory?  No messages?  Gee....  No!

AT&T: The Right Choice; System V: The Right UNIX* System.

* - UNIX is a trademark of AT&T.  It is *not* a trademark of the Regents of
	California.

Flame reduced to pilot light.
-- 
	Marty Shannon
UUCP:	ihnp4!eagle!mjs
Phone:	+1 201 522 6063

Warped people are throwbacks from the days of the United Federation of Planets.

wcs@ho95e.UUCP (x0705) (08/05/85)

Jon Shapiro (> >) asked about semaphores and record locking on UNIX.
Marty Shannon (>) pointed out that System V has semaphores and shared memory,
and that:
> System Vr2 (I'm almost certain) has advisory file locking.  I don't have the
> documentation handy, but it may allow the user to specify file addresses to be
> locked.  While this is not mandatory locking (i.e., no processes will block on
> reads or writes due to a lock), cooperating processes can prevent themselves
> from stepping on each other's data with these locks.

Actually, it's the paging release (System V Rel 2, Vax Ver 2, 3B20 Ver 4,
3B2 Ver ???).  I think mandatory locking is supposed to be in Sys V Rel 3.

> 
> > The first facility [semaphores], I believe, is to be greatly preferred.  Please,
> > arguments about using pseudo devices or files in the file system or
> > pipes/sockets/wombats-carrying-postcards don't wash.  These are
> > neither resource efficient nor portable.  The semaphore facilities
> > necessary are not hard to implement (I have done them myself on other
> > systems), and would help a great deal in solving many problems of
> > record access, which contrary to popular opinion in UNIX land
> > constitutes a great deal of what is done out in the real business world.

The SysVR2v* functions were developed to phase in support for the /usr/group
standard, which was put together by UNIX users out in the "real world".

(Actually, wombats carrying postcards can be quite efficient in a distributed
environment.  They're somewhat more portable than TCP/IP, and continue working
if the power goes down.)

(Trademarks and owners  include: { UNIX, Vax, DEC, 3B**, AT&T-**, Wombat Inc.} )
-- 
## Bill Stewart, AT&T Bell Labs, Holmdel NJ 1-201-949-0705 ihnp4!ho95c!wcs

preece@ccvaxa.UUCP (08/06/85)

> 1) There is no such "file information block", strictly speaking, on
> UNIX.
----------
There's a lot of stuff in the inode that looks an awful lot like a file
information block. [It would be cute if there were more room in the
directory entry -- then we could have separate attribute lists for
each link to a file...]
----------
> 2) How does putting it in the "file information block" (whatever that
> may be in UNIX - the inode, I presume) make it "copied transparently
> when you make a copy of the file"?
----------
Obviously, it doesn't.  On the other hand, except for dump/restore, it
would be sufficient to have open(2) create an empty one.  Tools to
manipulate it could be handled separately (after you cp the file
you cp_file_properties to get the attribute list), though it
wouldn't be a big deal to make cp and mv handle them, too, and that
is what a vendor who was going to do this would do.
----------
> If, for example, you write files with "FORTRAN carriage control"
> differently from UNIX text files (with embedded ASCII control characters
> for carriage control), current UNIX utilities will not be able to read
> those files, *unless you change them* - which you say you are not
> suggesting.
----------
Why not?  The utilities may not deal with them intelligently or in the
way intended, but the files themselves would still be just streams of
data bytes, which they would NOT be if you put the header in the file.
Grep, for instance, would find nothing confusing in a file with a
hidden property list, but would find a confusing header if the header
were embedded in the file (confusing in the sense of "containing
stuff other than data" -- it would still be able to process either
kind of file).
----------
> > or write utilities which can look at different files without knowing a
> > priori the file format,

> Why would you want a utility that could work on text files and FORTRAN
> binary files?
----------
You're mis-interpreting "format."  A utility might need to deal with
132 byte records and 80 byte records or with files having carriage
control and files not having carriage control.  There are also, of
course, a lot of useful things you could put in a property list for
use by maintenance programs (such as, perhaps, a more neatly
integrated version control system).
----------
> No program which expects to read text files would be likely to want to
> read a structured file like that.
----------
There are structured files which still have perfectly normal
Unix text file characteristics (a file, for instance, of 81-byte
records containing a newline in byte 80 of each record).

The embedded header approach makes a number of things more difficult,
including random access (offsets have to account for the header) and
use with normal Unix utilities (a filter would be needed before using,
for instance, grep; multi-file commands (such as merge) would need
to have temporaries prepared, since you couldn't provide filtering
on more than one file through a pipe).
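
The random-access point can be made concrete with a little arithmetic. In the
sketch below the function names and the idea of passing the header length as a
parameter are assumptions for illustration only; with a hidden property list
header_len is simply 0, while with an embedded header every caller has to know
it and add it in.

```c
#include <sys/types.h>
#include <unistd.h>

/* Byte offset of record `n` (counting from 0) in a file of fixed-length
 * records preceded by `header_len` bytes of embedded header. */
off_t record_offset(off_t header_len, off_t record_len, long n)
{
    return header_len + (off_t)n * record_len;
}

/* Read record `n` into `buf`; returns bytes read, or -1 on error. */
ssize_t read_record(int fd, off_t header_len, off_t record_len, long n,
                    char *buf)
{
    if (lseek(fd, record_offset(header_len, record_len, n), SEEK_SET) < 0)
        return -1;
    return read(fd, buf, (size_t)record_len);
}
```

For 81-byte records, record 2 starts at byte 162 with no embedded header and
at byte 290 behind a 128-byte one; a program that forgets the header reads
from the middle of the wrong record.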
----------
> but all this has been said before; you've provided no new arguments in
> favor of adding attributes like that to UNIX files.
----------
I'm not holding my breath.  I think they would be useful and would help
sell into a few new markets, but I think we can live without
them.

-- 
scott preece
gould/csd - urbana
ihnp4!uiucdcs!ccvaxa!preece

sean@ukma.UUCP (Sean Casey) (08/06/85)

In article <1238@sjuvax.UUCP> jss@sjuvax.UUCP (J. Shapiro) writes:
>Mr. Campbell has one point, at least, which should not be ignored. UNIX is
>badly in need of some sort of semaphore structure for use between
>processes which do not know about each other.  Without this facility,
>etc...
>To my knowledge, all of the database systems providing for reliable
>record access do this by circumnavigating UNIX, which seems to me to
>be a bit of a waste.

For real.  The ingres we run here has a daemon running all the time just so
it can be guaranteed atomic locks!  Kludge city.


-- 

-  Sean Casey				UUCP:	sean@ukma.UUCP   or
-  Department of Mathematics			{cbosgd,anlams,hasmed}!ukma!sean
-  University of Kentucky		ARPA:	ukma!sean@ANL-MCS.ARPA	

sean@ukma.UUCP (Sean Casey) (08/06/85)

In article <1311@eagle.UUCP> mjs@eagle.UUCP (M.J.Shannon) writes:
>What?  That famous university-developed system doesn't support any IPC?  No
>locks?  No semaphores?  No shared memory?  No messages?  Gee....  No!

What?  That big corporation-developed system doesn't have TCP/IP?  No
sockets?  No symbolic links?  No cp -r?  No C-shell?  Geee....  No!


-- 

-  Sean Casey				UUCP:	sean@ukma.UUCP   or
-  Department of Mathematics			{cbosgd,anlams,hasmed}!ukma!sean
-  University of Kentucky		ARPA:	ukma!sean@ANL-MCS.ARPA	

gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (08/07/85)

> I enjoy using UNIX as a development environment, and I believe that
> 99% of its ideas are in theory right, but it has a few shortcomings.
> Others have noticed the process synchronization shortcomings.  Has
> anyone done anything about them?

Yes, AT&T has semaphores, message queues, and record locking.  Please
don't implement a similar yet different facility; we have too many
of those already.

guy@sun.uucp (Guy Harris) (08/08/85)

> What?  That famous university-developed system doesn't support any IPC?  No
> locks?  No semaphores?  No shared memory?  No messages?  Gee....  No!

No shared memory, no semaphores - (currently) true.

No IPC, no messages - go forth and read:

	SOCKET(2)
	BIND(2)
	LISTEN(2)
	ACCEPT(2)
	CONNECT(2)
	SEND(2)
	RECV(2)

in your 4.2BSD manuals, then try again.  (Unlike those in a certain famous
USDL-developed system, they work over a network.  Wait a minnit, somebody's
saying they don't work over a network because nothing other than file
transfer and remote command execution works over a network in that system.
Gee.... No!)
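
A minimal sketch of the calls listed above, passing one message between two
sockets. To keep it runnable on a single machine without any network set-up
it uses the local (AF_UNIX) domain and puts both ends in one process; that
choice, the function name local_message, and the reliance on connect()
completing before accept() is called (true of common implementations) are
assumptions of the example. Using AF_INET with a struct sockaddr_in instead
is what lets the same calls cross machines.

```c
#include <string.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Send `msg` from a client socket to a server socket bound at `path` and
 * copy what the server receives into `buf`.  Returns the number of bytes
 * received, or -1 on error.  Both ends live in one process for brevity. */
ssize_t local_message(const char *path, const char *msg,
                      char *buf, size_t buflen)
{
    struct sockaddr_un addr;
    int srv = -1, cli = -1, conn = -1;
    ssize_t n = -1;

    if (strlen(path) >= sizeof addr.sun_path)
        return -1;
    memset(&addr, 0, sizeof addr);
    addr.sun_family = AF_UNIX;
    strcpy(addr.sun_path, path);
    unlink(path);                       /* remove a stale socket, if any */

    srv = socket(AF_UNIX, SOCK_STREAM, 0);
    cli = socket(AF_UNIX, SOCK_STREAM, 0);
    if (srv < 0 || cli < 0)
        goto done;
    if (bind(srv, (struct sockaddr *)&addr, sizeof addr) < 0)
        goto done;
    if (listen(srv, 1) < 0)
        goto done;
    if (connect(cli, (struct sockaddr *)&addr, sizeof addr) < 0)
        goto done;
    conn = accept(srv, NULL, NULL);     /* pick up the queued connection */
    if (conn < 0)
        goto done;
    if (send(cli, msg, strlen(msg), 0) < 0)
        goto done;
    n = recv(conn, buf, buflen, 0);

done:
    if (conn >= 0)
        close(conn);
    if (cli >= 0)
        close(cli);
    if (srv >= 0)
        close(srv);
    unlink(path);
    return n;
}
```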

No locks - no record locks, but for file locks, go forth and read:

	FLOCK(2)

and try again.

Moral: flaming about the general unworthiness of some UNIX system other than
your favorite generally causes singed eyebrows and little illumination.
(This goes for you Berkeleyphiles - and even Researchphiles - out there,
too.)

	Guy Harris

chris@gargoyle.UUCP (Chris Johnston) (08/09/85)

Someone write a utility to convert to and from the column one format
control and end this discussion.
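
One direction of such a utility might look like the sketch below, which turns
column-one FORTRAN carriage control into ASCII control characters. The
function name fcc_to_ascii is made up, and treating an unrecognized control
character like a blank is an assumption; the four control characters handled
are the conventional ones (blank, '0', '1', '+').

```c
#include <stdio.h>

/* Translate FORTRAN column-one carriage control on `in` to ASCII control
 * characters on `out`.  Each input line's first character selects the
 * vertical motion performed before the rest of the line is printed:
 *   ' ' advance one line       '0' advance two lines
 *   '1' advance to a new page  '+' no advance (overprint)
 * Unrecognized control characters are treated like ' ' (an assumption). */
void fcc_to_ascii(FILE *in, FILE *out)
{
    int c, first = 1;

    while ((c = getc(in)) != EOF) {     /* c is the column-one character */
        switch (c) {
        case '1':
            putc('\f', out);
            break;
        case '+':
            putc('\r', out);
            break;
        case '0':
            putc('\n', out);            /* the extra blank line */
            /* fall through */
        default:
            if (!first)                 /* no line to end before the first */
                putc('\n', out);
            break;
        }
        first = 0;
        if (c == '\n')
            continue;                   /* empty record: no text to copy */
        while ((c = getc(in)) != EOF && c != '\n')
            putc(c, out);               /* copy the text after column one */
    }
    if (!first)
        putc('\n', out);                /* terminate the last line */
}
```

Used as a filter, it would be called as fcc_to_ascii(stdin, stdout); the
reverse direction is left out of this sketch.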

lasse@daab.UUCP (Lars Hammarstrand) (08/14/85)

In article <2030@ukma.UUCP> sean@ukma.UUCP (Sean Casey) writes:
>In article <1311@eagle.UUCP> mjs@eagle.UUCP (M.J.Shannon) writes:
>>What?  That famous university-developed system doesn't support any IPC?  No
>>locks?  No semaphores?  No shared memory?  No messages?  Gee....  No!
>
>What?  That big corporation-developed system doesn't have TCP/IP?  No
>sockets?  No symbolic links?  No cp -r?  No C-shell?  Geee....  No!
>
>
>-- 
>
>-  Sean Casey				UUCP:	sean@ukma.UUCP   or
>-  Department of Mathematics			{cbosgd,anlams,hasmed}!ukma!sean
>-  University of Kentucky		ARPA:	ukma!sean@ANL-MCS.ARPA	


Why don't you look at the UniPlus+ port of SysV? There you have everything
you need, and then you won't want to run anything else on your machine!


	Lars Hammarstrand.
	Datorisering AB, Stockholm, SWEDEN.

	UUCP:	{seismo,decvax,philabs}!{mcvax,ukc,unido}!enea!daab!lasse
	ARPA:	decvax!mcvax!enea!daab!lasse@berkley.ARPA
		decvax!mcvax!enea!daab!lasse@seismo.ARPA

alexis@reed.UUCP (Alexis Dimitriadis) (08/14/85)

> There's a lot of stuff in the inode that looks an awful lot like a file
> information block. [It would be cute if there were more room in the
> directory entry -- then we could have separate attribute lists for
> each link to a file...]

  As someone pointed out, a small amount of file "attributes" is
currently being kept on the directory entry itself -- as a suffix to
the filename.  Other programs use specially named files, or system
calls for lock management, etc.  Biff uses the
owner-execute-permission bit of the terminal name as a flag. (ugh).

  I am a big fan of UNIX, but I have often felt that _some_ place to
keep user-defined, out-of-band information about a file should be
provided, say, a field in the inode explicitly set aside for
user-defined information.  Something could be worked out about who
should be able to modify it, and programs that care to could set their
output to have the "attributes" of their input (fstat could be made to
return the relevant info on pipes).

  The "attributes" could or could not be copied when copying a file.
cp has to explicitly set the permission mode of the new file, after all.
More serious would be namespace problems, when more than one family
of applications tries to use the field.

  Please do not flame, I have been carefully following the discussion
on the "file information block", and I have never supported it anyway.  :-)
If I am missing something too, I would like to know.
-- 
_______________________________________________
  As soon as I get a full time job, the opinions expressed above
will attach themselves to my employer, who will never be rid of
them again.

             alexis @ reed

	         ...teneron! \
...seismo!ihnp4! - tektronix! - reed.UUCP
     ...decvax! /

john@genrad.UUCP (John P. Nelson) (08/15/85)

>>>What?  That famous university-developed system doesn't support any IPC?  No
>>>locks?  No semaphores?  No shared memory?  No messages?  Gee....  No!
>>
>>What?  That big corporation-developed system doesn't have TCP/IP?  No
>>sockets?  No symbolic links?  No cp -r?  No C-shell?  Geee....  No!
>>
>
>Why don't you look for UniPlus+ port of SysV, there you have everything you
>need and then you don't want to run anything else on your machine!.

Not really.  It doesn't support symbolic links.  Or select() using any
file descriptors other than a network socket.  Unix domain sockets are not
supported.  cp -r isn't there either.  Actually, the TCP/IP is an expensive
option - as normally distributed, all the functions like socket(), connect(),
etc. return an error (Unimplemented system call or something like that).
Oh, and some of the TCP/IP socket function interfaces are slightly different
than 4.2 (I can't recall the specifics right now)

Oh, it DOES have C-shell.  However, the kernel does NOT recognize "#!",
which limits the usefulness of csh scripts.
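
For readers unsure what recognizing "#!" involves: on systems that support it,
the exec code looks at the first two bytes of the file and, if they are "#!",
runs the interpreter named on the rest of that line with the script as its
argument. The user-mode sketch below only mimics that test; it is not any
kernel's code, and the function name script_interpreter and the 256-byte line
limit are assumptions of the example.

```c
#include <fcntl.h>
#include <unistd.h>

/* If `path` begins with "#!", copy the interpreter path that follows into
 * `interp` and return 1; return 0 if there is no usable "#!" line, -1 on
 * error.  This mimics, in user mode, the test an exec implementation makes. */
int script_interpreter(const char *path, char *interp, size_t len)
{
    char line[256];
    ssize_t n;
    size_t i, j = 0;
    int fd;

    if (len == 0)
        return -1;
    fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    n = read(fd, line, sizeof line);
    close(fd);
    if (n < 0)
        return -1;
    if (n < 2 || line[0] != '#' || line[1] != '!')
        return 0;

    i = 2;
    while (i < (size_t)n && (line[i] == ' ' || line[i] == '\t'))
        i++;                              /* skip blanks after "#!" */
    while (i < (size_t)n && j + 1 < len &&
           line[i] != ' ' && line[i] != '\t' && line[i] != '\n')
        interp[j++] = line[i++];          /* copy the interpreter path */
    interp[j] = '\0';
    return j > 0;
}
```

Without this check at exec time, it is left to whichever shell is running to
guess which interpreter a script was written for.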

I would rather have berkeley 4.2, but we really need shared memory.

John Nelson (a UniPlus+ System V user)

peter@baylor.UUCP (Peter da Silva) (08/17/85)

> >>>What?  That famous university-developed system doesn't support any IPC?  No
> >>>locks?  No semaphores?  No shared memory?  No messages?  Gee....  No!
> >>
> >>What?  That big corporation-developed system doesn't have TCP/IP?  No
> >>sockets?  No symbolic links?  No cp -r?  No C-shell?  Geee....  No!
> >>
> >
> >Why don't you look for UniPlus+ port of SysV, there you have everything you
> >need and then you don't want to run anything else on your machine!.
> 
> Not really.  It doesn't support symbolic links.  Or select() using any

Good to see I'm not the only person willing to flame on this. At least I got
the message that SV-ists are just as refractory as creationists.

I still want job control and symbolic links, you all hear?
-- 
	Peter da Silva (the mad Australian werewolf)
		UUCP: ...!shell!neuro1!{hyd-ptd,baylor,datafac}!peter
		MCI: PDASILVA; CIS: 70216,1076

lasse@daab.UUCP (Lars Hammarstrand) (08/19/85)

>	.	.
>	.	.
>	.	.
>option - as normally distributed, all the functions like socket(), connect(),
>etc. return an error (Unimplemented system call or something like that).
>Oh, and some of the TCP/IP socket function interfaces are slightly different
>than 4.2 (I can't recall the specifics right now)

Eh? What machine are you running on? I'm just about to go to Germany to
look at 2 machines running B-net programs on a Cromemco: rcp, rstat ... you know!

>Oh, it DOES have C-shell.  However, the kernel does NOT recognize "#!",
>which limits the usefulness of csh scripts.

True, but not the whole truth: it starts up 'sh' as soon as it finds a
"#" at the beginning of a [*]shell script. And it has got a Tcsh too
(if you have a source licence for csh).

BTW: is it really the kernel that recognizes the "#!" sequence? (Just wondering!)

>
>I would rather have berkeley 4.2, but we really need shared memory.
>
>John Nelson (a UniPlus+ System V user)

Ok, I believe you, and I don't want to start a war between different UNIX
systems, because at bottom they are all *UNIX* systems. I just wanted
to say that System V is not as bad as many people told me 2 years ago.


	Lars Hammarstrand.
	Datorisering AB, Stockholm, SWEDEN.

	UUCP:	{seismo,decvax,philabs}!{mcvax,ukc,unido}!enea!daab!lasse
	ARPA:	decvax!mcvax!enea!daab!lasse@berkley.ARPA
		decvax!mcvax!enea!daab!lasse@seismo.ARPA

Ps: I'm only in it for the *UNIX* ! Ds