jcampbell@mrfort.DEC (Jon Campbell) (08/01/85)
Well, my mailbox runneth over with mail telling me how I've struck at the heart of UNIX by suggesting file attributes. I think perhaps I have presented the problem (and its possible solution) in the wrong light.

What many users have suggested is that I put a "file header" at the beginning of each file. This seems like a reasonable approach, except that existing FORTRANs do not put such cruft at the beginning of files now. So we have a skew problem. What I was suggesting, though it might not have been clear, is an "invisible" file header, one which you look at in a slightly different way than the real data (the bytes in the file). Perhaps this could be done by using a negative byte address in the file, perhaps some other way. I'm not particularly interested in how it might be done, except that it cannot be part of the actual data and it cannot be a separate file. There are many such operating systems (which keep file information in invisible or hidden headers) around, such as the ATEX text-processing system used in many newspapers. Ordinary programs and utilities need never look at the invisible header if they are interested only in the data. I suggested that it be part of the "file information block" (i.e., the filename, creation date, and size) because that is a convenient way to have it copied transparently when you copy or rename the file.

I am not suggesting changing the way that the vast majority of UNIX utilities and user programs currently look at files, nor suggesting any changes to them. I am suggesting that we give a data-handle, if you will, to those programs and utilities which care to use the "attributes". There is no loss of performance, no restriction placed on file usage, and very little extra disk space used.
I think that those of you who are looking at creating UNIX utilities which can do serious data manipulation, read magtapes from "foreign" operating systems and munge them (without having to read the ANSI magtape header files by hand), or look at different files without knowing the file format a priori, will recognize the problem that I am trying to address. I am not trying to "strike at the heart" of UNIX; I am letting you know that there is a problem to be solved that cannot be solved easily. Thanks for all of your feedback. I look forward to more.

Thanks,
Jon Campbell
levy@ttrdc.UUCP (Daniel R. Levy) (08/03/85)
jcampbell@mrfort.DEC (Jon Campbell) <3398@decwrl.UUCP>:
>What many users have suggested is that I put a "file header" at the
>beginning of each file. This seems like a reasonable approach, except
>that existing FORTRANs do not put such cruft at the beginning of files
>now. So we have a skew problem. What I was suggesting, though it might
>have not been clear, is an "invisible" file header, one which you look
>at in a slightly different way than the real data (the bytes in the file).
>Perhaps this could be by using a negative byte address in the file, perhaps
>some other way. I'm not particularly interested in the way it might be done,
>except that it cannot be part of the actual data and it cannot be a separate
>file.
>There are many such operating systems (which have file information in
>invisible or hidden headers) around, such as the ATEX text-processing
>I suggested that it be part of the "file information block" (i.e.,
>the filename, creation date, and size) because that is a convenient
>way to have it copied transparently when you make a copy of the file
>or rename the file.
>
>I am not suggesting changing the way that the vast majority of UNIX
>utilities and user programs currently look at files, nor suggesting
>any changes to them. I am suggesting that we give a data-handle, if
>you will, for those programs and utilities which care to use the
>"attributes". There is no loss of performance, no restrictions placed
>on file usage, and very small extra disk space used.
>

Some comments. One of the beauties of Unix files (the ordinary kind) is that you can deal with them without having to worry about anything EXCEPT the actual data in them. ANY scheme which uses files that contain information outside the actual data is going to add new requirements to programs which manipulate files. This has been beaten to death on the net, but it is a very valid point.
Special features not in the actual data would be lost the moment the file was manipulated by a "dumb" command, like "cat" or "vi" or "ed" or (name your favorite command). All "cp" does now is read the data of its input file and write it, byte for byte, to the output file. It would have to be more sophisticated, like VMS COPY, to know that it should open a file with special attributes, and it would run more slowly. Also, cp does not now copy the creation time of the original file to the copy, so your assertion above (that the file creation time is transferred from the original to the copied file) is mistaken. For that matter, the file size is not directly transferred either: it is determined by the data actually copied, and under ordinary circumstances it is no surprise that it matches the original.

Skews against other FORTRANs? While I admit that a header would generate a skew against the present "f77", remember that any attempt to transfer files with attributes from one operating system to another has to deal up front with those attributes. What about, for instance, trying to transfer VMS files with special Fortran carriage-control attributes to an IBM system? Even disregarding the ASCII-EBCDIC skew, the interface program is expected to know about the special quirks of files on both systems. The header in a Unix Fortran output file would just be treated as another such quirk when transferred to another operating system. It probably would be a good idea, however, to add headers only to output files which are intended to contain a special attribute, such as carriage-control or binary files.
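Levy's point about cp can be made concrete with a minimal C sketch of a cp-like copy loop. The function name copy_data is hypothetical, and real cp implementations differ in detail. The loop moves only the byte stream with read() and write(), so anything stored outside the data (such as a hypothetical "invisible header") would never reach the copy, and the new file gets its own inode times rather than the original's.

```c
#include <fcntl.h>
#include <unistd.h>

/* Copy the data bytes of src to dst, as a cp-like utility does.
 * Only the byte stream is transferred; attributes kept outside the
 * data would be silently dropped. Returns 0 on success, -1 on error. */
int copy_data(const char *src, const char *dst)
{
    char buf[8192];
    ssize_t n;
    int in = open(src, O_RDONLY);
    if (in < 0)
        return -1;
    /* The copy is a brand-new file: its times are set now, not copied. */
    int out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0666);
    if (out < 0) {
        close(in);
        return -1;
    }
    while ((n = read(in, buf, sizeof buf)) > 0) {
        ssize_t off = 0;
        while (off < n) {              /* write() may accept fewer bytes */
            ssize_t w = write(out, buf + off, (size_t)(n - off));
            if (w < 0) {
                close(in);
                close(out);
                return -1;
            }
            off += w;
        }
    }
    close(in);
    if (close(out) < 0 || n < 0)       /* n < 0 means read() failed */
        return -1;
    return 0;
}
```

The size of the copy follows from the number of bytes written, which is why it normally matches the original without being "transferred".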
And of course it would be desirable to determine what kind of file is being written to. It would be silly to write a header to a terminal or a pipe, so detecting that kind of redirection on a logical unit should cause carriage-control attributes to be translated into the standard ASCII control characters instead.

There seems to be no easy answer. I stand firm with those, however, who say don't tamper with the Unix way of storing files. At least 99.99% of all the programs running on Unix machines will be "c" programs, which don't need the overhead of having to account for the special "fortran" stuff whenever they process files. (Sorry, pascal and lisp programmers, I should have lumped you in with the "c" people.) Enough rambling for now. I doubt I've said anything anyone else hasn't.
--
Disclaimer: The views contained herein are my own and are not at all
those of my employer, my pets, my plants, my boss, or the s.a. of any
computer upon which I may hack.

dan levy | yvel nad
an engihacker @ at&t computer systems division
skokie, illinois
"go for it"
Path: ..!ihnp4!ttrdc!levy
  or: ..!ihnp4!iheds!ttbcad!levy
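Levy's suggestion that a Fortran runtime should not write a header to a terminal or a pipe could be sketched in C with the standard isatty() and fstat() calls. The function wants_header and the policy it encodes (headers only on regular disk files) are assumptions for illustration, not part of any existing f77 runtime.

```c
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical policy check: should an attribute header be written to
 * this output descriptor? Terminals and pipes get plain ASCII, with
 * carriage control translated to ordinary control characters; only
 * regular disk files qualify for a header. */
int wants_header(int fd)
{
    struct stat st;

    if (isatty(fd))
        return 0;                      /* terminal: never a header */
    if (fstat(fd, &st) < 0)
        return 0;                      /* cannot tell: be conservative */
    if (S_ISFIFO(st.st_mode))
        return 0;                      /* pipe */
    return S_ISREG(st.st_mode) ? 1 : 0; /* regular files only */
}
```

A runtime would call this once when a logical unit is opened and remember the answer, rather than re-checking on every write.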
throopw@rtp47.UUCP (Wayne Throop) (08/04/85)
> [Using fortran-specific headers] seems like a reasonable approach,
> except that existing FORTRANs do not put such cruft at the beginning of
> files now. [...] What I was suggesting, though it might have not been
> clear, is an "invisible" file header, one which you look at in a
> slightly different way than the real data (the bytes in the file).

Now let me get this straight. Putting formatting information in the file itself is a good idea, but the lack of FORTRAN systems that do this today causes you to suggest adding an independent data area to all files to keep formatting information. *Now* I see the light! All those existing FORTRANs that use Unix "invisible file headers"! How silly of me.

(In case it isn't clear, I think adding an "invisible file header" to Unix is a Bad Thing.)
--
Wayne Throop at Data General, RTP, NC
<the-known-world>!mcnc!rti-sel!rtp47!throopw
sean@ukma.UUCP (Sean Casey) (08/05/85)
In article <3398@decwrl.UUCP> jcampbell@mrfort.DEC (Jon Campbell) writes:
>What many users have suggested is that I put a "file header" at the
>beginning of each file. This seems like a reasonable approach, except
>that existing FORTRANs do not put such cruft at the beginning of files
>now. So we have a skew problem. What I was suggesting, though it might
>have not been clear, is an "invisible" file header, one which you look
>at in a slightly different way than the real data (the bytes in the file).

Why not just rewrite part of the fortran I/O library, instead of rewriting the whole file system, the backup programs, etc.? Why would there be problems with that? It sounds to me like it would be easier, and would save a lot of bucks in man-hours and bug fixing. Not only that, but if you did implement your "invisible" file header, you'd have to rewrite part of the I/O library anyway. Geez, why go to all that trouble?
--
- Sean Casey                   UUCP: sean@ukma.UUCP or
- Department of Mathematics          {cbosgd,anlams,hasmed}!ukma!sean
- University of Kentucky       ARPA: ukma!sean@ANL-MCS.ARPA
hedrick@topaz.ARPA (Chuck Hedrick) (08/05/85)
In article <3398@decwrl.UUCP> jcampbell@mrfort.DEC (Jon Campbell) writes:
>Well, my mailbox runneth over with mail telling me how I've struck
>at the heart of UNIX by suggesting file attributes. I think perhaps
>I have presented the problem (and its possible solution) in the wrong
>light.
>
>What many users have suggested is that I put a "file header" at the
>beginning of each file. This seems like a reasonable approach, except
>that existing FORTRANs do not put such cruft at the beginning of files
>now. So we have a skew problem.

I would much rather have a skew problem between f77 and your Fortran than between Unix and your new OS (whatever you choose to call it). Even if you hid your attributes somewhere, it is unlikely that your new Fortran would be compatible with the old. One assumes that new programs would no longer specify the file attributes in OPEN. (I mean, why bother adding all these attributes to the file if the user still has to specify them in his program?) A program that failed to specify the attributes could no longer work when compiled under the old Fortran. Furthermore, it is likely that your new Fortran would support a wider variety of file and record organizations than the old one. So in fact the amount of compatibility that would be possible is minimal. For the cases where it is possible, one could have a utility that strips or adds the headers. You could also have an OPEN parameter that said not to create a header for new files, and your Fortran could be trained to read files that did not have a header, as long as the user specified enough of the attributes explicitly in OPEN.

Let's look at some of the costs of making the change that you propose:

1) You are going to have to hide the information either in the inode, or in some sort of negative file address. Changing the file system to allow this will make it incompatible with other Unix file systems.
   Probably you will want to put it in a negative file address, or in a special block accessible only through some special system call. Adding it to the inode would increase the size of the inode, thus penalizing everyone (in disk space used) even if they don't use the feature.

2) You will have to change dump, restore, and tar to save this information on tapes and restore it. I trust you want your attributes to go onto backup tapes (dump and restore) and for it to be possible to move this information with a file when you take the file to a different system (tar). A new switch will have to be added to tar to allow you to suppress this information, in order to avoid confusing systems that don't understand it. Tapes produced on your system would in general not be compatible with other systems unless this switch is used.

3) You will have to change cp, mv, and various other utilities. Users are accustomed to the fact that when they copy a file, the copy is the same as the original. This would no longer be the case if copying stripped off attribute information.

4) The network protocols used for FTP and rcp would have to be extended to allow attribute information to be sent over the net. Many sites depend upon use of the network to keep various systems in sync with each other. They would not be able to tolerate having attribute information disappear when the file is moved.

5) Programs that depended upon having attributes would not work with I/O over pipes or other kinds of streams. This could be a serious restriction for some kinds of programs. Indeed, you would be breaking the device-independence of your I/O system, since it would be impractical to have hidden attributes for any device other than disk or its equivalent.

These are fairly serious costs, in implementation time, compatibility with other Unix systems, and increased complexity of the file system. It is very unlikely that many (any?) customers would want to pay these costs.
I think these costs are clearly larger than any Fortran version skew, particularly since there are ways to get around the Fortran skew.

You might be interested to know that the same issues came up on the DEC-20 when RMS was implemented there. They chose to put the RMS file attribute information in a header at the beginning of the file. I don't know why that choice was made, but the costs of modifying TOPS-20 would have been similar to those of modifying Unix to support attributes.

I urge you to talk to some of us in person before proceeding with anything like what you have proposed to do. The opposition to your suggestion is not based on misunderstanding, nor does it indicate a lack of sympathy with your goals. Any change to the simplicity of the file model *does* strike at the heart of Unix.
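The alternative Hedrick favors, a visible header at the front of the file (the route the DEC-20 took for RMS) plus a runtime that also accepts headerless files, might look roughly like the C sketch below. The magic string "F77A", the 8-byte layout, and the field names are all invented for illustration. They do not describe any real f77 or RMS format.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical header layout: 4-byte magic, record format,
 * carriage control, 16-bit record length (little-endian). */
#define ATTR_MAGIC    "F77A"
#define ATTR_HDR_SIZE 8

struct file_attrs {
    unsigned char  rec_format;    /* e.g. 0 = stream, 1 = fixed, 2 = variable */
    unsigned char  carriage_ctl;  /* e.g. 0 = none, 1 = Fortran carriage control */
    unsigned short rec_len;       /* fixed record length, if any */
};

/* If buf begins with a header, fill *a and return the offset where the
 * data starts; otherwise return 0, meaning the file is headerless and
 * the attributes must come from OPEN. */
size_t parse_attr_header(const unsigned char *buf, size_t len,
                         struct file_attrs *a)
{
    if (len < ATTR_HDR_SIZE || memcmp(buf, ATTR_MAGIC, 4) != 0)
        return 0;
    a->rec_format   = buf[4];
    a->carriage_ctl = buf[5];
    a->rec_len      = (unsigned short)(buf[6] | (buf[7] << 8));
    return ATTR_HDR_SIZE;
}

/* Encode *a into out, which must hold ATTR_HDR_SIZE bytes. */
void build_attr_header(const struct file_attrs *a, unsigned char *out)
{
    memcpy(out, ATTR_MAGIC, 4);
    out[4] = a->rec_format;
    out[5] = a->carriage_ctl;
    out[6] = (unsigned char)(a->rec_len & 0xff);
    out[7] = (unsigned char)(a->rec_len >> 8);
}
```

A strip utility of the kind Hedrick mentions would call parse_attr_header() on the first bytes of a file and copy everything from the returned offset onward. An add utility would write build_attr_header()'s output followed by the original data. A return of 0 lets the runtime fall back to attributes given explicitly in OPEN.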
preece@ccvaxa.UUCP (08/06/85)
> There seems to be no easy answer. I stand firm with those, however,
> who say don't tamper with the Unix way of storing files. At least
> 99.99% of all the programs running on Unix machines will be "c"
> programs, which don't need the overhead of having to account for the
> special "fortran" stuff whenever they process files. (Sorry, pascal
> and lisp programmers, I should lump you in with the "c" people.)
----------
If you really think that 99.99% of all the programs running on Unix machines will be "c" programs, you haven't dealt with the real world. There are many, many, MANY sites out there that use almost all their CPU time running things like SPICE and SPSS and a myriad of other existing engineering, statistical, and scientific applications written in (gasp) FORTRAN.

As to where header information should go, I agree that there's no easy answer. Putting it outside the file seems preferable to me, because I think files should be homogeneous and because I hate to think about random access needing to work around the header and because I like making it truly invisible to uninformed processes. This doesn't violate "the Unix way of storing files" any more than the other out-of-band information already kept about files. Certain basic tools need to know about it (like dump/restore); others simply lose the data (like sort). If a vendor chose to add this feature (or if the standards committee chose to add it), either the existing commands would grow switches to recognize the header data or new commands would be added. No big deal.

I kind of like the idea of a LISP-ish property list (or you could think of it as a file environment, if you like) in which any arbitrary (name, value) pairs could be stuck. Then a getfileproperty call would get a particular value.

The first rule, of course, has to be to not break existing Unix.
--
scott preece
gould/csd - urbana
ihnp4!uiucdcs!ccvaxa!preece
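Preece's property-list idea can be sketched as a small table of (name, value) pairs. Only the name getfileproperty comes from his post. The setfileproperty function, the struct layout, and the 16-entry limit are assumptions, and this in-memory version says nothing about where a file system would actually keep the list.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-file property list of (name, value) pairs.
 * Strings are stored by pointer, so the caller must keep them alive. */
struct file_prop {
    const char *name;
    const char *value;
};

struct file_plist {
    struct file_prop props[16];
    size_t count;
};

/* Set or replace a property. Returns 0 on success, -1 if the list is full. */
int setfileproperty(struct file_plist *pl, const char *name, const char *value)
{
    for (size_t i = 0; i < pl->count; i++) {
        if (strcmp(pl->props[i].name, name) == 0) {
            pl->props[i].value = value;     /* replace existing value */
            return 0;
        }
    }
    if (pl->count == sizeof pl->props / sizeof pl->props[0])
        return -1;
    pl->props[pl->count].name = name;
    pl->props[pl->count].value = value;
    pl->count++;
    return 0;
}

/* Return the value stored under name, or NULL if it is absent. */
const char *getfileproperty(const struct file_plist *pl, const char *name)
{
    for (size_t i = 0; i < pl->count; i++)
        if (strcmp(pl->props[i].name, name) == 0)
            return pl->props[i].value;
    return NULL;
}
```

A program that does not care about attributes simply never calls these functions, which is the "truly invisible to uninformed processes" property Preece is after. Shannon's objection in the next post is about what happens to such a list when those uninformed processes copy the file.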
mjs@eagle.UUCP (M.J.Shannon) (08/08/85)
> As to where header information should go, I agree that there's no easy
> answer. Putting it outside the file seems preferable to me, because I
> think files should be homogeneous and because I hate to think about
> random access needing to work around the header and because I like
> making it truly invisible to uninformed processes.

By and large, the only files in a UNIX system that are homogeneous are text files. Granted, these probably account for the vast majority of files, but what about object modules, libraries of same, and executables? None of these is homogeneous, yet no utilities (like cat, cp, wc, ls, etc.) have any trouble with them on that account.

> This doesn't
> violate "the Unix way of storing files" any more than the other
> out-of-band information already kept about files. Certain basic
> tools need to know about it (like dump/restore), others simply lose
> the data (like sort).

Lose the data? You mean you'd prefer a copy of a file not to have any of the attributes of the original? That seems to me to forbid copying of executables (the non-homogeneous header gets lost in the shuffle, preventing execution of the copy). Sorry, this *DOES* violate "the UNIX way of storing files".

> If a vendor chose to add this feature (or
> if the standards committee chose to add it), either the existing
> commands would grow switches to recognize the header data or new
> commands would be added. No big deal.

No big deal? Would you like to add major sections to *every* command that manipulates files to figure out its attributes? Or would you prefer to write N versions of each of those commands?

> I kind of like the idea of a LISP-ish property list (or you could
> think of it as a file environment, if you like) in which any arbitrary
> (name, value) pairs could be stuck. Then a getfileproperty call
> would get a particular value.

This sounds reasonable, but why specify that it isn't part of the file data itself?
Most files that have headers on them have a fixed property list, with the names (and types) defined in structures in /usr/include. Before you propose a new mechanism, you'll have to convince a whole host of people that their software is broken. Good luck.

> The first rule, of course, has to be to not break existing Unix.

But that seems to be the first rule broken when you propose that these attributes be stored anywhere but in the data of the file.

> scott preece
--
Marty Shannon
UUCP: ihnp4!eagle!mjs
Phone: +1 201 522 6063
Warped people are throwbacks from the days of the United Federation of Planets.