as@castle.ed.ac.uk (A Stevens) (01/16/91)
The need for a convention for filename extensions
=================================================

One serious complication when porting code or file-structures
to and from the Arch is its unfortunate combination
of short (10 char) filenames and an absence of filename extensions.

To convert filenames from DOS (effectively 11 chars if you
count the extension) or UNIX or VMS some kind of transformation
has to be performed.  The catch is not that this is impossible,
just that there are so many different ways to do it,
so programs tend not to work in the same way.

Clearly some kind of de facto standard is required if we are to
avoid the current situation, which boils down to alternate sighs
of relief and swearing.  (The situation puts me in mind of the
``long command line'' problem that complicated life on STs until
things settled down.)

A proposal for handling filename extensions
-------------------------------------------

Turn ``filename.ext'' into ``ext.filename''.

Advantages:

+ Can cope with those long filename suffixes that come up in UNIX.
  You could systematically convert a UNIX file ``foo.longsuff'' to
  ``longsuff.foo''.  Compound suffixes might be neatly broken up by
  swapping . for / (not common inside UNIX or DOS filenames).
  E.g. ``foo.e1.e2.e3'' into ``e1/e2/e3.foo''.

  The catch with this approach, in its pure form, is that it loses
  information.  Once a directory/file tree has been turned into ADFS
  form, you can no longer tell which directories are simply there to
  support filename suffixes and which were part of the original
  directory structure.  The best solution here seems to be the one
  Arxe systems use for their MultiFS, i.e. directories used to replace
  filename extensions are distinguished by having ``/'' as the last
  character of their names.  E.g.

	foo.baz		->	baz/.foo
	fred.tar.Z.uu	->	tar/Z/uu/.fred

  The trailing slash also has a nice symmetrical feel when you use
  slashes to separate parts of a compound suffix.

+ This method would work with stuff like ``make'' and ``amu'', and would
  (of course) interface nicely with Arxe's Multi-FS.

Disadvantages:

- This approach would require significant programming support.
  A whole bunch of routines would be needed to hide the transformation
  from naive ported DOS or UNIX code.  Either file-access routines would
  need to be equipped with smart (and slow) routines to disambiguate
  combinations of ADFS and UNIX/DOS conventions (e.g. what is the suffix
  in ``$.Library.fred.c.Z''?), or porting would require a little
  programming effort, i.e. filenames input would need to be converted
  from ADFS to UNIX/DOS conventions for internal processing, and then
  converted back to ADFS when the results are finally passed to
  file-access routines.  In either case, file-access routines would need
  to be smart enough to create and delete suffix directories as required.

- This convention is not understood or used by what, for me at least,
  are pretty important programs: Graham Toal's TeX and Frank
  Lancaster's tar.

Dealing with long filenames?
----------------------------

Handling filename suffixes is of course not the whole story.
A similarly difficult problem is what to do with (UNIX etc.) filenames
that, even without suffixes, are longer than 10 characters.

Again there are lots of different approaches.  Some programs just
truncate, others are a lot cleverer.  Frank Lancaster's tar port seems
to do something pretty smart with stripping out vowels and the like.
Perhaps Frank could be persuaded to tell us the cunning details to
provide a basis for a ``standard''?

What do people feel about this issue?  Will Acorn ever bring out a
filing system with long names? :-)

	Andrew
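[A minimal sketch of the proposal above, for illustration only. It
assumes single leaf names (no directory paths) and ignores the
10-character limit entirely. The function names to_adfs and from_adfs
are made up here and do not come from MultiFS or any existing library.]

```python
def to_adfs(name):
    """Move the suffix in front of the stem, using '/' inside the suffix
    and a trailing '/' to mark the suffix directory.
    Names without a suffix are returned unchanged."""
    stem, dot, suffix = name.partition(".")
    if not dot or not stem or not suffix:
        return name
    return suffix.replace(".", "/") + "/." + stem


def from_adfs(name):
    """Inverse of to_adfs.  A name whose directory part does not end in
    '/' is treated as ordinary ADFS structure and left alone."""
    directory, dot, leaf = name.rpartition(".")
    if not dot or not directory.endswith("/"):
        return name
    return leaf + "." + directory[:-1].replace("/", ".")


print(to_adfs("foo.baz"))            # baz/.foo
print(to_adfs("fred.tar.Z.uu"))      # tar/Z/uu/.fred
print(from_adfs("tar/Z/uu/.fred"))   # fred.tar.Z.uu
print(from_adfs("Library.fred"))     # Library.fred (no suffix directory)
```

[The trailing slash is what makes the round trip possible: without it,
from_adfs could not tell ``Library.fred'' from a converted ``fred.Library''.]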
rogersh%p2h@uk.ac.man.cs (01/16/91)
In article <7850@castle.ed.ac.uk> as@castle.ed.ac.uk (A Stevens) writes:
>One serious complication when porting code or file-structures
>to and from the Arch is its unfortunate combination
>of short (10 char) filenames and an absence of filename extensions.
>
>To convert filenames from DOS (effectively 11 chars if you
>count the extension) or UNIX or VMS some kind of transformation
>has to be performed.  The catch is not that this is impossible,
>just that there are so many different ways to do it,
>so programs tend not to work in the same way.

I have already had to tackle this problem.  Basically there are several
stages to converting a UNIX filename (or MSDOS, since apart from the
delimiter there is no difference) to ADFS.

1) Sort out the directory info.  This means dealing with things like:

	.././../.info
	/.././../info
	/etc/../tmp/info
	etc.

   At this stage it is also necessary to do something about characters
   in the filenames.  The method I use is to convert all filenames with
   certain common UNIX/MSDOS single character extensions (e.g. x.c, x.s,
   x.h) to c.x, s.x, etc. and otherwise just convert the '.' to a '_'
   along with all other ADFS-illegal characters ('#$@' etc.).  E.g.

	info.Z			@.Z.info
	info.tmp		@.info_tmp
	/.././etc/../tmp/info	$.tmp.info
	tmp/sort.c		@.tmp.c.sort
	../Makefile_src		@.^.Makefile_src
	./124$$.etc		@.124___etc

2) Sort out long names in the resulting path.  This is done by first
   removing vowels, except from the first letter.  E.g.

	Makefile_tmp		Mkfile_tmp
	Afternoon_test_data	Aftrnn_tst_dt

   Then, if the component is still too long, chop out sufficient
   characters from the 2nd character onwards:

	Aftrnn_tst_dt		Ann_tst_dt

   In almost all cases this preserves uniqueness among multiple
   filenames with the same root and different extensions, and it also
   preserves as much meaning as possible in the filename, because the
   vowels are selectively removed first.

3) Check if we ought to create a directory for a single character
   suffix filename.  If the file is to be opened for creation (perhaps
   implicitly) then we need to create the directory.  Otherwise the
   access operation merely fails and we don't need to bother (indeed if
   we did it would waste disk space).

Unfortunately 3) implies that the conversion needs to be integrated into
a set of common file access routines.  In unixlib it is called by open(),
creat(), stat(), etc. and performs filename conversion transparently to
the user.  There is a global flag which can turn conversion on and off,
and the routine can also be called directly.  Starting any unixlib
program with the environment variable UNIX set automatically turns
conversion on; by default it is off.

[ H.J.Rogers (INTERNET: rogersh%p4%cs.man.ac.uk@cunyvm.cuny.edu)        ]
[     ,_,    (BITNET/EARN: rogersh%p4%cs.man.ac.uk@UKACRL.BITNET)       ]
[   :-(_)-o  (UUCP: ...!uunet!cunyvm.cuny.edu!cs.man.ac.uk!p4!rogersh)  ]
[   _} {_    (JANET: rogersh%p4@uk.ac.man.cs)                           ]
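[Stages 1 and 2 above can be sketched roughly as follows. This is an
illustration reconstructed from the examples in the post, not the
unixlib source. Assumptions: the set of single-character suffixes
(only c, s, h and Z appear in the post), the exact set of ADFS-illegal
characters, and the reading that vowels are dropped left to right only
until the name fits, which is what the Makefile_tmp example suggests.
All function names are hypothetical.]

```python
MAX_LEN = 10                        # ADFS limit per path component
VOWELS = set("aeiouAEIOU")
SUFFIX_DIRS = {"c", "s", "h", "Z"}  # assumed set of "common" suffixes
ILLEGAL = set('.#$@&%\\^:*"| ')     # assumed set of illegal characters


def sanitise(component):
    """Replace '.' and other ADFS-illegal characters with '_'."""
    return "".join("_" if ch in ILLEGAL else ch for ch in component)


def convert_path(path):
    """Stage 1: resolve '.', '..' and '/', then map suffixes/characters."""
    absolute = path.startswith("/")
    parts = []                      # None stands for ADFS '^' (parent)
    for part in path.split("/"):
        if part in ("", "."):
            continue
        if part == "..":
            if parts and parts[-1] is not None:
                parts.pop()         # cancel the previous directory
            elif not absolute:
                parts.append(None)  # climb above the current directory
            continue                # '..' at the root is simply dropped
        parts.append(part)
    out = ["$" if absolute else "@"]
    for index, part in enumerate(parts):
        if part is None:
            out.append("^")
            continue
        stem, _, ext = part.rpartition(".")
        is_leaf = index == len(parts) - 1
        if is_leaf and stem and ext in SUFFIX_DIRS:
            out += [ext, sanitise(stem)]   # sort.c -> c.sort
        else:
            out.append(sanitise(part))
    return ".".join(out)


def shorten(component, limit=MAX_LEN):
    """Stage 2: drop vowels after the first character until the name
    fits, then chop characters from the 2nd position onwards."""
    chars = list(component)
    i = 1
    while len(chars) > limit and i < len(chars):
        if chars[i] in VOWELS:
            del chars[i]
        else:
            i += 1
    excess = len(chars) - limit
    if excess > 0:
        del chars[1:1 + excess]
    return "".join(chars)


def shorten_path(adfs_path):
    """Apply stage 2 to every component of a stage 1 result."""
    return ".".join(shorten(c) for c in adfs_path.split("."))


print(convert_path("/.././etc/../tmp/info"))   # $.tmp.info
print(convert_path("tmp/sort.c"))              # @.tmp.c.sort
print(shorten("Afternoon_test_data"))          # Ann_tst_dt
```

[Stage 3, creating the suffix directory on demand, is left out because
it depends on the file-access routines the conversion is hooked into.]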
as@castle.ed.ac.uk (A Stevens) (01/17/91)
(W.R.T. H. Rogers' reply to my proposal)

Phew - I knew someone out there had to have tackled the dreaded long
file names problem.  Well done Mr Rogers!  I thought for one moment I'd
have to reach for the C compiler.

Is there any chance that source (for the conversion routines at least)
will arrive with your (eagerly awaited) UNIX lib?  Beg beg... the
critical thing is, after all, that *everyone's* programs can behave in
the same way.

On a more serious note, some queries / a wish-list:

(1) Is there any way of configuring the ``common'' filename suffixes
    (e.g. by setting environment variables)?  I am sure I'd want

	.lisp .lsp .pl .nip .l .sml .ml .thm .eqn .1 .2 .3 .4 ...

    switched to preceding directories, but I am equally sure other
    people might find such behaviour irrelevant or reprehensible.

(2) It would be really good if there were some mechanism for recording
    the abbreviations made, in a file in the relevant directory, so
    that the names could be restored in the ADFS -> UNIX... direction.
    This is very important for people like me who shuffle big systems
    comprising complex file-structures backwards and forwards between
    the Arch and the rest of the world.  I know it would be tedious and
    slow ... but it would be tedious and slow in Arch milli-seconds
    rather than tedious and slow in my seconds.  As it is, I have to do
    some really gross-me-out hacks with shell scripts to move stuff
    tar-ed off on my Arch back onto UNIX.

(3) Frank L.'s ``tar'' (superficially at least) seems to behave as you
    describe.  Does this mean Frank uses your libs?

	Andrew
fl@tools.uucp (Frank Lancaster) (01/17/91)
If there is enough interest I will post the sources of the file name
conversion routines used in my 'tar' port.  I could also post the time
conversion routines.  I suppose posting sources will not hurt anybody's
feelings.

Frank Lancaster
vanaards%t7@uk.ac.man.cs (01/18/91)
I've been trying to get in contact with Frank Lancaster, however my
mail seems to bounce right back.  So I am resorting to posting here.

Frank, could you please let us all have a copy of your UNIX to RISC OS
filename conversion routine, the one you use in TAR.

Thanks in anticipation

Steven van Aardt.
JANET E-mail : vanaards@uk.ac.man.cs.p4
john@abccam.abcl.co.uk (John Grogan) (01/31/91)
In article <7850@castle.ed.ac.uk> as@castle.ed.ac.uk (A Stevens) writes:
>The need for a convention for filename extensions
>===============================================
>Will Acorn ever bring out a filing system with long
>names? :-)
> Andrew

They already have.  It's called RISCOS-NFS.

John.
------
==============================================================================
"Unquiet slumbers for the sleepers in that quiet earth ..."
==============================================================================