[comp.sys.acorn] DOS/UNIX/etc <-> ADFS Filename Mapping

as@castle.ed.ac.uk (A Stevens) (01/16/91)

The need for a convention for filename extensions
===============================================


One serious complication when porting code or file-structures
to and from the Arch is its unfortunate combination
of short (10 char) filenames and an absence of filename extensions.

To convert filenames from DOS (effectively 11 chars if you
count the extension) or UNIX or VMS some kind of transformation
has to be performed.  The catch is not that this is impossible,
just that there are so many different ways to do it,
so programs tend not to work in the same way.

Clearly some kind of de facto standard is required if we
are to avoid the current situation which boils
down to alternate sighs of relief and swearing.
(The situation puts me in mind of the ``long command line''
problem that complicated life on ST's until things settled down).


A proposal for handling filename extensions
-------------------------------------------

Turn ``filename.ext'' into ``ext.filename''. 

Advantages:

+ Can cope with those long filename suffixes that come up
in UNIX.  You could systematically convert a UNIX
file ``foo.longsuff'' to ``longsuff.foo''.  Compound suffixes
might be neatly broken up by swapping . for / (not common
inside UNIX or DOS filenames).  E.g. ``foo.e1.e2.e3'' into
``e1/e2/e3.foo''.

The catch with this approach in its pure form is that it loses
information.  Once a directory/file tree has been turned into ADFS form,
you can no longer tell which directories are simply there
to support filename suffixes, and which were part of the
original directory structure.  The best solution here seems to
be the one Arxe Systems use for their MultiFS, i.e.
directories used to replace filename extensions are
distinguished by having ``/'' as the last character of their
names.  E.g.

foo.baz -> baz/.foo
fred.tar.Z.uu -> tar/Z/uu/.fred

The trailing slash also has a nice symmetrical feel when
you use slashes to separate parts of a compound suffix.

+ This method would work with stuff like ``make'', ``amu'',
and would (of course) interface nicely with Arxe's Multi-FS.
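
A minimal sketch of the trailing-slash mapping above, in Python.  The
function names are illustrative only (they do not come from MultiFS or
any existing library), and the sketch handles a single leaf name rather
than a whole path:

```python
def to_suffix_dirs(name):
    """Map 'fred.tar.Z.uu' to 'tar/Z/uu/.fred'.

    The '.' in the result is the ADFS directory separator; the trailing
    '/' marks the directory as standing in for a filename suffix.
    """
    stem, dot, suffixes = name.partition(".")
    if not dot or not stem or not suffixes:
        return name                      # no suffix: nothing to move
    return suffixes.replace(".", "/") + "/." + stem


def from_suffix_dirs(adfs_name):
    """Inverse mapping: 'tar/Z/uu/.fred' back to 'fred.tar.Z.uu'."""
    directory, dot, leaf = adfs_name.rpartition(".")
    if not dot or not directory.endswith("/"):
        return adfs_name                 # ordinary directory: leave alone
    return leaf + "." + directory[:-1].replace("/", ".")
```

Because the suffix directory carries its trailing ``/'', the reverse
mapping leaves an ordinary ADFS name such as ``Library.fred'' untouched,
which is the information the pure scheme loses.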

Disadvantages:

- This approach would require significant programming support.

A whole bunch of routines would be needed to hide the
transformation from naive ported DOS or UNIX code.
Either file-access routines would need to be equipped with
smart (and slow) routines to disambiguate combinations of
ADFS and UNIX/DOS conventions (e.g. what is the
suffix in ``$.Library.fred.c.Z''?) or porting would require
a little programming effort.  I.e. filenames input would need
to be converted from ADFS to UNIX/DOS conventions for internal
processing, and then converted back to ADFS when the results
are finally passed to file-access routines.

In either case, file-access routines would need to be smart
enough to create and delete suffix directories as required.

- This convention is not understood or used by what, for me at least,
are pretty important programs: Graham Toal's TeX and
Frank Lancaster's tar.


Dealing with long filenames?
----------------------------

Handling filename suffixes is of course not the whole
story.  A similarly difficult problem is what to do with (UNIX etc)
filenames that, even without suffixes, are longer than
10 characters.  Again there are lots of different
approaches.  Some programs just truncate, others are a lot
cleverer.  Frank Lancaster's tar port seems to do something
pretty smart with stripping out vowels and the like.

Perhaps Frank could be persuaded to tell us the cunning
details to provide a basis for a ``standard''?


What do people feel about this issue?
Will Acorn ever bring out a filing system with long
names? :-)

           Andrew

rogersh%p2h@uk.ac.man.cs (01/16/91)

In article <7850@castle.ed.ac.uk> as@castle.ed.ac.uk (A Stevens) writes:
>One serious complication when porting code or file-structures
>to and from the Arch is its unfortunate combination
>of short (10 char) filenames and an absence of filename extensions.
>
>To convert filenames from DOS (effectively 11 chars if you
>count the extension) or UNIX or VMS some kind of transformation
>has to be performed.  The catch is not that this is impossible,
>just that there are so many different ways to do it,
>so programs tend not to work in the same way.

	I have already had to tackle this problem. Basically there are
several stages to converting a UNIX filename (or MSDOS since apart from
the delimiter there is no difference) to ADFS.

1) Sort out the directory info. This means dealing with things like:

	.././../.info
	/.././../info
	/etc/../tmp/info

	etc. At this stage it is also necessary to do something about
illegal characters in the filenames. The method I use is to convert all filenames
with certain common UNIX/MSDOS single character extensions (e.g. x.c, x.s,
x.h) to c.x, s.x, etc. and otherwise just convert the '.' to a '_' along
with all other ADFS-illegal characters '#$@' etc. E.g.

	info.Z				@.Z.info
	info.tmp			@.info_tmp
	/.././etc/../tmp/info		$.tmp.info
	tmp/sort.c			@.tmp.c.sort
	../Makefile_src			@.^.Makefile_src
	./124$$.etc			@.124___etc

2) Sort out long names in the resulting path. This is done by first removing
vowels except from the first letter. E.g.

	Makefile_tmp			Mkfile_tmp
	Afternoon_test_data		Aftrnn_tst_dt

	Then if the component is still too long chop out sufficient characters
from the second character onwards:

	Aftrnn_tst_dt			Ann_tst_dt

	In almost all cases this preserves uniqueness among multiple
filenames with the same root and different extensions, and it also
preserves the maximum meaning in the filename because vowels are
removed selectively first.

3) Check if we ought to create a directory for a single character suffix
filename. If the file is to be opened for creation (perhaps implicitly)
then we need to create the directory. Otherwise the access operation merely
fails and we don't need to bother (indeed if we did it would waste disk
space).
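
A rough Python sketch of stages 1) and 2), written to reproduce the
examples above.  It is not the unixlib source.  The suffix set, the full
list of illegal characters and all function names are assumptions, and
the rule that vowel removal stops as soon as the name fits is inferred
from the ``Makefile_tmp'' example, which keeps its later vowels:

```python
# Suffixes moved in front of the name.  c, s, h and Z appear in the
# examples above; the real set is an assumption.
SWAP_SUFFIXES = {"c", "s", "h", "Z"}
# Characters mapped to '_'.  The post names '.', '#', '$' and '@';
# the remaining entries are assumptions.
ILLEGAL = set('.#$@%&^:*"|\\ ')
VOWELS = set("aeiouAEIOU")
MAX_LEN = 10  # ADFS name length quoted in the thread


def clean(component):
    """Replace every illegal character with an underscore."""
    return "".join("_" if ch in ILLEGAL else ch for ch in component)


def stage1(path):
    """Resolve '.' and '..', swap known suffixes, clean up characters."""
    absolute = path.startswith("/")
    parts = []                       # None stands for "parent directory"
    for comp in path.split("/"):
        if comp in ("", "."):
            continue
        if comp == "..":
            if parts and parts[-1] is not None:
                parts.pop()          # cancels the previous directory
            elif not absolute:
                parts.append(None)   # climbs above the current directory
            continue                 # '..' at the root stays at the root
        parts.append(comp)

    out = ["$" if absolute else "@"]
    for i, comp in enumerate(parts):
        if comp is None:
            out.append("^")
            continue
        stem, dot, suffix = comp.rpartition(".")
        is_leaf = i == len(parts) - 1
        if is_leaf and dot and stem and suffix in SWAP_SUFFIXES:
            out += [suffix, clean(stem)]     # sort.c -> c.sort
        else:
            out.append(clean(comp))          # info.tmp -> info_tmp
    return ".".join(out)


def shorten(name, limit=MAX_LEN):
    """Drop vowels after the first letter, then chop from position 2."""
    chars = list(name)
    i = 1
    while len(chars) > limit and i < len(chars):
        if chars[i] in VOWELS:
            del chars[i]
        else:
            i += 1
    excess = len(chars) - limit
    if excess > 0:
        del chars[1:1 + excess]
    return "".join(chars)
```

With these definitions stage1("tmp/sort.c") gives @.tmp.c.sort and
shorten("Afternoon_test_data") gives Ann_tst_dt, matching the tables
above.  A full converter would apply shorten() to each component of the
stage 1 result.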

	Unfortunately 3) implies that the conversion needs to be integrated
into a set of common file access routines. In unixlib it is called by open(),
creat(), stat(), etc. and performs filename conversion transparent to the
user. There is a global flag which can turn conversion on and off, and the
routine can also be called directly. Starting any unixlib program with
the environment variable UNIX set automatically turns conversion on;
by default it is off.
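
The shape of such a wrapper can be sketched in Python.  This is not
unixlib's code (unixlib is a C library): open_converted, conversion_on
and the convert parameter are hypothetical names, the suffix set is an
assumption, and ordinary host directories stand in for ADFS ones.  The
sketch shows only the two points made above: the suffix directory is
created solely when the file is opened for creation, and a global flag
taken from the UNIX environment variable switches conversion on:

```python
import os

# Hypothetical global switch: on when the environment variable UNIX is
# set, off otherwise.
conversion_on = "UNIX" in os.environ

SWAP_SUFFIXES = {"c", "s", "h", "Z"}   # assumed set of swapped suffixes


def open_converted(base, name, mode="r", convert=None):
    """Open NAME under directory BASE, swapping 'x.c' to 'c/x' if enabled."""
    if convert is None:
        convert = conversion_on
    stem, dot, suffix = name.rpartition(".")
    if not (convert and dot and stem and suffix in SWAP_SUFFIXES):
        return open(os.path.join(base, name), mode)

    suffix_dir = os.path.join(base, suffix)
    creating = any(flag in mode for flag in "wax")
    if creating:
        # Only a creating open needs the suffix directory to exist.
        os.makedirs(suffix_dir, exist_ok=True)
    # A read of a missing file simply fails; no directory is created.
    return open(os.path.join(suffix_dir, stem), mode)
```

Passing convert explicitly plays the part of calling the routine
directly with conversion forced on or off.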


[ H.J.Rogers (INTERNET: rogersh%p4%cs.man.ac.uk@cunyvm.cuny.edu)       ]
[    ,_,     (BITNET/EARN: rogersh%p4%cs.man.ac.uk@UKACRL.BITNET)      ]
[  :-(_)-o   (UUCP: ...!uunet!cunyvm.cuny.edu!cs.man.ac.uk!p4!rogersh) ]
[   _} {_    (JANET: rogersh%p4@uk.ac.man.cs)                          ]

as@castle.ed.ac.uk (A Stevens) (01/17/91)

(W.R.T. H. Rogers's reply to my proposal)

Phew - I knew someone out there had to have tackled the dreaded
long file names problem.   Well done Mr Rogers!  I thought
for one moment I'd have to reach for the C compiler.
Is there any chance that
source (for the conversion routines at least) will arrive with
your (eagerly awaited) UNIX lib?  Beg beg...  the critical thing
is, after all, that *everyone's* programs can behave in the same
way.

On a more serious note some queries / a wish-list:

(1) Is there any way of configuring the ``common'' filename suffixes
(e.g. by setting environment variables)?  I am sure I'd want
.lisp .lsp .pl .nip .l .sml .ml .thm .eqn .1 .2 .3 .4 ...
switched to preceding directories, but I am equally sure other
people might find such behaviour irrelevant or reprehensible.

(2) It would be really good if there were some mechanism for recording
the abbreviations made in a file in the relevant directory so that
the things could be restored in the ADFS -> UNIX... direction.
This is very important for people like me who shuffle big
systems comprising complex file-structures backwards and forwards
between Arch and the rest of the world.

 I know it would be tedious and slow ... but it would be 
tedious and slow in Arch milli-seconds rather than tedious
and slow in my seconds.   As it is, I have to do some really gross-me-out
hacks with shell scripts to move stuff tar-ed off on my Arch
back onto UNIX.


(3) Frank L.'s ``tar'' (superficially at least)
seems to behave as you describe.  Does this mean Frank uses your libs?
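
One possible shape for the record asked for in (2), sketched in Python.
Everything here is hypothetical: the map file name, its tab-separated
format and both function names are invented for illustration and do not
describe any existing tool:

```python
import os

MAP_FILE = "NameMap"   # hypothetical file name, not an existing convention


def record_name(directory, short, original):
    """Append one 'short<TAB>original' line to the directory's map file."""
    with open(os.path.join(directory, MAP_FILE), "a") as f:
        f.write(f"{short}\t{original}\n")


def load_names(directory):
    """Return {short: original}; empty if the directory has no map file."""
    names = {}
    try:
        with open(os.path.join(directory, MAP_FILE)) as f:
            for line in f:
                short, _, original = line.rstrip("\n").partition("\t")
                names[short] = original
    except FileNotFoundError:
        pass
    return names
```

A converter would call record_name() whenever it abbreviates a name, and
the ADFS -> UNIX direction would consult load_names() before falling
back to the ADFS name as it stands.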


Andrew

fl@tools.uucp (Frank Lancaster) (01/17/91)

If there is enough interest I will post the sources of the
file name conversion routines used in my 'tar' port. I could
also post the time conversion routines. I suppose posting
sources will not hurt anybody's feelings.

Frank Lancaster

vanaards%t7@uk.ac.man.cs (01/18/91)

  I've been trying to get in contact with Frank Lancaster, however my mail
seems to bounce right back. So I am resorting to posting here.

  Frank, could you please let us all have a copy of your UNIX to RISC OS
filenaming conversion routine, the one you use in TAR.

  Thanks in anticipation
  
  Steven van Aardt.

+--------------------------------+-----------------------------------------+
|   ()()TEVEN         ()         |                                         |  
|  ()                ()()        |                                         |  
|   ()()   ()  ()AN ()  ()       |                                         |
|      ()   ()()   ()()()()      +-----------------------------------------+
|   ()()     ()   ()      ()ARDT |JANET E-mail : vanaards@uk.ac.man.cs.p4  |
+--------------------------------+-----------------------------------------+

john@abccam.abcl.co.uk (John Grogan) (01/31/91)

In article <7850@castle.ed.ac.uk> as@castle.ed.ac.uk (A Stevens) writes:
>The need for a convention for filename extensions
>===============================================
>Will Acorn ever bring out a filing system with long
>names? :-)
>           Andrew


They already have. It's called RISCOS-NFS.

John.
------
==============================================================================

"Unquiet slumbers for the sleepers in that quiet earth ..."

==============================================================================