[comp.protocols.appletalk] MacNFS's file mapping.

tom@CITI.UMICH.EDU (07/17/87)

The problem in accommodating both Macintosh and UNIX file system
semantics breaks down into four areas:  storage of the data, resource,
and finder info forks; format of text files; mapping of file names; and
storage of desk top information.

Solutions to this problem have been attempted by the people who
do the A/UX toolbox in their file copy utility, Columbia University
in their UNIX AppleShare file server, and the University of Michigan
(CITI) in our MacNFS client.



I.      STORAGE OF THE DATA, RESOURCE, AND FINDER INFO FORKS

1.      Storing the forks in separate UNIX files.

A/UX, aufs, MacNFS, EFS, and TOPS all use this approach.

A/UX divides a Macintosh file "mfile" into two UNIX files: the data
fork goes into a UNIX file "mfile", and the info and resource forks are
combined into a UNIX file "mfile.res".

aufs uses subdirectories ".resource/" and ".finderinfo/", so the three
forks are stored in "mfile", ".resource/mfile" and ".finderinfo/mfile".
One drawback here is that it's not readily apparent whether a file has
resource or info forks.

MacNFS puts the data fork in "mfile", the resource fork in "mfile.RF",
and the info fork in "mfile.IF".  This follows a convention established
in the earlier EFS.
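
To make the three layouts concrete, here is a minimal sketch in Python
that maps one Macintosh file name to the UNIX files holding its forks.
The function name fork_paths and the scheme labels "aux", "aufs", and
"macnfs" are made up for illustration; none of them come from any of
the packages discussed.

```python
import posixpath


def fork_paths(path, scheme):
    """Return the UNIX paths that hold each fork of Macintosh file `path`.

    `scheme` is a hypothetical label: "aux", "aufs", or "macnfs".
    """
    directory, name = posixpath.split(path)
    if scheme == "aux":
        # A/UX: info and resource forks are combined in one ".res" file.
        combined = posixpath.join(directory, name + ".res")
        return {"data": path, "resource": combined, "info": combined}
    if scheme == "aufs":
        # aufs: same name, stored in special subdirectories.
        return {
            "data": path,
            "resource": posixpath.join(directory, ".resource", name),
            "info": posixpath.join(directory, ".finderinfo", name),
        }
    if scheme == "macnfs":
        # MacNFS (following EFS): suffixes in the same directory.
        return {"data": path, "resource": path + ".RF", "info": path + ".IF"}
    raise ValueError("unknown scheme: %r" % (scheme,))
```

For example, fork_paths("mfile", "aufs") gives "mfile",
".resource/mfile", and ".finderinfo/mfile" for the data, resource, and
info forks.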

Having many forks in the same directory complicates some aspects of
NFS, although it simplifies others.  At CITI, we originally thought
that placing the fork files together in a directory would make them
easier to manipulate:  we were thinking especially about wild carding
("mfile*").  Experience has shown that this feature is not often used,
and that the file system clutter is considerable.

The choice of subdirectories seems a favorable solution, trading some
ease of manipulation for less directory clutter.  And it's not hard to
imagine simple tools -- mrm, mcp, mmv -- for file manipulation on the
UNIX side.


2.      Storing the forks in a single file.

A suggestion has been made that all three forks can be stored in one
UNIX file, with the offset to each fork stored at the beginning of the
file.

The goal of mapping between Macintosh and UNIX file system semantics is
to allow two-way access to files.  Macintosh users must be able to use
native UNIX files stored on the server, and UNIX users must be able to use
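Macintosh files stored on the server.  This use must be transparent, or
the UNIX server is nothing more than a file store.  A combined file with
an offset header is opaque to ordinary UNIX tools, and a native UNIX
file has no such header.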

Further, manipulation of such a combined file by MacNFS requires
expensive network traffic and complicates a piece of code that has size
restrictions.

Nothing is gained by making UNIX files second-class citizens from the
Macintosh side.  We can do much better if we store the forks in
separate files.



3.      Other issues.

What if there is no resource fork?  Should an empty resource file be
created on the server?  At CITI, we think not.  MacNFS doesn't require
the existence of a resource file when that fork is empty.  The
alternative requires a ghastly number of file creates when mounting,
say, /usr/src/bin.

What if there is no data fork?  A UNIX file is pure data fork (if you
squint right), so it seems reasonable to equate an empty data fork with
an empty UNIX file.  MacNFS always creates the data fork.  And it sure
simplifies the code, whether or not other forks are stored in
subdirectories.



II.     FORMAT OF TEXT FILES.

Since the Macintosh and UNIX use different characters to terminate
lines, we need a standard format for storing text files on UNIX.
We also need to decide when translation occurs.

aufs does not translate text files at all -- translation is left up to
users.  Therefore, a user on either system sometimes has to know the
format of a file before using it.

MacNFS translates text files as they are read and written,
interchanging <LF> and <CR>.  The tricky part is deciding when
a file contains straight text.
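
The interchange itself is the easy part.  A minimal sketch in Python,
assuming the file contents are handled as raw bytes (the function name
is made up for illustration):

```python
# Translation table that maps <CR> to <LF> and <LF> to <CR>.
_SWAP_CR_LF = bytes.maketrans(b"\r\n", b"\n\r")


def swap_line_endings(data: bytes) -> bytes:
    """Interchange <CR> and <LF>; all other bytes pass through unchanged."""
    return data.translate(_SWAP_CR_LF)
```

Because the mapping is a plain swap, the same routine serves for both
reads and writes, and applying it twice gives back the original bytes.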

MacNFS allows a user to set certain options when a volume is mounted.
One pair of options sets the default file type and file creator, which
are used for UNIX files that don't have finder info forks.  Another
option allows a user to declare the file type for which translation
will occur.

Normally, we set both the default file type and the translation file type to
'TEXT'.  This interoperates well with the UNIX notion of a "text file",
i.e., any regular file, but presents problems when accessing UNIX binary
files, such as font bitmaps or other graphic objects.  The user can prevent
translation of a UNIX file by using a desk accessory such as SetFile to
give the UNIX file a type other than TEXT before the file is read.
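
The decision described above can be sketched as follows.  This is only
an illustration in Python: the MountOptions class, its field names, and
the "????" creator placeholder are assumptions, not MacNFS's actual
option names or defaults.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class MountOptions:
    """Hypothetical per-volume options set at mount time."""
    default_type: bytes = b"TEXT"      # used when there is no finder info fork
    default_creator: bytes = b"????"   # placeholder; the real default is not stated
    translate_type: bytes = b"TEXT"    # file type for which translation occurs


def effective_type(finder_type: Optional[bytes], opts: MountOptions) -> bytes:
    """File type the Macintosh sees; None means no finder info fork exists."""
    return opts.default_type if finder_type is None else finder_type


def should_translate(finder_type: Optional[bytes], opts: MountOptions) -> bool:
    """True when <CR>/<LF> translation applies to this file."""
    return effective_type(finder_type, opts) == opts.translate_type
```

With the usual settings, a plain UNIX file (finder_type of None) is
translated, while a file that has been given some other type is passed
through untouched.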

At CITI, we have considered inspecting the first part of a file to
decide whether it's TEXT or DATA, like the UNIX "file" command, but the
necessary network traffic appears to preclude efficient implementation on
the client side.  This sort of thing may be feasible for aufs, since the
server can be modified to peek at the file.
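
For the record, the kind of inspection meant here might look like the
sketch below.  It is not what the UNIX "file" command does in detail;
the set of "ordinary" bytes and the 30 percent threshold are both
assumptions chosen for illustration, and the caller is assumed to pass
in only the first few hundred bytes of the file.

```python
# Bytes treated as ordinary text: printable ASCII plus tab, <LF>, and <CR>.
_TEXT_BYTES = frozenset(range(0x20, 0x7F)) | {0x09, 0x0A, 0x0D}


def looks_like_text(prefix: bytes, threshold: float = 0.30) -> bool:
    """Guess from the first part of a file whether it is straight text."""
    if not prefix:
        return True                  # an empty file is harmless to translate
    if b"\x00" in prefix:
        return False                 # NUL bytes strongly suggest binary data
    odd = sum(1 for b in prefix if b not in _TEXT_BYTES)
    return odd / len(prefix) < threshold
```

A guess of this sort can be wrong in either direction, so a user
override such as the file type option would still be needed.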



III.    FILE NAME MAPPING.

MacNFS uses the following translation scheme.  First, since `:' is not
a legal character in Macintosh file names, and `/' not legal in UNIX
file names, they are uniformly interchanged.  Thus a Mac file "abc/def"
looks like "abc:def" from the UNIX side.  Similarly, a UNIX file called
"abc:def" looks like "abc/def" on the Macintosh.

Other special characters in Macintosh file names are encoded as "^XX"
on the UNIX side, where "XX" is the hexadecimal encoding of the
character.  E.g., the hexadecimal encoding of TM, the trademark symbol,
is 0xAA, so TM is represented as "^AA".

A problem arises if a Macintosh file name contains the character
sequence "^XX" for a valid hexadecimal sequence.  E.g., a Macintosh
file named "ab^62c" copied to a UNIX file server becomes "abbc" when
viewed from the Macintosh again.
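
The scheme and the collision can be sketched in Python as follows.
Two things here are assumptions: that "special" means any byte outside
printable ASCII, and that hexadecimal digits are written in upper case
but accepted in either case.  Macintosh names are handled as raw bytes
in the Macintosh character set.

```python
import re


def mac_to_unix(mac_name: bytes) -> str:
    """Map a Macintosh file name (raw bytes) to its UNIX-side name."""
    out = []
    for b in mac_name:
        if b == ord("/"):
            out.append(":")              # `/' and `:' are interchanged
        elif 0x20 <= b < 0x7F:
            out.append(chr(b))           # printable ASCII passes through
        else:
            out.append("^%02X" % b)      # special characters become ^XX
    return "".join(out)


_ESCAPE = re.compile(r"\^([0-9A-Fa-f]{2})")


def unix_to_mac(unix_name: str) -> bytes:
    """Map a UNIX-side name back to Macintosh bytes."""
    swapped = unix_name.replace(":", "/")
    decoded = _ESCAPE.sub(lambda m: chr(int(m.group(1), 16)), swapped)
    return decoded.encode("latin-1")     # code points 0-255 map to single bytes
```

Here mac_to_unix(b"\xaa") gives "^AA", and mac_to_unix(b"abc/def")
gives "abc:def".  The name "ab^62c" passes through mac_to_unix
unchanged, but unix_to_mac("ab^62c") returns b"abbc", which is the
collision described above.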

CITI has no idea how to live within the 14-character name length
constraint imposed by System V file systems.


IV.     DESK TOP INFORMATION.

aufs implements the shared volume calls that provide the desk top
information to the Finder.  The desk top information is stored in the
UNIX files .ADeskTop and .IDeskTop in the root directory of the mounted
volume.

In MacNFS, the Finder uses direct reads and writes to "DeskTop" and
"DeskTop.RF" in the root directory of the mounted volume.  But we plan
to follow Columbia's lead and implement the shared volume calls,
storing the information in a file somewhere in the root directory.
With that done, we will have control over the format of the DeskTop
file MacNFS creates.

Both aufs and MacNFS make it difficult to mount read-only volumes. At
CITI, we toyed with keeping the DeskTop in RAM -- the code got very
hairy and we dropped it.  But the ability to mount read-only volumes
would be A Good Thing.

Tom Unger

Send comments to:
MacNFS@citi.umich.edu

cck@CUNIXC.COLUMBIA.EDU (07/19/87)

Tom's message is very well thought out.  I do have some things to add
though.

Let's take a step back and define the "requirements" as we (Bill and
I) saw them.

>I.      STORAGE OF THE DATA, RESOURCE, AND FINDER INFO FORKS

The format should allow:
Primary:
	P1 storage of Macintosh files under Unix with complete information
	  (e.g. resource, data, and "finder info" forks)
	P2 use of Unix files under the Mac OS
	  (e.g. allow Mac to access files not stored as in P1)
	P3 Quick, efficient access for the various network servers/clients
	  (e.g. allow Mac NFS/Aufs/Tops to enumerate and access files, etc)

Secondary:
	S1 access to Macintosh files stored on a Unix file system through Unix



Bill S. and I both strongly disagree with the approach of combining
the three files into one file!  There are significant disadvantages
and few advantages.  A few disadvantages are: the need for special routines
to handle files under (S1), difficulties in handling (P2), problems
with "holes" in unix files that this would require, etc.  The primary
advantage would be that it appears to the "naive" user to be simpler,
and the coordination of the three parts would be "builtin" (e.g. you
wouldn't ever be left in the situation where you have a .resource file
and no .finderinfo and data files).  This method might well be the
method of choice if the Unix system were only a file server and naught
else.

The primary differences between the following two approaches are:
	o Aufs scheme has better coherence than EFS scheme, though
same as A/UX scheme.
	o Aufs scheme is easier to implement!

Using three files in one directory.  As Tom noted, if anything set a
standard in the past, it was EFS.  We thought about it long and hard
before we decided not to go with this scheme.  I don't remember the
details of the conversation, but will enumerate some of the advantages
of the scheme we decided upon later.  The EFS and A/UX schemes effect
the goal (P1) completely; (P2) requires a by-pass mechanism for the
EFS scheme and can be considered to be covered by the A/UX scheme; and
(P3) is reasonably handled.  (S1) is also well-handled.

The Aufs scheme is quite simple (Tom covered most of this, but I wish to
reiterate with some justifications).  Simply: the data fork is the
closest match to a unix file, therefore store it as-is in the
specified directory (same as A/UX), the resource fork and the so
called "finder info" fork (mostly part of desktop on Mac - finder info
in resource fork is still there though) are "special" and can be
stored by the same name in special subdirectories of the specified
directory.  To be concrete, the Mac file "keeper" stored in a
directory "stuff" would be stored by Aufs on the unix file system as:
	stuff/keeper			- data fork
	stuff/.finderinfo/keeper	- "finder info" fork
	stuff/.resource/keeper		- resource fork

Advantages: easy to scan directories for files, easy to manipulate Mac
and Unix files in a rational way, elegant - most implementation
decisions are resolved in an easily manageable way with few problems.

Disadvantages: pain to do copies, moves, deletes on stored Mac files
without utility programs.

With one caveat, this scheme completely covers the goals P1-P3 and S1
listed above.  Caveat: to implement P2, we must "default" finder
information for unix files (e.g. assign "default" finder information
to be used when no ".finderinfo/.." file is found).

Enough of this though - I could go on listing advantages and
disadvantages for a long time.  You know the scheme I advocate.

> 3.      Other issues.
(E.g. no data fork, no resource fork situations.)

Well, for Aufs I think the best way to explain this is to say that a
directory with .finderinfo and .resource subdirectories is considered
to be a "Macintosh" directory - e.g. a reasonable place to store
Macintosh files.  (The distinction also makes it easy for us to simply
tell people that only certain directories are special (e.g. have the
special subdirectories)).  We believe the tradeoffs here - primarily
that you cannot store a Macintosh file just anywhere (as a matter of
fact, I consider this a distinct advantage) - are reasonable.  

Aufs will only create the .finderinfo and .resource directories when it
receives the "create directory" command - e.g. "New Folder".  This
means that for the various "unix" directories (for example /usr/bin),
no junk will be left lying around.  We believe this to be important
and it quickly resolves the issues of when to create files - iff the
appropriate directories exist.
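
The rule can be sketched in a few lines of Python.  This is an
illustration of the policy only, not the Aufs source; the function
names are made up.

```python
import os


def is_mac_directory(directory):
    """True if `directory` has both special subdirectories."""
    return (os.path.isdir(os.path.join(directory, ".finderinfo"))
            and os.path.isdir(os.path.join(directory, ".resource")))


def make_mac_directory(path):
    """What a "create directory" request (e.g. "New Folder") would do."""
    os.mkdir(path)
    os.mkdir(os.path.join(path, ".finderinfo"))
    os.mkdir(os.path.join(path, ".resource"))
```

A server following this policy would write resource and finder info
files only where is_mac_directory() is true, so a directory such as
/usr/bin is never touched.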

One more issue that Tom did not bring up is that the contents of the
so called finder information fork need to be standardized.  Currently
Aufs stores the 32 bytes of finder information (cf. AFP spec.) and any
comment in this file.  Additional information might be an AFP "short
name" for MS-DOS style clients and/or a mapping from the 14 character
SVID file names to 32 character Macintosh file names - some careful
thought is required to determine if this is the appropriate place to
place these mappings (some more on this later).

> II.     FORMAT OF TEXT FILES.

Nothing to add to what Tom has to say except that he has some good
ideas here.  Hopefully, we will add some (in some form) to Aufs.


> III.  FILE NAME MAPPING.  

Aufs does handle it slightly differently.
	Mac name to Unix name:
		Any non printing character (and "/") is stored as two
hexadecimal digits "escaped" by a colon (vs. a ^ under MacNFS).
	Unix name to Mac name:
		Treats ":hh" as the hex representation of a character.
Sequences such as "::" or ":" followed by a non-hex character result in the
":"(s) being translated into a "|"(s).  We chose ":" because it
couldn't be in Mac file names and is rarely if ever used in Unix file
names.
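
A minimal sketch of the mapping in Python, for comparison with the
MacNFS scheme.  It is not taken from the Aufs source: treating
"non printing" as anything outside printable ASCII, writing the digits
in lower case, and scanning left to right so that each ":" is either
the start of a ":hh" escape or becomes a "|" are all assumptions.

```python
import re


def aufs_mac_to_unix(mac_name: bytes) -> str:
    """Map a Macintosh file name (raw bytes) to its UNIX-side name."""
    out = []
    for b in mac_name:
        if b == ord("/") or not (0x20 <= b < 0x7F):
            out.append(":%02x" % b)      # escaped as colon plus two hex digits
        else:
            out.append(chr(b))
    return "".join(out)


# Either a full ":hh" escape, or a lone ":" that is not one.
_TOKEN = re.compile(r":([0-9A-Fa-f]{2})|:")


def aufs_unix_to_mac(unix_name: str) -> bytes:
    """Map a UNIX file name to Macintosh bytes."""
    def replace(match):
        if match.group(1) is not None:
            return chr(int(match.group(1), 16))
        return "|"                       # stray colon
    return _TOKEN.sub(replace, unix_name).encode("latin-1")
```

Under these assumptions the Mac name "abc/def" is stored as
"abc:2fdef" and reads back unchanged, while a UNIX file named "a:zz"
appears on the Macintosh as "a|zz".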

The 14 character file name problem that all the SVID compliant systems
such as A/UX, HPUX, etc. pose can be resolved in two ways:
	o head in sand - simply don't allow names longer than 14 characters
		(not really so ridiculous - most names are reasonable).
	o some mapping database - can live in three places reasonably
		a) as part of finderinfo or another such "special" file
		b) as part of the volume desktop information
		c) in the directory as so-called "local" desktop
		   information 

Not sure which to do yet.  We don't consider (b) to be a particularly
efficient or clean solution (reeks too much of keeping a "directory" of
the files in the volume - real problem for unix files and being able
to access files via unix).

Another problem to be mentioned is that the Mac OS doesn't distinguish
case while Unix does.  Aufs simply ignores the difference because most
(if not all) Mac OS utilities will display the correct case and use
the correct case in accessing the files.  A notable exception is MPW.
A simple solution might be to simply lowercase everything, but then
you have the problem that two unix files Makefile and makefile can
co-reside - which one is the right one?  The way things are now you
will get the one with the case you specify (e.g. always the right one
- not sure if both are displayed by finder/standard file package
though).  (Simple solution - make Mac OS distinguish case in file
names :-).
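
To see why lowercasing everything is not a clean fix, here is a small
Python sketch that finds the names in a unix directory listing that
would collide under it.  The function name is made up, and this is an
illustration of the problem rather than anything Aufs does.

```python
from collections import defaultdict


def case_collisions(names):
    """Group names that differ only in case; return groups of two or more."""
    groups = defaultdict(list)
    for name in names:
        groups[name.lower()].append(name)
    return {key: sorted(group) for key, group in groups.items()
            if len(group) > 1}
```

Given the names Makefile, makefile, and README, the result has a single
entry, keyed "makefile", holding both Makefile and makefile.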

> IV.     DESK TOP INFORMATION.

Aufs separates the icon and application info into .IDeskTop and
.ADeskTop for one reason - it's simpler to handle.  We were careful
about the amount of information that had to be shared per volume (e.g.
.ADeskTop and .IDeskTop files) because of the problems in resolving
competing read/writes.  (Note: for files, just hope for best!!! - this
means two people with write permission to the same volume had better
be careful!!!).

Aufs supports read-only volumes right now.  In fact, the two primary
uses of Aufs at our site are: (a) private (read individual) file
storage where coordination of read/writes is not a real issue and (b)
shared read-only volumes.





I guess I've gone on enough, but would just like to say that where
methods previously existed, we thought carefully before
trying to supplant them with our own methods - in all cases we felt
there was sufficient justification to do so.

One more thing - I've listed our primary requirements (P1-P3) above.
We believe that Aufs does a decent job in meeting them.  If you wanted
to drop some of the requirements such as (P2), then different
strategies would go into effect.  In implementing Aufs, careful
thought was put into making Aufs layered in such a way that the
protocol specific parts were separated from not only the OS dependent
parts (which have turned out to be fairly Unix OS independent - not
surprising though), but also the parts that implement the particular
paradigm (e.g. the model has the server allowing functions P1, P2, P3,
and S1).  Thus, if you really don't have a need for some of the
primary requirements, then you can also take the Aufs source code and
make it into what you really want without a massive (though still not
inconsiderable) amount of work (e.g. you won't be completely
reinventing the wheel).

I know I've missed some points, but I hope this provides a better
insight into why Aufs does things the way it does.

Charlie C. Kim
User Services
Center for Computing Activities and Libraries
Columbia University