[mod.protocols.tcp-ip] Remote file systems

geof@decwrl.DEC.COM@imagen.UUCP (Geof Cooper) (12/22/86)

I would venture to say that the problem of <<transmitting>> file semantics is
solvable, and even reasonably well understood.  It is not dissimilar
to the problem of transmitting abstract data types, which is well described
in Maurice Herlihy's master's thesis ("Transmitting Abstract Values
in Messages", S.M. thesis, MIT, 1980; the companion paper is Herlihy
and Liskov, "A Value Transmission Method for Abstract Data Types",
ACM TOPLAS, 1982).  The fundamental idea is that you can solve the
N^2 problem of translating file (or terminal, or abstract) data types
between N different machines either by brute force -- a translator
for every pair of machines -- or by standardizing on one
"transmissible" data type for purposes of transmission, so that each
machine needs only one converter to and from that type (e.g., ASCII,
the various FTP transfer modes, "binary" file formats (such as
Interpress, DDL, Impress), big-endian number semantics, IEEE floating
point format, TAR tapes, punch cards).
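
To make the 2N-versus-N^2 point concrete, here is a minimal C sketch
of the "transmissible type" idea, using big-endian integers as the
canonical wire format (this is the same job the Berkeley htonl/ntohl
routines do; the function names here are my own):

    #include <stdint.h>

    /*
     * Canonical wire format: 32-bit big-endian integer.  Each host
     * writes one encoder and one decoder to/from this form (2 per
     * machine, 2N in all) instead of a pairwise translator for every
     * other architecture (N^2 in all).
     */
    void encode_u32(uint32_t value, unsigned char wire[4])
    {
        wire[0] = (value >> 24) & 0xff;   /* most significant byte first */
        wire[1] = (value >> 16) & 0xff;
        wire[2] = (value >>  8) & 0xff;
        wire[3] =  value        & 0xff;
    }

    uint32_t decode_u32(const unsigned char wire[4])
    {
        return ((uint32_t)wire[0] << 24) | ((uint32_t)wire[1] << 16)
             | ((uint32_t)wire[2] <<  8) |  (uint32_t)wire[3];
    }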

I'd like to tag the "real problem" as the "interface problem," at least
for the purposes of this discussion.

The "interface problem" is that the set of capabilities of the transmissible
data type may not be the same as the capabilities of a particular system.
For example, EBCDIC and ASCII don't necessarily overlap in all the codes
they define.  A more pertinent example is that Unix OPEN calls don't give
a way to specify that a file is textual, so applications don't generate
any information about what they are <trying> to do when the modify the
file system.  So it doesn't matter if you have a textual file "type"
in NFS, since UNIX doesn't give you a way to know that you're supposed
to be using it.
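
To see why, look at everything a UNIX application can tell the kernel
when it creates a file -- access flags and permission bits, nothing
more.  A hypothetical O_TEXT flag (not part of UNIX; my invention for
illustration) is the kind of thing the interface would need before a
textual NFS type could help:

    #include <fcntl.h>

    int create_report(void)
    {
        /*
         * All the intent UNIX lets an application express at file
         * creation: access flags plus permission bits.  Nothing says
         * "this file will hold lines of text", so a remote file
         * system like NFS has no textual-type hint to propagate.
         * What is missing is something like a (hypothetical) O_TEXT:
         *
         *     open("report.txt", O_WRONLY | O_CREAT | O_TEXT, 0644);
         */
        return open("report.txt", O_WRONLY | O_CREAT | O_TRUNC, 0644);
    }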

I've seen three generic attempts to solve this problem:

    [1] Modify all systems to use the transmissible type (ASCII,
        IEEE floating point, ISO protocols, Interscript, virtually
        all standards).

    [2] Modify all systems to have functionality appropriate to the
        transmissible type and translate on the fly in each system
        (IBM machines sending to ASCII printers, graphics applications
        that change their capabilities to fit the printer or page
        description language [cf. the original Macintosh ROMs versus
        the second-version ROMs that gave characters fractional widths
        to cope with the LaserWriter]).

    [3] Define a broad transmissible type, but don't require every
        system to implement the whole thing.  Systems can
        intercommunicate where there is an overlap of supported
        options.  (Telnet (especially SUPDUP), FTP, probably ISO FTAM.)

The advantage of [1] is that it works best, but it disrupts the
systems and tends to inhibit technical progress (since adding a new
feature requires distributed consensus and an implementation on every
machine).  [2] still requires that applications change, but it can be
workable when the system in question already implements part of the
transmissible type.  For example, I believe that the AT&T guys have
found it possible to add "mandatory file locking" to UNIX for some
files.
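
For the curious, here is roughly what that looks like at the
interface.  The fcntl(2) sketch below takes an advisory write lock on
part of a file; under System V, marking the file with the setgid bit
while clearing its group-execute bit is what turns such locks into
mandatory ones.  The function name and the 512-byte region are my own
inventions for illustration:

    #include <fcntl.h>
    #include <unistd.h>

    /*
     * Lock the first 512 bytes of an open file for writing.  Plain
     * fcntl() locks are advisory; System V enforces them (mandatory
     * locking) when the file's setgid bit is set and its group
     * execute bit is clear.
     */
    int lock_header(int fd)
    {
        struct flock lk;

        lk.l_type   = F_WRLCK;    /* exclusive write lock           */
        lk.l_whence = SEEK_SET;   /* offsets from start of file     */
        lk.l_start  = 0;
        lk.l_len    = 512;        /* cover just the first 512 bytes */

        return fcntl(fd, F_SETLK, &lk);   /* -1 if already locked */
    }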

Approach [3] is pretty common, and can achieve good but limited results
(e.g., you can use FTP between any two machines for textual files,
assuming they implemented FTP correctly).  Unfortunately, it is really
the brute-force solution to the N^2 problem in disguise: with N partial
implementations, what any two machines can do is a property of the
pair, not of the standard.  For example, how many machines actually
implement ALL the Telnet options?  (How many implementors, or even
system architects, could list them all without looking?)
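
A toy sketch of approach [3] in C: each side advertises the options
it supports, and the pair converses only in the intersection, Telnet
style.  The option names and bitmask layout are invented for
illustration:

    #include <stdio.h>

    /* Invented option bits -- stand-ins for Telnet-style options. */
    #define OPT_TEXT_TYPE   (1u << 0)   /* textual file type       */
    #define OPT_RECORD_IO   (1u << 1)   /* record-structured files */
    #define OPT_FILE_LOCKS  (1u << 2)   /* file locking            */

    /*
     * Each system implements some subset of the broad transmissible
     * type; a pair can only use features both support, so the
     * behavior you actually get is per-pair -- the N^2 problem
     * wearing a standard's clothing.
     */
    unsigned negotiate(unsigned ours, unsigned theirs)
    {
        return ours & theirs;   /* converse in the overlap only */
    }

    int main(void)
    {
        unsigned unix_host = OPT_TEXT_TYPE | OPT_FILE_LOCKS;
        unsigned ibm_host  = OPT_TEXT_TYPE | OPT_RECORD_IO;

        printf("usable options: %#x\n", negotiate(unix_host, ibm_host));
        return 0;   /* prints 0x1: only the textual type survives */
    }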

Usually, of course, a mixture of the three is involved.  For example,
a UNIX machine can easily receive a "textual type" file correctly
using [2], even if it doesn't know how to generate one.
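
That receiving half of [2] is just an on-the-fly translation from the
transmissible form to the local one -- the way an FTP implementation
on UNIX folds the network's CR-LF line endings into bare LF.  A
minimal sketch (the function name is mine):

    #include <stdio.h>

    /*
     * Receive-side translation per approach [2]: map the transmissible
     * textual form (CR-LF line endings, as in FTP's ASCII type) into
     * the local UNIX convention (bare LF).  Lone CRs pass through.
     */
    void net_text_to_unix(FILE *in, FILE *out)
    {
        int c, pending_cr = 0;

        while ((c = getc(in)) != EOF) {
            if (pending_cr && c != '\n')
                putc('\r', out);          /* lone CR: keep it */
            pending_cr = (c == '\r');
            if (!pending_cr)
                putc(c, out);
        }
        if (pending_cr)
            putc('\r', out);              /* file ended on a CR */
    }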

All this is not to put a damper on the interesting discussion that is
going on about NFS.  Rather, my intent is to try to raise the
level of that discussion to more general issues.

    - Are there other approaches to solving the interface problem?
      (I thought about it for a whole 10 minutes, so please shoot
      bullets at my arguments.)

    - Can people who are familiar with NFILE, NFS, FTAM, etc.,
      characterize them in terms of the "interface problem" above,
      so we can compare them abstractly?

    - Can we come up with a particularly good mix of the three
      approaches to solve the problem well for file systems?  (Did
      ISO?)  Or is blind standardization the only way -- just tell
      everyone to use UNIX?  (It would be disappointing if it were.)

Any ideas?

- Geof