bzs@BU-CS.BU.EDU.UUCP (12/20/86)
It seems unfair to cast aspersions at those who have pioneered Network File Systems, as if their implementations were somehow finished or immutable. They deserve praise for how far the publication of their efforts has advanced our thinking about the issues (and for the credibility it has lent to the idea that the issues are worth thinking about). I can think of another major networking protocol which prides itself on having been put into practice early in its design cycle and corrected where need be (sometimes radically) based upon concrete use rather than paper committee meetings. The name escapes me, however.

Issues like "file organization" between heterogeneous systems have been raised for years. I know of no protocol which attempts to solve this in general (although a few special cases -do- go a long way). The last time someone raised this issue in my office, I asked him whether this problem had yet been solved for magnetic tapes on his system. If so, I proposed, I could adapt that solution to FTP (the case in point at the time) easily enough. Needless to say, he walked away in a huff.

Some of these issues are HARD, very hard! I wouldn't go so far as to say insoluble (I mean, people do seem to solve them manually), but I think this difficulty should be considered before saying that XYZ does not solve this. Proposals for solutions would be most welcome. I think the problem is that people either want perfect and general solutions or they throw up their hands entirely.

My suspicion is that the best solution will be the ability to code modules at an application level to handle the various permutations of file access methods between systems, and to let the libraries blossom out of the user community. Extensibility seems to be the key need here. And practice. As a more concrete example, why shouldn't FTP allow me to specify input and output filter programs on both ends, provided as a library by the systems?
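The FTP filter suggestion can be sketched roughly as follows. This assumes each system publishes a small library of named filters; the filter names and the `transfer` function are hypothetical and are not part of FTP. The sender applies its chosen output filter, the bytes cross the wire untouched, and the receiver applies its chosen input filter. Real filters would more likely be separate programs reading standard input and writing standard output; plain functions keep the sketch self-contained.

```python
from typing import Callable, Dict

Filter = Callable[[bytes], bytes]

# Hypothetical filter library offered by the sending system.
SENDER_FILTERS: Dict[str, Filter] = {
    "identity": lambda data: data,
    # Convert CR LF line endings to bare LF, a neutral "text" form.
    "crlf-to-lf": lambda data: data.replace(b"\r\n", b"\n"),
}

# Hypothetical filter library offered by the receiving system.
RECEIVER_FILTERS: Dict[str, Filter] = {
    "identity": lambda data: data,
    # Convert bare LF line endings back to CR LF.
    "lf-to-crlf": lambda data: data.replace(b"\n", b"\r\n"),
}


def transfer(data: bytes, out_filter: str, in_filter: str) -> bytes:
    """Simulate a transfer in which each end applies a named filter.

    The sender applies out_filter before the bytes go on the wire; the
    receiver applies in_filter after they arrive. The wire carries bytes only.
    """
    on_the_wire = SENDER_FILTERS[out_filter](data)
    return RECEIVER_FILTERS[in_filter](on_the_wire)
```

For example, `transfer(b"a\r\nb\r\n", "crlf-to-lf", "identity")` hands the receiver LF-terminated lines, while choosing `"lf-to-crlf"` on the receiving end restores CR LF. Neither side needs to know the other's native format, only the neutral form on the wire.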
The same sort of thing should work for Network File Systems, although there the ability to type files and have these "daemons" invoked automatically would probably be the right approach. Given a few years of that, I suspect the "standards" would begin to reveal themselves.

-Barry Shein, Boston University
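The idea of typing files so that the right "daemon" runs automatically might look like the minimal sketch below. It assumes the server keeps a type tag beside each file; the `RemoteFile` class and the tag names (`"text-crlf"`, `"fixed80"`) are hypothetical. A registry maps each tag to a converter, and the read path picks the converter itself, so the client never has to ask for one.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class RemoteFile:
    """A file as a hypothetical server stores it: a type tag plus raw bytes."""
    file_type: str
    data: bytes


def fixed80_to_text(data: bytes) -> bytes:
    """Turn 80-byte, blank-padded card images into newline-terminated lines."""
    records = [data[i:i + 80] for i in range(0, len(data), 80)]
    return b"".join(rec.rstrip(b" ") + b"\n" for rec in records)


def crlf_to_text(data: bytes) -> bytes:
    """Turn CR LF line endings into bare LF."""
    return data.replace(b"\r\n", b"\n")


# Registry of converters ("daemons") keyed by file type; names are hypothetical.
CONVERTERS: Dict[str, Callable[[bytes], bytes]] = {
    "text-lf": lambda data: data,
    "text-crlf": crlf_to_text,
    "fixed80": fixed80_to_text,
}


def read_as_text(f: RemoteFile) -> bytes:
    """Choose the converter from the file's type tag and apply it."""
    try:
        convert = CONVERTERS[f.file_type]
    except KeyError:
        raise ValueError(f"no converter registered for type {f.file_type!r}") from None
    return convert(f.data)
```

An unregistered type fails loudly rather than returning raw bytes, which is one plausible policy; a library that "blossoms out of the user community" would grow by adding entries to the registry.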
bzs@BU-CS.BU.EDU (Barry Shein) (12/23/86)
I think there is a misconception brewing here about UNIX file semantics. It is true that the low-level UNIX system calls (eg. open, read, write, lseek) impose no structure on a file beyond a stream of bytes. This is not peculiar to UNIX; almost any O/S that I know of has some way to just get the bytes off the disk, although systems which prefer structured/typed files tend to resist that and lead the user towards an access method. Of course, a processor with a sophisticated IOP (eg. a database back-end) might be an exception to this rule, but I believe such situations are beyond the scope of this discussion.

Any access method could be layered on top of the UNIX low-level calls, and many have been. Surely I could write DEC's RMS or IBM's access methods in terms of these simple calls. As a specific example, consider the UNIX DBM calls, which store arbitrary data as hashed key/value pairs. This presents the same problem as any more structured system (eg. how would I fetch the next key/value pair out of such a file from a remote, non-UNIX system? It is really no different from fetching the next ISAM record, etc.)

This is somewhat in response to Geoff's note (which was a very good direction for thought). I am only saying that the problem is entirely symmetrical: there is no magic property of access methods, whether built into an O/S or supplied as application libraries; bytes is bytes. The only possible difference is that a system that provides many access methods might be able to quickly list the access methods which users are probably using (give or take how the users employed the various options such as record size, blocking, bucket size etc etc.)

I only bring this up so that we don't wring our hands over what I believe to be a common misconception. Any O/S could (I presume) present its files as a stream of bytes, and the problems would then be symmetrical. There are some differences, such as guaranteed atomicity of updates and types of failure (eg. how extents are handled), but I don't believe this discussion has yet reached that level of detail, and I suspect such issues would be solvable within any scheme that solves the other, more salient problems.

However, unlike Geoff, I am more pessimistic IN THE GENERAL CASE. If two systems have a !very! similar access method, such as an ISAM implementation, then writing an interface between the two should be relatively straightforward (although it is still fraught with danger: eg. IBM's V-record format uses 16 bits to express lengths, while another system may support a V-record format without using 16 bits; how compatible could you make those two access methods?). Where the access method doesn't exist at all on one side, I can't see how it could be utilized (oh, I suppose a V-record could be returned to a text-oriented application as "string<CR><LF>", but that sort of thing is limited as a solution).

I won't even mention the Fortran programmer who would like to access a file full of 128-bit binary floating point values via this NFS (no, XDR doesn't help unless someone knows it's time to employ it, and even then it may not work: does your machine have 128-bit floats?). I don't think it's insoluble, but I do suspect we will have to be prescriptive (rather than descriptive) to provide a standard. Given a standardized menu of network access methods, could *you* do your work?

-Barry Shein, Boston University
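The claim that any access method can be layered on the byte-stream calls can be made concrete with a minimal sketch in Python, whose `os.open`, `os.read`, `os.write` and `os.lseek` are thin wrappers over the UNIX calls named above. The record format here is an assumption for illustration: each record is preceded by a 2-byte big-endian length, loosely in the spirit of the V-record discussed above but not IBM's actual on-disk layout. The 16-bit prefix also shows the compatibility trap: a record longer than 65535 bytes simply cannot be expressed. A fixed-length, random-access reader built on `lseek` is included for contrast. All function names are hypothetical.

```python
import os
import struct
import tempfile
from typing import List

MAX_RECORD = 0xFFFF  # the largest length a 16-bit prefix can express
# O_BINARY exists only on Windows; elsewhere this adds nothing to the flags.
_BINARY = getattr(os, "O_BINARY", 0)


def _write_all(fd: int, data: bytes) -> None:
    """os.write may write fewer bytes than asked; loop until all are written."""
    view = memoryview(data)
    while view:
        written = os.write(fd, view)
        view = view[written:]


def _read_exact(fd: int, n: int) -> bytes:
    """Read n bytes, returning fewer only if end of file comes first."""
    chunks = []
    while n > 0:
        chunk = os.read(fd, n)
        if not chunk:  # end of file
            break
        chunks.append(chunk)
        n -= len(chunk)
    return b"".join(chunks)


def write_vrecords(path: str, records: List[bytes]) -> None:
    """Store records as <2-byte big-endian length><data>, one after another."""
    for rec in records:  # validate first so a bad record leaves the file alone
        if len(rec) > MAX_RECORD:
            raise ValueError(f"{len(rec)}-byte record does not fit a 16-bit length")
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC | _BINARY, 0o644)
    try:
        for rec in records:
            _write_all(fd, struct.pack(">H", len(rec)) + rec)
    finally:
        os.close(fd)


def read_vrecords(path: str) -> List[bytes]:
    """Recover the records: read a length prefix, then that many bytes."""
    records = []
    fd = os.open(path, os.O_RDONLY | _BINARY)
    try:
        while True:
            header = _read_exact(fd, 2)
            if not header:  # clean end of file
                break
            if len(header) < 2:
                raise ValueError("truncated length prefix")
            (length,) = struct.unpack(">H", header)
            data = _read_exact(fd, length)
            if len(data) < length:
                raise ValueError("truncated record")
            records.append(data)
    finally:
        os.close(fd)
    return records


def read_fixed_record(path: str, index: int, size: int) -> bytes:
    """Random access to fixed-length records: lseek to index*size, then read."""
    fd = os.open(path, os.O_RDONLY | _BINARY)
    try:
        os.lseek(fd, index * size, os.SEEK_SET)
        return _read_exact(fd, size)
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "records.dat")
        write_vrecords(path, [b"first", b"", b"third record"])
        print(read_vrecords(path))
```

Nothing in the file itself says it holds length-prefixed records; a remote system reading these bytes would need to know the format in advance, which is exactly the symmetry argued above.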
bzs@BU-CS.BU.EDU.UUCP (12/26/86)
I am surprised that a lot of this discussion has centered around pathnaming. It always seemed to me to be one of the easier things to either fake or punt (fake: use UNIX syntax on a UNIX workstation, as NFS does; punt: use a quoted syntax such as the PUP/Leaf convention of {HOST_OR_DEVICE}<any_string_the_other_os_can_interpret>). Of course, there is always the possibility of coming up with a standard, universal catalogue syntax, similar in spirit, I guess, to the Library of Congress' universal conventions for finding something. Then we could all either use that syntax or at least support it.

I always thought it was file formats (access methods) that were the problem (or have we decided that this is too hopeless to even think about?). Maybe we need to make a list of issues; here's mine:

1. File naming.
2. Path naming.
3. File formats and access methods (eg. ISAM, stream...).
4. File access semantics (eg. atomicity of updates, error handling, authorization, etc etc etc.).
(5. Performance?)

-Barry Shein, Boston University
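The "punt" approach to pathnames needs very little machinery on the client. Below is a minimal sketch that takes the syntax literally as written above: a host or device name in braces, followed by a string the local system never interprets and passes verbatim to the remote system. The function and class names are hypothetical, and the actual PUP/Leaf name syntax may differ in detail.

```python
import re
from typing import NamedTuple, Optional


class QuotedPath(NamedTuple):
    host: str    # host or device name taken from inside the braces
    remote: str  # opaque remainder, handed untouched to the remote O/S


# A non-empty name in braces, then everything else (newlines included).
_QUOTED = re.compile(r"\{([^{}]+)\}(.*)", re.DOTALL)


def parse_path(name: str) -> Optional[QuotedPath]:
    """Split '{HOST}rest' into the host and an uninterpreted remainder.

    Returns None for names without a leading {HOST} part; a client could
    treat those as ordinary local (eg. UNIX-syntax) pathnames, i.e. "fake" it.
    """
    m = _QUOTED.fullmatch(name)
    if m is None:
        return None
    return QuotedPath(host=m.group(1), remote=m.group(2))
```

With a hypothetical VMS host, `parse_path("{VAXA}<DRA0:[SHEIN]NOTES.TXT;3>")` yields host `"VAXA"` and the bracketed string unchanged, while `parse_path("/usr/bzs/notes")` returns `None`. The local side never needs to understand the remote naming rules, which is what makes this the easy part compared with file formats.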