[comp.protocols.tcp-ip] syntax of remote pathnames?

mckee@CORWIN.CCS.NORTHEASTERN.EDU (George McKee) (03/17/89)

Is there any standard notation for a file on some remote host where
the filename includes the host's domain name?  People seem to
talk around the issue when the problem comes up in writing 
and use expressions like:
	ftp somehost.someUniv.EDU
	get pub/goodstuff/goodthing
when it would make the writing flow much more smoothly if you
could say things like "you can find a program that might do what
you want in @SUMEX-AIM.STANFORD.EDU:info-mac/app/contour81.hqx".
	If there's no standard for this, I'll propose that a
pathname beginning with @ be interpreted as an Internet File Path
and that the string between the @ and the following : conform to
the normal domain syntax and denote the host filesystem that the
IFP refers to.  Anything following the : will use whatever syntax
is native to the host filesystem.
	An obvious alternative is pathname@domain.name like in
mailboxes, but people don't use this either.  I don't know why.

	- George McKee
	  NU Computer Science

SRA@XX.LCS.MIT.EDU (Rob Austein) (03/17/89)

One syntax that's often used is

    hostname:pathname

eg, "Bigboote.LCS.MIT.EDU:/etc/passwd",
    "Reagan.AI.MIT.EDU:>File-Server>FOO.Lisp",
    "XX.LCS.MIT.EDU:XX:<SRA>LOGIN.CMD".

This syntax has been used by the generic filesystem code in the
Lisp Machine family of systems (MIT CADR, LMI, TI, Symbolics) for a
long time, and was used by some commands (rcp et al) in BSD 4.2.

Of course, there will always be filesystem syntaxes that will confuse
the simpleminded program, eg "MC.LCS.MIT.EDU:PK0:SRA;SRA LOGIN" (the
space is a required part of the filename syntax, and the local portion
could equally well have been written as "PK0: SRA; SRA LOGIN").
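
A minimal Python sketch of how a program might handle this notation
without getting confused. The function name split_remote is made up
for this example. It assumes only that a domain name cannot contain a
colon, so it splits at the first colon and treats everything after it
as opaque text in the remote host's own syntax:

```python
def split_remote(spec):
    """Split 'hostname:pathname' at the FIRST colon only."""
    host, sep, path = spec.partition(":")
    if not sep or not host:
        raise ValueError("expected hostname:pathname, got %r" % spec)
    # The path is returned untouched: later colons, spaces, and angle
    # brackets belong to the remote filesystem's syntax.
    return host, path


for spec in ("Bigboote.LCS.MIT.EDU:/etc/passwd",
             "XX.LCS.MIT.EDU:XX:<SRA>LOGIN.CMD",
             "MC.LCS.MIT.EDU:PK0:SRA;SRA LOGIN"):
    print(split_remote(spec))
```

This only works if the whole string is delimited some other way (for
example by the surrounding quotes), since the path itself may contain
spaces.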

Perhaps just stating the hostname and a string to ask that host for in
plain English wasn't such a bad idea after all!

--Rob

CERF@A.ISI.EDU (03/18/89)

There was once a convention used for the form:
[domain name of host]<directory name> file name.extension

Vint Cerf

msd@trwind.UUCP (Marc S. Dye ) (03/23/89)

In article <8903161632.AA16971@corwin.CCS.Northeastern.EDU> mckee@CORWIN.CCS.NORTHEASTERN.EDU (George McKee) writes:
> ...
>when it would make the writing flow much more smoothly if you
>could say things like "you can find a program that might do what
>you want in @SUMEX-AIM.STANFORD.EDU:info-mac/app/contour81.hqx".

It would be *WONDERFUL* to have a canonical format for file pathnames.
I suggest you don't embed naked colons ':' anytime soon though.  VMS
and probably TOPS-20 will get sick.  Also note that Unix allows ':',
'@', or almost anything except '/' in a pathname.

Maybe some quoting is in order?  For example:

	@SRI-NIC.ARPA:`PS:<RFC>RFC1087.TXT'

is almost(?) as readable, and possibly interpretable by a non-human
(just need to strip matching `', while feeling free to interpret the
naked ':' and '@' as syntactic operators).  Note that one obvious
alternative (escaping) gets *ugly* in a hurry:

	@SRI-NIC.ARPA:PS\:\<RFC\>RFC1087\.TXT

If you wanted to go further, you could talk about other 'operators' in
Unified File Speak.  Seriously, you could presumably adopt any regular
expression syntax for 'wildcarding', extended with a useful alternation,
and add in some sensible directory recursion, distinguishable from normal
(non-recursive) wildcarding.

The BSD Unix csh-style pathname syntax is fairly nice, but doesn't do
recursive directory descent.  VMS has some of these notions, but I'm
not personally wild about the [] pairs.  I.e. I like:

	/foo/.../bar

better than:

	[foo....]bar.

User names (or in more global sense, named access restriction classes)
would also be nifty:

	ANONYMOUS@SRI-NIC.ARPA:`PS:<RFC>RFC106'[5-7]`.TXT.0'

Note that the non-quoted parts behave according to canon; quoted stuff
can be as grotesque as any vendor likes.
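
A rough Python sketch of how a non-human might read this quoted form.
The names parse_quoted_spec and TOKEN are invented for the example,
and the sketch assumes that a quoted part never contains a ' character
and that the host part never contains a colon. It only separates the
pieces; it does not try to interpret the unquoted wildcards:

```python
import re

# Either a quoted literal `...' or a run of unquoted characters.
TOKEN = re.compile(r"`([^']*)'|([^`]+)")


def parse_quoted_spec(spec):
    """Return (user, host, segments) for a spec like user@host:`lit'pat."""
    head, sep, rest = spec.partition(":")  # first ':' ends the host part
    if not sep or "@" not in head:
        raise ValueError("expected [user]@host:path, got %r" % spec)
    user, _, host = head.partition("@")
    segments, pos = [], 0
    for m in TOKEN.finditer(rest):
        if m.start() != pos:               # skipped a stray ` with no '
            break
        if m.group(1) is not None:
            segments.append(("literal", m.group(1)))   # vendor syntax
        else:
            segments.append(("pattern", m.group(2)))   # canonical syntax
        pos = m.end()
    if pos != len(rest):
        raise ValueError("unbalanced quoting in %r" % rest)
    return user, host, segments


print(parse_quoted_spec("ANONYMOUS@SRI-NIC.ARPA:`PS:<RFC>RFC106'[5-7]`.TXT.0'"))
```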

Just suggestions ...

For fun, how about:

	@woof.Poundmasters.Com:`D:\HOSED'/.../{SHARE,SHAREALIKE}`.EXE'
	KO@VirulentlyMalignantSoftware.Com:`DQB666:[MANUALS]InThere.SOMEWHERE;0'

Enough fun -- back to the salt!

++msd

mrc@SUMEX-AIM.STANFORD.EDU (Mark Crispin) (03/24/89)

Some of us old-timers have used the convention of putting the host name in
square brackets in "[host]path" format, e.g.:
	[WSMR-SIMTEL20.ARMY.MIL]PD1:<MSDOS.NEMACS>EM39EXE.ARC
	[SUMEX-AIM.STANFORD.EDU]info-mac/app/contour81.hqx
	[SAIL.STANFORD.EDU]MONCOM.UPD[S,DOC]
	[AI.AI.MIT.EDU].INFO.;DDT ORDER

The path field is, of course, dependent upon the target operating system and,
as the final example shows, can be bizarre, so you need some reasonable way
to infer the path out of context (e.g. using all uppercase, or putting it on
a line of its own, etc...).

-------

goldstei@NSIPO.NASA.GOV (Steve Goldstein) (03/24/89)

S*T*O*P !!!!!

You all are making things start to look like [you should pardon the 
expression . . .]

				VMS !!!

Part of the beauty of using UNIX is that one needn't have to recall all
the different decorations for hosts, devices, directories, etc. in specifying
a file.  (That's a feature of VMS which drives me up the wall!)

--SG

	 Some of us old-timers have used the convention of putting the host name in
	 square brackets in "[host]path" format, e.g.:
	 	[WSMR-SIMTEL20.ARMY.MIL]PD1:<MSDOS.NEMACS>EM39EXE.ARC
	 	[SUMEX-AIM.STANFORD.EDU]info-mac/app/contour81.hqx
	 	[SAIL.STANFORD.EDU]MONCOM.UPD[S,DOC]
	 	[AI.AI.MIT.EDU].INFO.;DDT ORDER

	 The path field is, of course, dependent upon the target operating system and,
	 as the final example shows, can be bizarre, so you need some reasonable way
	 to infer the path out of context (e.g. using all uppercase, or make it be a
	 wholeline, etc...).

	 -------

jqj@HOGG.CC.UOREGON.EDU (03/28/89)

If you are serious about coming up with a generic standard for remote
file names (something that could be an RFC, say), then you have to
recognize the fact that file names (aka path names) can be pretty arbitrary.
FTP constrains them not to contain CRLF (I think...), but one could even
imagine a file system that allowed CR, LF, or CRLF in a pathname.  More
importantly, pathnames often depend on login information.  In addition to
Unix-style relative pathnames, remember that many implementations do the
equivalent of a Unix chroot() when you log in as ANONYMOUS, so the directory
tree is substantially different from what a user logging in with some other
user ID would see.

A widely used and fairly robust syntax for specifying remote file names
is:
	<hostname>"<login information>"::<pathname>CRLF
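
A minimal Python sketch of reading one such line, offered only as an
illustration. The function name parse_login_spec, the pattern SPEC,
and the sample host and login text are all made up. It assumes the
login information is always present, is enclosed in double quotes,
and contains no double quote itself, and that only the final CRLF
terminates the pathname:

```python
import re

# hostname, then "login information", then '::', then the pathname.
# DOTALL lets the pathname contain a bare CR or LF.
SPEC = re.compile(r'([^":]+)"([^"]*)"::(.*)', re.DOTALL)


def parse_login_spec(line):
    """Return (hostname, login_information, pathname)."""
    if line.endswith("\r\n"):
        line = line[:-2]                  # strip the terminating CRLF
    m = SPEC.fullmatch(line)
    if m is None:
        raise ValueError("not in hostname\"login\"::pathname form: %r" % line)
    return m.groups()


# Hypothetical example values, not taken from any real server.
print(parse_login_spec('somehost.someUniv.EDU"anonymous guest"::pub/goodstuff\r\n'))
```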

roy@phri.UUCP (Roy Smith) (03/29/89)

jqj@HOGG.CC.UOREGON.EDU writes:
> one could even imagine a file system that allowed CR, LF, or CRLF
> in a pathname.

	You mean like the Unix file system?  To wit:

$ echo xxx* | od -c
0000000    x   x   x  \r  \n   y   y   y  \n
0000011
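
The same thing can be reproduced from a program. This is a small
Python sketch, assuming a Unix-like filesystem (some other systems
reject such names); the function name make_odd_name is made up:

```python
import os
import tempfile


def make_odd_name():
    """Create a file whose name contains CR LF and list the directory."""
    name = "xxx\r\nyyy"
    with tempfile.TemporaryDirectory() as scratch:
        # An empty file is enough; only the name matters here.
        open(os.path.join(scratch, name), "w").close()
        return os.listdir(scratch)


print(make_odd_name())
```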
-- 
Roy Smith, System Administrator
Public Health Research Institute
{allegra,philabs,cmcl2,rutgers,hombre}!phri!roy -or- roy@phri.nyu.edu
"The connector is the network"

bzs@ENCORE.COM (Barry Shein) (03/29/89)

I'm having problems understanding all the baroque (or is that rococo?)
suggestions, let's try starting at the beginning...

A file system syntax (not semantics) is a string describing a point in
a data structure.

Most systems I know of use either arrays (ie. no directories, RT-11),
trees (Unix, MS/DOS) or forests (most DEC OS's, IBM/MVS, VM/CMS is a
forest of arrays I guess, degenerate case.)

I call them forests rather than funny-rooted trees since I don't
believe, given some path like [000000,000000]<foo.bar>stuff.xxx you
can add syntax to walk back up and down a different path a la Unix's
/foo/bar/../../down/down which is equivalent (well, usually, ahem) to
/down/down (ahh, symlinks...but I doubt you'll solve that problem here
nor even suggest trying.)
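
The "usually" can be made concrete with a short Python sketch, assuming
a Unix-like system where symbolic links can be created. The function
name lexical_vs_real and the directory names are invented for the
example. posixpath.normpath collapses ".." purely as text, while
os.path.realpath follows symlinks first, and the two can disagree:

```python
import os
import posixpath
import tempfile

# Purely textual collapsing of ".." components.
print(posixpath.normpath("/foo/bar/../../down/down"))  # /down/down


def lexical_vs_real():
    """Show a path where collapsing '..' as text gives the wrong file."""
    with tempfile.TemporaryDirectory() as top:
        top = os.path.realpath(top)
        os.makedirs(os.path.join(top, "real", "sub"))
        # link -> real/sub, so link/.. is really "real", not the top.
        os.symlink(os.path.join(top, "real", "sub"),
                   os.path.join(top, "link"))
        path = os.path.join(top, "link", "..", "x")
        lexical = os.path.normpath(path)   # ignores the symlink
        actual = os.path.realpath(path)    # resolves the symlink first
        return os.path.relpath(lexical, top), os.path.relpath(actual, top)


print(lexical_vs_real())
```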

The question is how much SEMANTICS you want to build into the syntax.

Should the SYNTAX reflect that the first step off the root is a
username? Or is it sufficient to just let the remote O/S worry about
that translation? Thus something like <BZS.STUFF>ABC.DEF can be
represented as /bzs/stuff/abc.def and be easily interpreted.

That's really the major difference between Unix syntax and other OS's
being mentioned, Unix isn't using punctuation characters to indicate
semantics.

Now, I will point out that (having written a program to go between
TOPS-20 and Unix paths) there are potential ambiguities, although they
mostly fall into the "would someone really *do* that?" category, for
example, does:

	/foo/bar/baz.xxx

on a TWENEX system become:

	foo:<bar>baz.xxx

or

	<foo.bar>baz.xxx

?

They can both exist simultaneously and describe different files. The
same can be true for VMS and other OS's that have this sort of thing
for devices and logical names. Pity.
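
To illustrate the ambiguity, here is a small Python sketch (not the
program mentioned above) that lists both readings of a Unix-style
path. The function name twenex_candidates is made up, and the sketch
assumes the last component is always the file name:

```python
def twenex_candidates(unix_path):
    """Return the possible TOPS-20 spellings of a Unix-style path."""
    parts = [p for p in unix_path.split("/") if p]
    if len(parts) < 2:
        raise ValueError("need at least one directory and a file name")
    *dirs, name = parts
    # Reading 1: every leading component is a (sub)directory.
    candidates = ["<%s>%s" % (".".join(dirs), name)]
    # Reading 2: the first component is a device or logical name.
    if len(dirs) >= 2:
        candidates.append("%s:<%s>%s" % (dirs[0], ".".join(dirs[1:]), name))
    return candidates


print(twenex_candidates("/foo/bar/baz.xxx"))
```

A translator has to pick one reading by convention, or ask the remote
system which of the two actually exists.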

Is it sufficient to suggest that people avoid creating that potential
ambiguity if they want to be accessible (or at least only interpret
paths one way)? I could imagine a local system optionally interpreting
/foo:/bar/baz.xxx as a device/logical name and /foo/bar/baz always as
<foo.bar>baz.xxx, not pretty.

I suppose the same can be said for hostnames (eg. /foo/bar, is foo a
local directory or a remote host?) It's been "solved" in every
conceivable way (//foo/bar is remote, /../foo/bar is remote, only
/remote/foo/bar is remote, etc etc.)

Of course, whatever we do won't fit the next thing coming around the
corner (I dunno, free-association concept array-driven file systems
with prescient error recovery.)

Anyhow, I think Keep It Simple might be a good plan for this one,
trying to cover every unusual case usually makes for a bad design
("hard cases make bad law".)

	-Barry Shein, Software Tool & Die

welch@cheops.cis.ohio-state.edu (Arun Welch) (03/29/89)

The Envos Medley Lisp environment (originally known as Interlisp-D)
uses a novel approach: it treats all filenames at the user level in
the local syntax, namely
{host}<dir>subdir>name.type;version, and provides a translation
function to the remote file name.  There are some obvious problems
with this, in that you now have to provide a conversion function for
every OS out there, but from the user's standpoint it's kinda nice,
since they don't have to worry about the remote system's syntax.  Since
dmachines use a variety of other hosts and protocols as file servers,
as far as the user is concerned there's no difference between a
TOPS-20 host and a Unix one, they're all just the same.  For example,
{SERVER}<foo>bar>baz.text could get translated to /foo/bar/baz.text or
to <foo.bar>baz.text, depending on whether SERVER was a Unix or a Tops
host. Things like spaces and other "non-standard" characters simply
have to be escaped.
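
A rough Python sketch of the idea, not the actual Medley code. The
function name translate and the host_types table are made up for the
example, and the sketch ignores versions and escaped characters:

```python
import re

# {host}<dir>subdir>...>name  (no version, no escapes in this sketch)
LOCAL = re.compile(r"\{([^}]+)\}<(.+)>([^>]+)")


def translate(spec, host_types):
    """Map a local-syntax name to (host, name in the remote syntax)."""
    m = LOCAL.fullmatch(spec)
    if m is None:
        raise ValueError("not in {host}<dir>name form: %r" % spec)
    host, dirs, name = m.groups()
    dirs = dirs.split(">")
    kind = host_types[host]               # hypothetical lookup table
    if kind == "unix":
        return host, "/" + "/".join(dirs) + "/" + name
    if kind == "tops20":
        return host, "<" + ".".join(dirs) + ">" + name
    raise ValueError("no translation function for host type %r" % kind)


print(translate("{SERVER}<foo>bar>baz.text", {"SERVER": "unix"}))
print(translate("{SERVER}<foo>bar>baz.text", {"SERVER": "tops20"}))
```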


...arun


----------------------------------------------------------------------------
Arun Welch
Lisp Systems Programmer, Lab for AI Research, Ohio State University
welch@tut.cis.ohio-state.edu

rpw3@amdcad.AMD.COM (Rob Warnock) (03/29/89)

In article <3728@phri.UUCP> roy@phri.UUCP (Roy Smith) writes:
+---------------
| > one could even imagine a file system that allowed CR, LF, or CRLF
| > in a pathname.
| 	You mean like the Unix file system?  To wit:
| $ echo xxx* | od -c
| 0000000    x   x   x  \r  \n   y   y   y  \n
+---------------

I even had a use for one of these. When bouncing around a certain LAN I got
so tired of not remembering what a given host's way to clear the screen was
(some systems had a "clear" command, others not) that I put a shell script
in my ~/bin/ on each system to do that. Since my favorite terminal (at the
time) used a form-feed to clear the screen, that's what the name of the
program was: <^L> (the single character 0x0C). Worked fine! (On Unix...)

And a certain computer company whose terminals emitted the triplet <^A>x<CR>
(for various "x") taught its "office automation" users (sec'ys, etc.) to
make "friendly" shell scripts whose names were some function key, so users
could "customize" their terminals with single buttons that, for example,
read mail. To wit:

	% vi <F3>	# note no need to type RETURN after hitting <F3>
	...edit the Function-Key-3 script...
	% chmod a+x <F3>
	% <F3>
	...and the script named "<^A>c<CR>" runs...


Rob Warnock
Systems Architecture Consultant

UUCP:	  {amdcad,fortune,sun}!redwood!rpw3
DDD:	  (415)572-2607
USPS:	  627 26th Ave, San Mateo, CA  94403

7thSon@SLCS.SLB.COM (Chris Garrigues) (03/30/89)

    Date: Tue, 28 Mar 89 22:27:38 -0500
    From: Barry Shein <bzs@encore.com>


    I'm having problems understanding all the baroque (or is that rococo?)
    suggestions, let's try starting at the beginning...

What I'm having trouble understanding is why this is such a major issue
so late in OS/networking development.  I'm typing this from my Lisp
machine and from here can access files on all sorts of different hosts
with no problem.

If I want to grab a file from SRI-NIC, I say
"SRI-NIC.ARPA:<RFC>RFC-INDEX.TXT" and I get it.  If I want to write this
file onto my lisp machine file server, I say
"B:>7thson>text>RFC-index.text" and it writes there.  If I want to write
it onto one of our Suns, I say "LINUS:~7thson/rfc-index" and it gets
there.  This is using an operating system which has been around for
quite a while and it's been WORKING!!!

	    foo:<bar>baz.xxx

    or

	    <foo.bar>baz.xxx

    ?

    They can both exist simultaneously and describe different files. The
    same can be true for VMS and other OS's that have this sort of thing
    for devices and logical names. Pity.

I think your problem here is in trying to force pathnames into the
rather limited Unix model.  This also loses (good handling of)
extensions and version numbers.  

    Anyhow, I think Keep It Simple might be a good plan for this one,
    trying to cover every unusual case usually makes for a bad design
    ("hard cases make bad law".)

Of course.


Chris

zweig@p.cs.uiuc.edu (03/31/89)

/* Written 11:18 pm  Mar 28, 1989 by rpw3@amdcad.AMD.COM in p.cs.uiuc.edu:comp.protocols.tcp-ip */

And a certain computer company whose terminals emitted the triplet <^A>x<CR>
(for various "x") taught its "office automation" users (sec'ys, etc.) to
make "friendly" shell scripts whose names were some function key, so users
could "customize" their terminals with single buttons that, for example,
read mail. To wit:

	% vi <F3>	# note no need to type RETURN after hitting <F3>
	...edit the Function-Key-3 script...
	% chmod a+x <F3>
	% <F3>
	...and the script named "<^A>c<CR>" runs...
                          ^^^^^^^^^^^^^^^^


Uh-uh. The script named "<^A>c" runs, since the <CR> tells the shell
about the end of the input line.

RLN101@URIACC.BITNET (Marshall Feldman) (03/31/89)

Sounds much like kermits talking between different systems.  But how
would /usr/foo.bar.unix be translated to a file name on a machine running
CMS, PC-DOS, or any of the other IBM curses?

rpw3@amdcad.AMD.COM (Rob Warnock) (04/01/89)

In article <93400016@p.cs.uiuc.edu> zweig@p.cs.uiuc.edu writes:
+---------------
| /* Written 11:18 pm  Mar 28, 1989 by rpw3@amdcad.AMD.COM
| 	% <F3>
| 	...and the script named "<^A>c<CR>" runs...
|                           ^^^^^^^^^^^^^^^^
| Uh-uh. The script named "<^A>c" runs, since the <CR> tells the shell
| about the end of the input line.
+---------------

Oops! (*blush*) You're right, of course.

("That <CR> just crawled into my hand, honest...")


Rob Warnock
Systems Architecture Consultant

UUCP:	  {amdcad,fortune,sun}!redwood!rpw3
DDD:	  (415)572-2607
USPS:	  627 26th Ave, San Mateo, CA  94403