[net.unix-wizards] RCP

ron@BRL.ARPA (Ron Natalie) (03/09/86)

Of course, you'd like to tell us the sure fire way of determining if
the destination machine is the same as the source machine?  There is
no easy way.

-Ron

wayne@ames-nas.arpa (Wayne Hathaway) (03/12/86)

Regarding the BSD "feature" of trashing files if you happen
to forget which host you are on and do an rcp of a file onto
itself:

We (Sterling Software, formerly Informatics) are developing
networking software for NASA Ames based on 4.2BSD, and we
recognized very early that users would be most unhappy with
the virgin BSD behavior.  Our first "fix" was to compare the
local hostname (from gethostname()) against the colon name
and all its aliases (from gethostbyname()).  For singly
homed hosts, this is reasonably effective.
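
A rough sketch of that first check, for readers who want to see it
spelled out.  This is not the Ames code: the function names
hostent_has_name and is_local_host are made up here, and comparing
names without regard to case is an assumption.  Only gethostname(),
gethostbyname() and the hostent fields are the standard interfaces.

```c
#define _XOPEN_SOURCE 700   /* expose gethostname() and strcasecmp() on strict compilers */
#include <assert.h>
#include <netdb.h>
#include <stddef.h>
#include <strings.h>
#include <unistd.h>

/* Hypothetical helper: return 1 if `name` equals the official name or
 * any alias recorded in `hp`, ignoring case; otherwise return 0. */
int hostent_has_name(const struct hostent *hp, const char *name)
{
    char **alias;

    assert(hp != NULL && name != NULL);
    if (hp->h_name != NULL && strcasecmp(hp->h_name, name) == 0)
        return 1;
    for (alias = hp->h_aliases; alias != NULL && *alias != NULL; alias++)
        if (strcasecmp(*alias, name) == 0)
            return 1;
    return 0;
}

/* Hypothetical helper: `remote` is the host part of "host:path".
 * Return 1 if it appears to name this machine, 0 if not or if either
 * lookup fails. */
int is_local_host(const char *remote)
{
    char local[256];
    struct hostent *hp;

    if (gethostname(local, sizeof local) != 0)
        return 0;
    local[sizeof local - 1] = '\0';   /* gethostname may not terminate on truncation */
    hp = gethostbyname(remote);
    if (hp == NULL)
        return 0;
    return hostent_has_name(hp, local);
}
```

As the post says, this only helps on singly homed hosts: a machine
whose interfaces carry different names can still fail the comparison.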

The better fix, which is currently being installed, relates
to the uid/gid mapping capability we implemented as part of
our Newcastle Connection-based distributed file system
(described in an earlier mail to this group).  The problem
was that uid/gid mappings should be based on MACHINE, rather
than PATH (which unfortunately is what Internet names and
addresses refer to).  That is, it shouldn't matter that a
connection came over the HYPERchannel, the Ethernet, or
through the IMP -- what counts is the source machine itself.
To facilitate this type of mapping, we (re)introduced the
concept of machine-id, or mid.

Now, with every host in our network assigned a unique mid,
it is an easy matter to fetch the mids of both files, and
thus protect against the BSD behavior.

Also note that this fix is needed for cp itself in a
distributed file system environment, particularly since we
allow either hostname or alias to be used in remote
pathnames (which all start with /r/).  Thus our version of
Newcastle handles things like

    cp /r/cramden/etc/hosts /r/ralphie-boy/etc/hosts

where "ralphie-boy" is an alias for "cramden."  This was a
trivial change to mv.c to check for identical mids as well
as identical device and inode numbers, using a new pseudo-
syscall "statmid" which returns a file's mid ("pseudo-
syscall" because it is done in the C library; our
distributed file system requires no kernel changes).

And a side comment/question:  The "hostid" concept of 4.2BSD
would seem to have been a start at "machine-ids", but then
they went and equated it with Internet address (which of
course completely defeats the benefits).  Anybody know what
Berkeley had in mind for "hostid"?


Wayne Hathaway    wayne@ames-nas.arpa
Sterling Software/Informatics


PS:  Or is that "kramden"?

cottrell@nbs-vms.arpa (COTTRELL, JAMES) (03/12/86)

/*
> Regarding the BSD "feature" of trashing files if you happen
> to forget which host you are on and do an rcp of a file onto
> itself:
> 
	[mapping explained]

> To facilitate this type of mapping, we (re)introduced the
> concept of machine-id, or mid.
> 
> Now, with every host in our network assigned a unique mid,
> it is an easy matter to fetch the mids of both files, and
> thus protect against the BSD behavior.
> 
> And a side comment/question:  The "hostid" concept of 4.2BSD
> would seem to have been a start at "machine-ids", but then
> they went and equated it with Internet address (which of
> course completely defeats the benefits).  Anybody know what
> Berkeley had in mind for "hostid"?

I jumped to respond at this a bit too quickly I suppose.
The `internet address' should be unique unless you are a gateway
or connected to more than one network. In any case, picking the
`dominant' network and using its internet address as `hostid'
should work. 
 
> PS:  Or is that "kramden"?

I don't know. But at least I can spell `Norton'.

	jim		cottrell@nbs
*/
------

wayne@ames-nas.arpa (Wayne Hathaway) (03/13/86)

Jim Cottrell comments on my description of our use of machine-id:

>I jumped to respond at this a bit too quickly I suppose.
>The `internet address' should be unique unless you are a gateway
>or connected to more than one network. In any case, picking the
>`dominant' network and using its internet address as `hostid'
>should work. 

We (or rather NASA Ames) are definitely in a "more than one network"
situation, having essentially two backbones (a HYPERchannel and
an Ethernet).  Most hosts are (will be) on both, but a few are
on the Ethernet only and one (the Cray 2) is on the HYPERchannel
only.  They also have at least two hosts connected to an IMP.

Therefore even defining the "dominant network" becomes a problem,
especially when you consider that they really do want to be able
to (for example) use the Ethernet as a backup for the HYPERchannel
(and vice versa).  Of course, since there are only two networks
we could have just pretended each machine was really two different
machines and duplicated the mapping tables (for example, each host
would have two inward mapping tables, one from Ethernet host
128.102.4.14 and one (identical one) from HYPERchannel host
192.12.102.14, even though these two hosts are in fact the same
machine).  This would seem to have definite disadvantages in terms
of table space and maintenance, however.  For example, they just
recently changed the Ethernet from a Class C network to a portion
(to be a subnet) of an Ames-wide Class B network.  Using Internet
address instead of mid would have required changes to every host's
mapping tables.  As it was, the names did not change so the
name-to-mid mappings did not change; the only updating required
was to the various /etc/hosts files.  (And of course this could
have been avoided by the use of nameservers, but ...)
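
To make the table argument concrete, here is a small sketch.  The two
addresses are the ones quoted above; the mid value 14, the uid numbers,
and all of the names (addr_table, uid_table, mid_for_address, map_uid)
are invented for illustration.  It also goes straight from address to
mid, where the real system goes through the host name.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct addr_entry { const char *addr; long mid; };
struct uid_entry  { long mid; int remote_uid; int local_uid; };

/* Both addresses belong to the same machine, so both carry one mid.
 * Renumbering a network changes only this table. */
static const struct addr_entry addr_table[] = {
    { "128.102.4.14",  14 },   /* Ethernet address from the post */
    { "192.12.102.14", 14 },   /* HYPERchannel address from the post */
};

/* A single inward mapping table keyed by mid (values are made up). */
static const struct uid_entry uid_table[] = {
    { 14, 1001, 205 },
};

/* Return the mid for a dotted address, or -1 if it is unknown. */
long mid_for_address(const char *addr)
{
    size_t i;

    assert(addr != NULL);
    for (i = 0; i < sizeof addr_table / sizeof addr_table[0]; i++)
        if (strcmp(addr_table[i].addr, addr) == 0)
            return addr_table[i].mid;
    return -1;
}

/* Map a remote uid to a local uid by source machine, not by path.
 * Return -1 if there is no mapping. */
int map_uid(const char *addr, int remote_uid)
{
    long mid = mid_for_address(addr);
    size_t i;

    if (mid < 0)
        return -1;
    for (i = 0; i < sizeof uid_table / sizeof uid_table[0]; i++)
        if (uid_table[i].mid == mid && uid_table[i].remote_uid == remote_uid)
            return uid_table[i].local_uid;
    return -1;
}
```

A connection arriving over either network resolves to the same mid and
so to the same uid mapping, with only one copy of the mapping table.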

In addition, when you consider our design as a "total system,"
there are other arguments for mid.  For example, we have
implemented a batch job entry system (particularly for the Cray).
Unless otherwise directed, this system returns output "to the
originating machine."  But if you happened to submit your job
over the Ethernet, you would probably be upset if the system
refused to return your output simply because the Ethernet was
down, even though there was an alternate path available.  Again,
there are solutions which do not require mid; we just felt that
biting the bullet once and providing the capability to uniquely
identify each machine with a single number would be a winner.

But as they say, only time will tell.


Wayne Hathaway			wayne@ames-nas.arpa
Sterling Software/Informatics

chris@umcp-cs.UUCP (Chris Torek) (03/13/86)

The 4.2 hostid works perfectly well as a machine ID.  You can set it
arbitrarily; using it as an Internet ID is only a convenience.  We
were running 4.2 for about a year with a hostid of 0 on each machine....
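
For reference, reading the hostid takes one call.  A minimal sketch:
gethostid() is the standard interface, while format_hostid and the
choice to print 32 bits as hex are just conventions picked here.

```c
#define _XOPEN_SOURCE 700   /* expose gethostid() on strict compilers */
#include <assert.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical helper: write the low 32 bits of the hostid into `buf`
 * as 8 hex digits.  Returns what snprintf returns (8 on success). */
int format_hostid(char *buf, size_t len)
{
    unsigned long id = (unsigned long)gethostid() & 0xffffffffUL;

    assert(buf != NULL);
    return snprintf(buf, len, "%08lx", id);
}
```

Whether the value is usable as a machine ID depends on the site giving
each machine a distinct one; the system does not enforce that.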
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 1415)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@mimsy.umd.edu