[comp.protocols.nfs] NFS protocol changes after SunOS 4.0

guy@auspex.auspex.com (Guy Harris) (08/14/90)

>Does anyone know if Sun changed the NFS protocol between SunOS
>4.0 and subsequent releases? Specifically, was the uid and the
>gid sent across the network, from client to server, when a
>'chown' or 'chgrp' was performed on a file.

Well, that depends on what the "chown" or "chgrp" command does.  The
underlying call is "chown()", which takes both an owner and a group
argument.  In SunOS, if the owner argument is -1, the owner isn't
changed, and if the group argument is -1, the group isn't changed; I
think this was introduced in 4.2BSD, and so this should be true of all
SunOS releases.  The special handling of -1 isn't in S5 prior to S5R4
(it is in S5R4).
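
To make the -1 convention concrete, here is a minimal sketch, assuming a system whose "chown()" has the BSD-style handling of -1 described above.  The names "change_owner_only", "change_group_only", and "demo_chown_minus_one" are made up for illustration, as is the temporary file under "/tmp".

```c
#include <stdlib.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Change only the owner; (gid_t)-1 asks the system to leave the group alone. */
int change_owner_only(const char *path, uid_t owner)
{
    return chown(path, owner, (gid_t)-1);
}

/* Change only the group; (uid_t)-1 asks the system to leave the owner alone. */
int change_group_only(const char *path, gid_t group)
{
    return chown(path, (uid_t)-1, group);
}

/*
 * Hypothetical self-check: create a scratch file, "change" each ID to the
 * value it already has while passing -1 for the other one, and confirm
 * that neither ID moved.  Returns 0 on success, -1 on any failure.
 */
int demo_chown_minus_one(void)
{
    char path[] = "/tmp/chown_demo_XXXXXX";
    struct stat before, after;
    int fd = mkstemp(path);
    int ok;

    if (fd < 0)
        return -1;
    close(fd);

    ok = stat(path, &before) == 0
      && change_owner_only(path, before.st_uid) == 0
      && change_group_only(path, before.st_gid) == 0
      && stat(path, &after) == 0
      && after.st_uid == before.st_uid
      && after.st_gid == before.st_gid;

    unlink(path);
    return ok ? 0 : -1;
}
```

On an S5R3-style system without this convention, the same calls would try to set the other ID to whatever -1 turns into, rather than leaving it alone.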

In the NFSSRC4.0 code, which I think is derived from 4.0, the owner and
group arguments to "chown" are passed down through the VFS layer to the
individual file system.  The NFSSRC4.0 NFS code just passes the owner
and group fields down, which means it'll pass the -1 over the wire,
assuming the -1 doesn't get mutilated in transit.

In 4.0, the owner and group arguments to "chown" are passed down to the
file system in a "vattr" structure; the fields of that structure have
types "uid_t" and "gid_t", respectively, and those are "short".  -1 will
continue to be -1 after being shoved into those fields.  They then get
converted to "int"s in the NFS code before being sent over the wire; -1
will continue to be -1 after being widened.

>And now ( > SunOS 4.0.3), is only the uid sent on a chown with the gid
>set to -1, and only the gid sent on a chgrp with the uid set to a -1 (65535
>or 0xffff).

"uid_t" and "gid_t" became "unsigned short" in 4.1, so that SunOS will
match other OSes such as S5 and BSD.  It is possible that something got
broken in the process: handing -1 to "chown" may now stuff it into the
now-"unsigned short" fields of the "vattr" structure, converting it to
65535, so that when those fields are widened to the "int"s sent out
over the wire, they go out as 65535, not -1.  If so, this is a bug.
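
The difference between the signed and unsigned cases can be sketched as follows.  This is only an illustration, not the SunOS code: "old_id_t" and "new_id_t" are stand-ins I've made up for the 4.0 and 4.1 flavors of "uid_t"/"gid_t", and it assumes a 16-bit "short" and a 32-bit "int".

```c
/* Assumed stand-ins for the two generations of ID types discussed above. */
typedef short          old_id_t;   /* signed 16-bit, as in 4.0 */
typedef unsigned short new_id_t;   /* unsigned 16-bit, as in 4.1 */

/* Store the chown() argument in a signed field, then widen it for the wire. */
int widen_via_signed(int arg)
{
    old_id_t field = (old_id_t)arg;
    return (int)field;              /* sign-extends: -1 stays -1 */
}

/* Same thing, but the intermediate field is unsigned. */
int widen_via_unsigned(int arg)
{
    new_id_t field = (new_id_t)arg;
    return (int)field;              /* zero-extends: -1 becomes 65535 */
}
```

With these, "widen_via_signed(-1)" yields -1 and "widen_via_unsigned(-1)" yields 65535.  A client that wants the "don't change" value to survive would have to test for it explicitly before widening.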

>This gets more interesting ... we've set up a matrix of several
>different vendors' UNIX systems where each system is a client of
>each other system, and each system is also a server for every
>other system ... ( save the Pyramid) ... As follows

Well, the PC is running an S5R3 system, and you say the Tandem is doing
so as well.  As such, I don't expect programs on either of those systems
ever to pass -1 down to "chown()" as an argument, as, in S5R3, that's
not specified to mean "don't change the ID in question".  Given that,
it's not surprising that nothing causes the UID or GID to become 0xFFFF.

If 4.1 does that, it's probably the bug I described above.  Are you
*certain*, however, that 4.0.3 does that?  4.0.3's UID and GID fields in
a "vattr" are signed, not unsigned.  It's also surprising that only the
Tandem sets the UID or GID to 0xFFFF.

>Now as I recall, somewhere around SunOS 4.0.x, the "nobody" uid
>and gid was changed to -2 (65534) -- there was a big deal made
>of it in one of the quad-zillion "Read Me First" TN's that came
>with our systems.

Err, prior to 4.1, SunOS used -2 as "nobody".  It changed *from* -2 *to*
65534 in 4.1, because SunOS got with the program and made the user ID
types in "stat" structures and the like unsigned.

>Could it be that Sun decided to use -- in a more "secure" NFS protocol
>-- a -1 to indicate an unsent or unspecified, i.e., secure uid/gid and
>that that's why the "nobody" uid/gid had to change?

No.  1) The protocol isn't more "secure" as a result of a change such as
that.  2) The protocol always specified that an attribute of -1 means
"don't change this attribute." 3) "nobody" used to be -2, and is now
65534, neither of which collides with -1 or with 65535.

>Well. That's exactly what we believe is happening. In fact we
>used etherfind and an interesting uid/gid (-7 or 65529) to test
>our theory and indeed the Tandem always sent both the uid and
>gid on either a chown or chgrp.

Not surprising, since S5R3 doesn't have a version of "chown()" that lets
you change only the owner or only the group (no, don't tell me about the
"chown" or "chgrp" *command*, I'm talking about the "chown()"
*procedure*).  As such, the "chown" and "chgrp" commands on the Tandem,
if it's running an S5R3-derived system and hasn't picked up that
particular BSDism, are obliged to set both the owner and group when they
call "chown()", although one of them would presumably be "changed" to
the value it already has. 

>The Suns only sent both when both were changed (i.e., chown 65529.65529
>testfile). When only the uid was changed, only the uid was sent followed
>by 0xffff or -1 where the gid was in previous packets. The same occurred for
>chgrp's on the test file, a -1 appeared where the uid
>would have been.

Again, not surprising.  The BSD-derived "chown" and "chgrp" commands in
SunOS *do* rely on the "-1 as an argument to 'chown()' doesn't change
anything" behavior, so a "chown" without a group name after the ".", or
a "chgrp", will just pass -1 down.

Now, the question is which of the two, 0xffff or -1, appears as the UID
or the GID in the packets.  Since the UID and GID in an NFS call are *always* 32
bits (integral quantities are always sent over the wire as 32-bit
quantities in RPC, since they're always encoded as 32-bit quantities in
XDR), -1 and 0xffff are *not* the same.
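
To see what to look for in a packet trace, here is a small sketch of how a 32-bit integer goes out in XDR (four bytes, most significant byte first).  The function names "xdr_put_int32" and "same_on_wire" are made up for this example; they are not the Sun RPC library routines.

```c
#include <stdint.h>
#include <string.h>

/* Encode a 32-bit integer the way XDR does: four bytes, big-endian. */
void xdr_put_int32(uint8_t out[4], int32_t value)
{
    uint32_t u = (uint32_t)value;

    out[0] = (uint8_t)(u >> 24);
    out[1] = (uint8_t)(u >> 16);
    out[2] = (uint8_t)(u >> 8);
    out[3] = (uint8_t)u;
}

/* Returns 1 if the two values produce identical bytes on the wire, else 0. */
int same_on_wire(int32_t a, int32_t b)
{
    uint8_t ea[4], eb[4];

    xdr_put_int32(ea, a);
    xdr_put_int32(eb, b);
    return memcmp(ea, eb, sizeof ea) == 0;
}
```

Encoding -1 gives the bytes FF FF FF FF, while encoding 65535 gives 00 00 FF FF, so the two are easy to tell apart in a trace provided you look at all four bytes of the field.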

If, when only the UID is changed on a Sun client (e.g., "chown 65529
testfile"), the GID goes over the wire as -1 (i.e., 0xFFFFFFFF), that's
correct.  If it goes over the wire as 0xFFFF - i.e., 65535 - that's an
error on the part of SunOS.  If so, I could imagine that bug being in
4.1 (as a result of the change to unsigned UIDs and GIDs), but I'd be
quite surprised to see it in 4.0.3.  The same applies to changing only
the GID (e.g., "chgrp 65529 testfile").

Now, it is possible that some servers may - arguably incorrectly - treat
a UID or GID that comes over the wire as 0xFFFF (i.e., 65535) as if it
were -1; the NFSSRC4.0 code takes the 32-bit UID that came over the wire
and just jams it into a (normally 16-bit) field in a "vattr" structure. 
If that field is considered signed, jamming 0xFFFF into it gives you -1,
as does stuffing 0xFFFFFFFF (i.e., a real 32-bit -1).  Other servers may
not do so; it may be that the PC, Pyramid, and Sun servers are doing
this, while the Tandem server isn't.
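
The masking effect can be sketched like this.  It is an illustration of the arithmetic only, not the NFSSRC4.0 source: the function names are hypothetical, and the signed case relies on the usual two's-complement behavior when an out-of-range value is narrowed to a signed type, which C leaves implementation-defined.

```c
#include <stdint.h>

/* Server jams the 32-bit wire value into a signed 16-bit field. */
int store_in_signed_field(uint32_t wire)
{
    int16_t field = (int16_t)wire;   /* keeps the low 16 bits (assumed) */
    return field;
}

/* Server jams the 32-bit wire value into an unsigned 16-bit field. */
int store_in_unsigned_field(uint32_t wire)
{
    uint16_t field = (uint16_t)wire; /* keeps the low 16 bits */
    return field;
}
```

Under those assumptions, both 0xFFFF and 0xFFFFFFFF come out of "store_in_signed_field" as -1, so a server built that way can't tell the buggy client value from the correct one.  With an unsigned field both come out as 65535, and whether that is then treated as "don't change" depends on what the server compares it against.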

In effect, you have what might be considered a server bug masking the
effects of the client bug.