[comp.sys.sun] NFS bug: Sun client 'touch' zeroes mod time on Iris server

slevy@geom.umn.edu@uunet.uu.net (Stuart Levy) (04/24/91)

There seems to be an incompatibility between the SunOS 4.* NFS client and
SGI 3.3 server implementations.  I'd have called it a Sun bug, but it
appears it's really an obscure protocol "feature" which SGI mishandles.

One of our users tried touching files to force make to recompile them,
and found that it had no effect.

Environment:
        SunOS 4.x {4.0.3, 4.1, 4.1.1 at least} NFS client
        SGI {3.3.1 at least} NFS server

Symptom:
        ``/usr/bin/touch file'' (or /usr/5bin/touch file) on the Sun,
        when the file already exists, lives on the SGI NFS server,
        and is owned by the user doing the touch, sets the file's
        modification time to *zero*:

        % /usr/bin/touch /usr8/rats; ls -l /usr8/rats
          -rw-rw-rw-  1 slevy           0 Dec 31  1969 /usr8/rats

        On the other hand, if "file" is local or NFS-mounted from a Sun server,
        touch works as expected -- both access and modification times are
        updated.  Ditto if ``touch'' is run on an SGI client.

        Meanwhile, Sun /usr/bin/touch or /usr/5bin/touch with explicit
        arguments (-m, etc.) does the Right Thing.

Diagnosis:
        SunOS utimes(), when called with a NULL array of struct timeval's,
        emits an NFS setattr packet with 64-bit access-time (seconds &
        microseconds) correctly set to the present moment;
        the seconds portion of modification-time is also current,
        but mod-time microseconds = 1000000 exactly.

        Sun's NFS code actually does this intentionally, it turns out.
        Smart servers are expected to take tv_usec=1000000 as a clue
        that the time should be set to (the server's idea of?) the present,
        while less smart ones would simply use the whole-second
        part of the time.  SGI does neither, somehow.
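
To make the intended convention concrete, here is a rough sketch of the
two server behaviours described above.  It is illustrative only and is
not taken from any vendor's NFS source: struct nfs_time, mtime_smart and
mtime_simple are invented names, and treating "use the whole-second part"
as "drop an out-of-range microseconds field" is an assumption.

```c
#define USEC_PER_SEC   1000000L
#define MTIME_NOW_USEC USEC_PER_SEC  /* out-of-range value used as a flag */

/* Hypothetical stand-in for the seconds/microseconds pair in a setattr. */
struct nfs_time {
    long sec;
    long usec;
};

/* "Smart" server: the magic value means "use the server's current time". */
struct nfs_time mtime_smart(struct nfs_time req, struct nfs_time server_now)
{
    if (req.usec == MTIME_NOW_USEC)
        return server_now;
    return req;
}

/* "Less smart" server: keep the seconds, ignore a nonsensical usec. */
struct nfs_time mtime_simple(struct nfs_time req)
{
    if (req.usec < 0 || req.usec >= USEC_PER_SEC)
        req.usec = 0;
    return req;
}
```

Either way the file ends up with a modification time within a second or
so of the present, which is not what the SGI server produces.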

This note is mostly a bug report -- if you-all have been finding file mod
times set to 1969, this is probably why -- but if anyone knows of a
workaround I'd be glad to hear of it.

    Stuart Levy, Geometry Group, University of Minnesota
    slevy@geom.umn.edu, (612) 624-1867

thurlow@convex.convex.com (Robert Thurlow) (04/29/91)

In <2598@brchh104.bnr.ca> slevy@geom.umn.edu@uunet.uu.net (Stuart Levy) writes:

>Diagnosis:
>        SunOS utimes(), when called with a NULL array of struct timeval's,
>        emits an NFS setattr packet with 64-bit access-time (seconds &
>        microseconds) correctly set to the present moment;
>        the seconds portion of modification-time is also current,
>        but mod-time microseconds = 1000000 exactly.

>	Sun's NFS code actually does this intentionally, it turns out.
>	Smart servers are expected to take tv_usec=1000000 as a clue
>	that the time should be set to (the server's idea of?) the present,
>	while less smart ones would simply use the whole-second
>	part of the time.  SGI does neither, somehow.

The rocks should be thrown at Sun, because the bug is in their NFSSRC 4.0
and ONC/SRC 4.1 source releases to OEMs.  We've just fixed it in our
almost-shipping ConvexOS V9.1 release.  You're right that the client and
server have an agreement that 1,000,000 microseconds is a special 'magic'
flag in the protocol, but Sun didn't finish the thought; there's no other
code to pick up on that at the server end!  The code deliberately hammers
'0' into the mod time, as you've observed.  I think it'll do this on the
right release of Sun servers as well.  It looks like sloppy source control
at Sun.  SGI, like us, got burned by porting the code a little too
fast :-)  Our fix was to comment out, in its entirety, the server code
that detected the magic number.
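
As a rough illustration of the behaviour described above (this is not
the NFSSRC code, which is not reproduced here; the struct and function
names are invented), the before and after might look something like:

```c
#define MAGIC_USEC 1000000L   /* the special microseconds value */

/* Hypothetical stand-in for the mod time carried in a setattr request. */
struct mtime_val {
    long sec;
    long usec;
};

/* As shipped: the magic value is detected, but zero is stored. */
struct mtime_val server_mtime_as_shipped(struct mtime_val req)
{
    if (req.usec == MAGIC_USEC) {
        req.sec = 0;
        req.usec = 0;
    }
    return req;
}

/* With the detection commented out: the request passes through as sent. */
struct mtime_val server_mtime_patched(struct mtime_val req)
{
    /* if (req.usec == MAGIC_USEC) { ... }   detection disabled */
    return req;
}
```

In the patched sketch the seconds field survives, so the file gets the
client's idea of the current time; what the filesystem then does with
the out-of-range microseconds field is outside the scope of this sketch.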

BTW, the best newsgroup for discussing this is comp.protocols.nfs; all
followups should go to that forum.

Rob Thurlow, thurlow@convex.com