dls@mentor.cc.purdue.edu (David L. Stevens) (08/08/89)
Index: /usr/src/usr.bin/find/find.c 4.3BSD

Description:
    find(1) changes atimes for all directories it searches, which makes
the "-atime" predicate less useful. The find(1) itself changes the directory
access times so they never age.

Repeat-By:
    find / -atime ...

Fix:
    Save the utimes for directories before reading them and restore them
on the way back up. (Works for root or directory owners, only) Diffs follow:

*** NEW find.c  Mon Aug  7 22:33:48 1989
--- OLD find.c  Mon Aug  7 22:30:29 1989
***************
*** 6,11 ****
--- 6,14 ----
  #include <sys/param.h>
  #include <sys/dir.h>
  #include <sys/stat.h>
+ #ifdef STEALTH
+ #include <sys/time.h>
+ #endif /* STEALTH */
  
  #define A_DAY   86400L /* a day full of seconds */
  #define EQ(x, y)        (strcmp(x, y)==0)
***************
*** 665,670 ****
--- 668,676 ----
        char *endofname;
        auto char sbkeep_dir[MAXPATHLEN+MAXNAMLEN+2];
        struct stat lstatb;
+ #ifdef STEALTH
+       struct timeval tvp[2];
+ #endif /* STEALTH */
  
        if ((follow?stat(fname, &Statb):lstat(fname, &Statb))<0) {
                fprintf(stderr, "find: bad status < %s >\n", name);
***************
*** 699,704 ****
--- 705,714 ----
        if (chdir(fname) == -1)
                return(0);
+ #ifdef STEALTH
+       tvp[0].tv_sec = Statb.st_atime;
+       tvp[1].tv_sec = Statb.st_mtime;
+ #endif /* STEALTH */
        if ((dir = opendir(".")) == NULL) {
                fprintf(stderr, "find: cannot open < %s >\n", name);
                rv = 0;
***************
*** 725,730 ****
--- 735,743 ----
  ret:
        if(dir)
                closedir(dir);
+ #ifdef STEALTH
+       (void) utimes(".", tvp);
+ #endif /* STEALTH */
        if (chdir(*sbkeep_dir ? sbkeep_dir : "..") == -1) {
                *endofname = '\0';
                fprintf(stderr, "find: bad directory <%s>\n", name);
--
+-DLS (dls@mentor.cc.purdue.edu)
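The save-and-restore idea in the patch can be tried outside find.c with a small standalone routine. What follows is only a sketch under stated assumptions, not the posted code: the function name scan_dir_preserving_times is made up for illustration, and it sets tv_usec to zero explicitly, whereas the diff assigns only tv_sec and leaves tv_usec of the automatic tvp array unset.

```c
#include <assert.h>
#include <dirent.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/time.h>

/*
 * Hypothetical helper (not part of find.c): read every entry of `path`,
 * then put back the access and modification times recorded before the
 * read.  Returns the number of entries seen, or -1 on error.
 */
int scan_dir_preserving_times(const char *path)
{
    struct stat sb;
    struct timeval tv[2];
    DIR *dir;
    int count = 0;

    assert(path != NULL);

    /* Record the times before the directory is read. */
    if (stat(path, &sb) < 0)
        return -1;
    tv[0].tv_sec = sb.st_atime;   /* access time */
    tv[0].tv_usec = 0;
    tv[1].tv_sec = sb.st_mtime;   /* modification time */
    tv[1].tv_usec = 0;

    /* Reading the entries is what may advance the directory's atime. */
    if ((dir = opendir(path)) == NULL)
        return -1;
    while (readdir(dir) != NULL)
        count++;
    closedir(dir);

    /* Restore the saved times; needs the owner or the super-user. */
    if (utimes(path, tv) < 0)
        return -1;
    return count;
}
```

As in the patch, the restore succeeds only for the directory's owner or root, and the utimes(2) call itself counts as an inode change.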
jeffc@soba.osf.org (Jeff Carter) (08/08/89)
In article <3584@mentor.cc.purdue.edu> dls@mentor.cc.purdue.edu (David L. Stevens) writes:
>Index: /usr/src/usr.bin/find/find.c 4.3BSD
>
>Description:
>    find(1) changes atimes for all directories it searches, which makes
>the "-atime" predicate less useful. The find(1) itself changes the directory
>access times so they never age.
>
>Repeat-By:
>    find / -atime ...
>
>Fix:
>    Save the utimes for directories before reading them and restore them
>on the way back up. (Works for root or directory owners, only) Diffs follow:

Are you sure this is really (A) a bug? (B) a fix?

Consulting my 4.3BSD manual, for stat(2):

    st_atime: Time when _file_ _data_ was last read or modified. Changed
              by the following system calls: mknod(2), utimes(2),
              read(2), and write(2). For reasons of efficiency, st_atime
              is _not_ set when a directory is searched, although this
              would be more logical.
    [emphasis mine]

And running a quickie experiment on ULTRIX 3.0 (a Berkeley derivative):

    % date
    Tue Aug  8 09:39:44 EDT 1989
    % ls -ldg TRC
    drwxr-x---  6 jeffc    osf    512 Apr 25 11:15 TRC/    [modify time]
    % ls -ldug TRC
    drwxr-x---  6 jeffc    osf    512 Mar 11 14:15 TRC/    [access time]
    % find ./TRC -print
    [much output deleted]
    % ls -ldg TRC
    drwxr-x---  6 jeffc    osf    512 Apr 25 11:15 TRC/    [modify time]
    % ls -ldgu TRC
    drwxr-x---  6 jeffc    osf    512 Mar 11 14:15 TRC/    [access time]
    % find ./TRC -atime -2 -print
    [no output. i.e., no file/directory under ./TRC accessed in less than 2 days]

Access time did not change.

Additionally, the "fix" has the following effect:

4.3BSD utimes(2):

    The utimes call uses the "accessed" and "updated" [ should be
    "modified" ] times in that order from the tvp vector to set the
    corresponding recorded times for _file_.

    The caller must be the owner of the file or the super-user. The
    "inode-changed" time of the file is set to the current time.

The effect of this is to change st_ctime on every directory that you do this
to. Why is this bad? (OK, suboptimal) Because dump(8) uses st_ctime as one of
the criteria for whether or not an inode should be dumped. This will make dump
run slower and take more tape. There may be other side-effects that I am not
aware of.

Jeff Carter
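The ctime side effect described above is easy to observe directly. The following is a rough sketch, assuming a POSIX system with stat(2) and utimes(2); the helper name restore_times_report_ctime is invented for this illustration. It writes a file's existing atime and mtime back unchanged and reports the inode-change time before and after.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/time.h>
#include <time.h>

/*
 * Hypothetical helper: rewrite path's atime and mtime with the values it
 * already has (whole seconds only), and report the inode-change time
 * before and after the utimes(2) call.  Returns 0 on success, -1 on error.
 */
int restore_times_report_ctime(const char *path,
                               time_t *ctime_before, time_t *ctime_after)
{
    struct stat sb;
    struct timeval tv[2];

    assert(path != NULL && ctime_before != NULL && ctime_after != NULL);

    if (stat(path, &sb) < 0)
        return -1;
    *ctime_before = sb.st_ctime;

    /* "Restore" the times to exactly what they already are. */
    tv[0].tv_sec = sb.st_atime;
    tv[0].tv_usec = 0;
    tv[1].tv_sec = sb.st_mtime;
    tv[1].tv_usec = 0;
    if (utimes(path, tv) < 0)
        return -1;

    /* utimes(2) counts as an inode change, so ctime becomes "now". */
    if (stat(path, &sb) < 0)
        return -1;
    *ctime_after = sb.st_ctime;
    return 0;
}
```

If the file was last changed at least a second earlier, the second value comes back later than the first even though atime and mtime are the same, which is the change that dump(8) would notice.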
dls@mentor.cc.purdue.edu (David L Stevens) (08/08/89)
ARRRRRRRRRRRRRRRRRRRGH.

I tested the stock 4.3 find(1) and it does not update atimes. Apparently a
local change has this side effect and has allowed me to gracefully insert my
foot in my mouth.

My apologies to all and thanks to Jeff Carter for not believing everything he
reads.
--
+-DLS (dls@mentor.cc.purdue.edu)
dupuy@cs.columbia.edu (Alexander Dupuy) (08/08/89)
The only problem with your fix is that by resetting the atime of the directory
to the old time, you also set the ctime of the directory to be the current
time. While this is not always a problem, in some circumstances you may be
more concerned with preserving the ctimes (e.g. for incremental backup
purposes) than you are with preserving the atime.

You could make your stealth code conditional on some sort of option flag for
find, but some will certainly argue that find already has too many options,
and that the subtleties of ctime/[am]time interactions are a bit too much for
most users of find to grasp.

For reference, the rules are:

Files:
    atime:  updated when created, read().
    mtime:  updated when created, write(), truncate().
    ctime:  updated when created, write(), truncate(), chown/chmod(),
            utime/utimes(), link/unlink/rename() of self.

Directories:
    atime:  updated when created, read(), getdents/getdirentries().
    mtime:  updated when created, link/unlink/rename/rmdir() of entries.
    ctime:  updated when created, link/unlink/rename/rmdir() of entries,
            chown/chmod(), utime/utimes(), link/unlink/rename() of self.

In general, the atime is updated whenever the data in a file is read, the
mtime is updated whenever the data in a file is modified, and the ctime is
updated whenever the data associated with the inode is changed.

@alex
--
inet: dupuy@cs.columbia.edu
uucp: ...!rutgers!cs.columbia.edu!dupuy
dls@mentor.cc.purdue.edu (David L Stevens) (08/09/89)
For what it's worth, I have further information and I'm removing my foot from
my mouth and replacing it for the premature retraction....

The "local changes" that caused find(1) to suddenly start changing the atimes
on directories were in fact the Tahoe changes and not something we did. It was
in fact the 4.3 version, not the Tahoe version, that I tested and that did not
have the problem.

The STEALTH code avoids that and allows, for example, /tmp to be cleared based
on access times, without leaving a tree of empty directories for some other
cleanup method. Some have suggested that it be a command line option. I don't
have a good feel for the dump/ctime argument; all of the directories generally
aren't much compared to all of the files, anyway. At any rate, the code is
there for you to use or not. :-)

Another find(1) question that we're addressing locally is the unintuitive
meaning of numbers in the comparisons. As it is, there are three forms ("n",
"+n" and "-n"). However, fractions are completely truncated, so matching
"-mtime +1" requires a file to actually be *two* days or older. Files anywhere
from 1 day and 1 second to 1 day, 23 hours, 59 minutes and 59 seconds old are
all considered to be one day old and fail the "greater than a day" test.

I propose:

    1) To match "+n", a file need be n days + 1 second or older.
       (current: n days + 24 hours)
    2) To match "-n", a file need be n days - 1 second or younger.
       (current: same)
    3) To match "n", a file should be +/- a reasonable epsilon.
       (current: n+1 sec to n+23 hours 59 mins 59 secs)
       I suggest an hour, so files 23.00.01-25.00.59 would be
       considered an "exact" day-old match, but a file that's
       1 day, 22 hours old would not. Could also use epsilon in
       1) and 2) to maintain a dichotomy.

The most obtuse example is a file that's 1 second short of two days old and
won't match on "+1", even though the file is in fact 1.99998 days old. Most
people'd call that 2, but anyone'd call it >1.
--
+-DLS (dls@mentor.cc.purdue.edu)
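The difference between the truncating comparison and the three proposed rules can be put side by side in a few lines of C. This is a sketch of the rules as the post states them, not code from find.c; the function names match_current and match_proposed, the EPSILON constant, and the convention of passing '+', '-' or 0 for the sign are all assumptions made for the illustration.

```c
#include <assert.h>

#define A_DAY   86400L  /* a day full of seconds */
#define EPSILON 3600L   /* the one-hour tolerance suggested in the post */

/*
 * Truncating rule as described in the post: the age is cut down to whole
 * days first.  `sign` is '+', '-', or 0 for a bare "n".
 */
int match_current(long age, int sign, long n)
{
    long days = age / A_DAY;    /* the fraction of a day is discarded */

    assert(age >= 0);
    if (sign == '+')
        return days > n;
    if (sign == '-')
        return days < n;
    return days == n;
}

/* The proposed rules 1) to 3), working in seconds with no truncation. */
int match_proposed(long age, int sign, long n)
{
    long target = n * A_DAY;

    assert(age >= 0);
    if (sign == '+')
        return age >= target + 1;       /* n days + 1 second or older */
    if (sign == '-')
        return age <= target - 1;       /* n days - 1 second or younger */
    return age >= target - EPSILON && age <= target + EPSILON;
}
```

For the file one second short of two days old (age 2*A_DAY - 1), match_current(age, '+', 1) is false because the truncated age is exactly 1, while match_proposed(age, '+', 1) is true.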
jgreely@oz.cis.ohio-state.edu (J Greely) (08/09/89)
In article <3608@mentor.cc.purdue.edu> dls@mentor.cc.purdue.edu (David L Stevens) writes:
> Another find(1) question that we're addressing locally is the
>unintuitive meaning of numbers in the comparisons. As it is, there are
>three forms ("n", "+n" and "-n").

What's unintuitive? If I say "-mtime +1", I certainly hope it is interpreted
as "two or more days". I never had a problem with the current style: 0 is the
past twenty-four hours, +0 is anything before that, and -1 is 0. As long as
the units are full days, this behavior is correct. If you want a different
behavior, extend the syntax to floating point (+1.0 is what you seem to be
asking for). In the long run, you'll probably find that more useful.

Of course, if you really want to have fun, check out the paper in the Summer
USENIX proceedings (whose title isn't handy at the moment, unfortunately),
detailing the implementation of a portable file system tree walking library.
Their find replacement, "tw", has a very handy awk-like language embedded into
it, allowing truly fun things to be done to files.

[suggested changes]
> 3) to match "n", a file should be +/- a reasonable epsilon.
>    (current: n+1 sec to n+23 hours 59 mins 59 secs)
>    I suggest an hour, so files 23.00.01-25.00.59 would be
>    considered an "exact" day-old match, but a file that's
>    1 day, 22 hours old would not.

Ugh. You're better off going to floating point. Magical fudging like this from
something that claims to work in "days" could make for confusing results.

> The most obtuse example is a file that's 1 second short of two days
>old and won't match on "+1", even though the file is in fact 1.99998 days old.
>Most people'd call that 2, but anyone'd call it >1.

But a computer will insist that, by the supplied definition of "day", that
file is one day old. It ain't 0 days old, it ain't 2 days or older, so it's 1.
-=-
J Greely (jgreely@cis.ohio-state.edu; osu-cis!jgreely)
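The floating-point extension suggested here could look roughly like the sketch below. It is purely hypothetical: no find(1) in this thread accepts such arguments, the name match_fractional is invented, and the choice to keep whole-day truncation for a bare "n" is an assumption, since exact equality between real numbers would almost never match.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define SECONDS_PER_DAY 86400.0

/*
 * Hypothetical: compare an age in seconds against an argument such as
 * "+1.0", "-0.5" or "2".  With a sign, days are treated as a real number
 * and nothing is truncated; a bare number falls back to whole days.
 */
int match_fractional(long age_seconds, const char *arg)
{
    char sign = 0;
    double days, n;

    assert(arg != NULL && age_seconds >= 0);

    if (*arg == '+' || *arg == '-')
        sign = *arg++;
    n = strtod(arg, NULL);      /* parse the (possibly fractional) count */
    days = age_seconds / SECONDS_PER_DAY;

    if (sign == '+')
        return days > n;
    if (sign == '-')
        return days < n;
    return (long)days == (long)n;   /* bare "n": whole days, as before */
}
```

Under this reading, the file one second short of two days old matches "+1.0" and still fails "+2.0", so the integer forms keep their whole-day meaning only where no sign is given.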
dls@mentor.cc.purdue.edu (David L Stevens) (08/09/89)
Just when you thought it was over...

The 4.3 version and the 4.3 Tahoe version and, according to Keith Bostic, a
1982 version, of find(1) all result in changed access times on directories. I
have no idea why Ultrix 3.0's version apparently does not and I have no idea
why my test of the 4.3 version the first time reproduced Jeff Carter's results
(i.e., no atime changes). Bostic suggests it might be a kernel bug, but it
isn't reliable, whatever the cause.

This means if you can live with dumping all your directories (NOT all your
files, just all directories), the STEALTH code I posted can be applied to any
version of find(1) back to 1982 with some effect. Unless you're running Ultrix
3.0, apparently, though I haven't confirmed that.

Note, too, that because find(1) does its work on the way down, without some
change to find(1), you can't get the functionality of "remove all files that
haven't been accessed by a user in the last day". Find(1) will ensure that the
directories are always accessed and further, the directories won't be empty on
the way down (only on the way up...). Perhaps find(1) in its present form
isn't the best solution to this problem.
--
+-DLS (dls@mentor.cc.purdue.edu)
bart@videovax.tv.Tek.com (Bart Massey) (08/10/89)
In article <3608@mentor.cc.purdue.edu> dls@mentor.cc.purdue.edu (David L Stevens) writes:
>
> The "local changes" that caused find(1) to suddenly start changing
> the atimes on directories were in fact the Tahoe changes and not something
> we did. It was in fact the 4.3 version, not the Tahoe version, that I tested
> and that did not have the problem.

We're running 4.3 "tahoe" on a VAX750, and our man page still says

    st_atime    Time when file data was last accessed. Changed
                by the following system calls: mknod(2),
                utimes(2), and read(2). For reasons of efficiency,
                st_atime is not set when a directory is
                searched, although this would be more logical.

An experiment convinced me that either the manpage or the kernel is wrong. As
the manpage says, it was mainly an efficiency win to not do this before, so
maybe the kernel behavior was deliberately changed and not documented. Or
maybe it's a kernel bug. Could somebody at Berkeley clarify this?

Bart Massey
Tektronix, Inc.
TV Systems Engineering
M.S. 58-639
P.O. Box 500
Beaverton, OR 97077
(503) 627-5320
..tektronix!videovax.tv.tek.com!bart
dls@mentor.cc.purdue.edu (David L Stevens) (08/10/89)
I believe the man page is referring to namei() (kernel) directory searches,
and not read(2) (readdir()) "searches." The following does not change the
atime on "hose", even though the kernel searched it to find "bag":

    cat /tmp/hose/bag
--
+-DLS (dls@mentor.cc.purdue.edu)
peter@ficc.uu.net (Peter da Silva) (08/24/89)
In article <5521@videovax.tv.Tek.com>, bart@videovax.tv.Tek.com (Bart Massey) writes:
>     st_atime    Time when file data was last accessed. Changed
>                 by the following system calls: mknod(2),
>                 utimes(2), and read(2). For reasons of efficiency,
>                 st_atime is not set when a directory is
>                 searched, although this would be more logical.

This means:

    % cat /usr/fred/project/wheaties/raisins
                                     ^^^^^^^-- This file is read.
          ^^^^^^^^^^^^^^^^^^^^^^^^^^-- These directories are *searched*.
                                       For reasons of efficiency, atime
                                       is not modified.

    % ls /usr/fred/project/wheaties
         ^^^^^^^^^^^^^^^^^------------ These directories are searched.
                           ^^^^^^^^--- This directory is *read*. That is,
                                       it is opened and the read(2) sys
                                       call is performed (maybe multiple
                                       times). This is of course hidden
                                       in the directory access routines.

A directory being searched has a specific meaning in UNIX: it's what namei
does to resolve a path. Find actually opens and reads the directory.
--
Peter da Silva, *NIX support guy @ Ferranti International Controls Corporation.
Biz: peter@ficc.uu.net, +1 713 274 5180. Fun: peter@sugar.hackercorp.com. `-_-'
"export ENV='${Envfile[(_$-=1)+(_=0)-(_$-!=_${-%%*i*})]}'" -- Tom Neff       'U`
"I didn't know that ksh had a built-in APL interpreter!" -- Steve J. Friedl