[net.bugs.4bsd] find

qacr@oce-rd1.UUCP (Alistair Crooks) (08/13/85)

On our Suns, running Sun Unix(TM) 4.2 BSD, Releases 1.2, 1.3, 1.4,
a find(1) will fail when the pathname-list is a symbolic link.

The problem can be reproduced by making a symbolic link to the /usr/bin
directory from my home directory, calling it, for example, pathname-list,
and executing

	find pathname-list -name find -print

from my home directory.

find does not seem to expand the link, or use readlink(), or anything
else. Is it meant to, or should a find just give no output,
as though there weren't any files?

Current thinking seems to be that if a bug is documented, it is a feature.
I have looked at the manual entry, but can find neither

	i)  any reference to links (symbolic or otherwise) being handled
	    differently from other directory entries

nor	ii) any disclaimer in the BUGS section of the manual.

Stop Press : Sun Release 2 also shows this.
Any comments...

Alistair G. Crooks
BSO Eindhoven/Oce Nederland b.v.
{seismo,philabs,decvax,ucbvax}!mcvax!oce-rd1!qacr
{seismo,philabs,decvax,ucbvax}!mcvax!bsovax!ocealis



chris@umcp-cs.UUCP (Chris Torek) (08/14/85)

Find is designed not to traverse symbolic links, as they often cause
pathname loops.  It is arguably a mistake to skip those that are given
in the pathname-list....
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 4251)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@maryland

guy@sun.uucp (Guy Harris) (08/16/85)

> On our Suns, running Sun Unix(TM) 4.2 BSD, Releases 1.2, 1.3, 1.4,
> a find(1) will fail when the pathname-list is a symbolic link.
> 
> find does not seem to expand the link, or use readlink(), or anything
> else. Is it meant to, or should a find just give no output,
> as though there weren't any files?

The 4.2 manual doesn't come out and say it explicitly, but the 4.2 "find"
treats symbolic links as symbolic links, rather than looking at what they
point to.  The manual does say:

	-type C     True if the type of the file is C, where C is
		    "b", "c", "d", "f" or "l" for block special file,
		    character special file, directory, plain file, or
		    symbolic link.

which does imply that it does an "lstat" rather than a "stat", and looks at
symbolic links rather than at what they point to.  (This is in standard
4.2BSD.)
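
The difference between the two calls can be sketched in C.  This is not
the 4.2BSD find source: the function name type_letter is hypothetical,
and the sketch only assumes that lstat(2) reports on the link itself,
which is what a working "-type l" test requires.

```c
#define _XOPEN_SOURCE 700	/* exposes lstat and S_IFLNK on modern libcs */
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: return the -type letter for path.
 * lstat(2) describes the link itself, so a symbolic link yields 'l';
 * stat(2) would describe the target and could never yield 'l'.
 * Returns '?' on error or for a type the manual excerpt does not list.
 */
char type_letter(const char *path)
{
	struct stat sb;

	if (lstat(path, &sb) < 0)
		return '?';
	switch (sb.st_mode & S_IFMT) {
	case S_IFBLK: return 'b';
	case S_IFCHR: return 'c';
	case S_IFDIR: return 'd';
	case S_IFREG: return 'f';
	case S_IFLNK: return 'l';
	default:      return '?';
	}
}
```

Called on a symbolic link that points at /usr/bin, this returns 'l'
rather than 'd', which would explain why a find that starts at such a
link has no directory to descend into.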

	Guy Harris

lwa@apollo.uucp (Larry Allen) (08/26/85)

In all of the 4.2bsd implementations I know about (VAX, Sun, Apollo), find(1) is specifically arranged
to not follow symbolic links.  There are a couple of reasons for this:
    1) If you follow a symbolic link, it's hard to get back.  Going back to .. doesn't work; instead,
       find would have to explicitly keep a stack of the directories it had visited.  While this would
       work, it would require find to do a getwd(2) at every directory level, and getwd is pretty slow.
    2) There are problems with loops in the directory structure.  A symbolic link can point to an
       ancestor of the current directory, potentially resulting in infinite loops.  Again this can be
       solved by keeping track of all directories visited so far, but it would be slow, especially on
       big searches.
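
The bookkeeping described in point 2 might look something like the C
sketch below.  It assumes a directory can be identified by its device
and inode numbers, and it checks only the ancestors of the current
directory, since an infinite descent has to re-enter a directory it is
already inside.  The names dirid, dir_identity and would_loop are
hypothetical; this is not code from any find implementation.

```c
#define _XOPEN_SOURCE 700	/* exposes stat(2) on modern libcs */
#include <sys/types.h>
#include <sys/stat.h>

/* Identity of a directory: device and inode number taken together. */
struct dirid {
	dev_t dev;
	ino_t ino;
};

/* Record the identity of path, following symbolic links.  0 on success. */
int dir_identity(const char *path, struct dirid *id)
{
	struct stat sb;

	if (stat(path, &sb) < 0)
		return -1;
	id->dev = sb.st_dev;
	id->ino = sb.st_ino;
	return 0;
}

/*
 * Return 1 if candidate is already one of the depth ancestors on the
 * stack, i.e. descending into it would re-enter a directory we are in.
 */
int would_loop(const struct dirid *stack, int depth,
	       const struct dirid *candidate)
{
	int i;

	for (i = 0; i < depth; i++)
		if (stack[i].dev == candidate->dev &&
		    stack[i].ino == candidate->ino)
			return 1;
	return 0;
}
```

The scan costs one comparison per level of depth for each directory
entered, plus the extra stat(2).  Whether that overhead is acceptable on
big searches is the judgment call made above; the sketch does not
settle it.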

As an aside, note that this issue of following symbolic links is a "gotcha" in systems which provide both
System 5 and 4.2BSD compatibility, like Apollo's.  We have added an lstat(2) call to the System 5 library,
and modified programs like find and du which search the directory tree to use lstat and hence to avoid
following symbolic links.
                                                                    -Larry Allen
                                                                     Apollo Computer

peter@graffiti.UUCP (Peter da Silva) (08/31/85)

> In all of the 4.2bsd implementations I know about (VAX, Sun, Apollo), find(1) is specifically arranged
> to not follow symbolic links.  There are a couple of reasons for this:
>     1) If you follow a symbolic link, it's hard to get back.  Going back to .. doesn't work; instead,
>        find would have to explicitly keep a stack of the directories it had visited.  While this would
>        work, it would require find to do a getwd(2) at every directory level, and getwd is pretty slow.

Why would it require find to do a getwd? The file spelling checker in Kernighan
and Pike doesn't. All you have to do is build the directory stack on the fly.
I always assumed find did this rather than depending on ..

hyd-ptd!/usr/src/news and hyd-ptd!/usr/spool/uucp/news.src were the same file
for a while when I was trying to get uucp up on datafact. Find never got lost
here, that I recall.
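
A minimal sketch of building the path on the fly, in C, assuming the
4.2BSD directory routines (opendir, readdir, closedir).  The function
name walk and its interface are hypothetical, and this is neither the
find source nor the Kernighan and Pike program.  It still uses lstat, so
it does not follow symbolic links; the point is only that getting back
to the parent is a string truncation, not a chdir("..").

```c
#define _XOPEN_SOURCE 700	/* exposes lstat and snprintf on modern libcs */
#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/stat.h>

/*
 * Print path and everything below it to out; return the number of
 * names printed.  path is a writable buffer of cap bytes.  Each name is
 * appended on the way down and cut off again on the way back, so the
 * walk never needs chdir(".."), getwd, or any other way "back".
 */
int walk(char *path, size_t cap, FILE *out)
{
	struct stat sb;
	struct dirent *e;
	DIR *d;
	size_t len = strlen(path);
	int n = 1;

	fprintf(out, "%s\n", path);
	/* lstat: a symbolic link is printed but never descended into */
	if (lstat(path, &sb) < 0 || !S_ISDIR(sb.st_mode))
		return n;
	if ((d = opendir(path)) == NULL)
		return n;
	while ((e = readdir(d)) != NULL) {
		if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
			continue;
		if (len + 1 + strlen(e->d_name) + 1 > cap)
			continue;	/* would overflow the buffer: skip it */
		snprintf(path + len, cap - len, "/%s", e->d_name);	/* push */
		n += walk(path, cap, out);
		path[len] = '\0';					/* pop */
	}
	closedir(d);
	return n;
}
```

The caller copies the starting pathname into a buffer and passes it in;
on return the buffer holds the starting pathname again.  One directory
stays open per level of recursion, which a real implementation might
want to avoid.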

gwyn@brl-tgr.ARPA (Doug Gwyn <gwyn>) (09/02/85)

> > In all of the 4.2bsd implementations I know about (VAX, Sun, Apollo), find(1) is specifically arranged
> > to not follow symbolic links.  There are a couple of reasons for this:
> >     1) If you follow a symbolic link, it's hard to get back.  Going back to .. doesn't work; instead,
> >        find would have to explicitly keep a stack of the directories it had visited.  While this would
> >        work, it would require find to do a getwd(2) at every directory level, and getwd is pretty slow.
> 
> Why would it require find to do a getwd? The file spelling checker in Kernighan
> and Pike doesn't. All you have to do is build the directory stack on the fly.
> I always assumed find did this rather than depending on ..

Peter is right.  Indeed, most if not all of the utilities in the BRL UNIX
System V emulation for 4.2BSD now avoid doing chdir( ".." ) in order to
avoid problems with symbolic links.  Our SVR2 Bourne shell has been
modified so that "cd .." does what one might expect (trim off rightmost
piece of pathname from current working directory) rather than wander off
in a different direction than was used to enter the directory.  It is
initialized by "cd $HOME" in /usr/5lib/profile (our equivalent of
/etc/profile), to make sure that it thinks you are in the directory as
specified in /etc/passwd and not wherever /bin/pwd would say you are.
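
The "trim off rightmost piece" behaviour can be sketched as a pure
string operation in C.  The function name logical_dotdot is
hypothetical and this is not the BRL shell source; it assumes the shell
keeps its own copy of the current directory as an absolute pathname
with no trailing slash.

```c
#include <string.h>

/*
 * Trim the rightmost component from an absolute pathname in place,
 * as a "logical" cd .. would.  The parent of "/usr" is "/", and "/"
 * is its own parent.  A string with no slash is left untouched.
 */
void logical_dotdot(char *cwd)
{
	char *slash = strrchr(cwd, '/');

	if (slash == NULL)
		return;			/* not an absolute pathname */
	if (slash == cwd)
		cwd[1] = '\0';		/* keep the leading "/" */
	else
		*slash = '\0';		/* drop "/component" */
}
```

Because no directory is consulted, the result is the same whether or
not a component of the pathname is a symbolic link.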

jpl@allegra.UUCP (John P. Linderman) (10/21/85)

Index: usr.bin/find.c 4.2BSD

Description:
	Only one -newer option will be correctly processed on a find.
Repeat-By:
	# The following script demonstrates the problem (which also
	# exists on System V and Version 8) and the effect of the fix.
	# The fix also adds the ability to compare on access and
	# inode modification times as well as file modification time,
	# as is also demonstrated in the script.
	$ touch 1
	$ touch 2
	$ touch 3
	$ find . \( -newer 2 -o -newer 3 \) -print
	.
	$ /usr/5bin/find . \( -newer 2 -o -newer 3 \) -print
	.
	$ find . \( -newer 3 -o -newer 2 \) -print
	.
	./3
	$ ./find . \( -newer 2 -o -newer 3 \) -print
	.
	./3
	$ mv 1 4
	$ find . -newer 2 -print
	.
	./3
	$ ./find . -newer 2 -print
	.
	./3
	$ ./find . -newerc 2 -print
	.
	./3
	./4
	$
Fix:
	The following diffs to the BSD 4.2 source correct the problem,
	and add a dozen options.  (Only a few options are genuinely
	useful, but it was cleaner to add them all than to prune out
	the useless ones.)  -newer can be followed by one or two
	occurrences of the letters [acm] to specify which time from the
	stat structure (st_atime, st_ctime or st_mtime -- see stat(2))
	will be used in the comparison.  The first letter, if any,
	determines the time used for the files the find command is
	searching.  The second, if any, determines the time from the
	file that follows the -newer option.  Both default to m, so
	-newer foo, -newerm foo, and -newermm foo are identical.  Note
	that -newerc causes the INODE modification time of the found
	files to be compared to the FILE (not inode) modification time
	of the specified target.  This was done deliberately, because
	it works correctly with the following incremental backup scheme

	touch startstamp
	find ... -newerc laststamp ...
	mv startstamp laststamp

	If the dump dies midstream, laststamp is not changed, so the
	next dump will get all the files this dump would have.  If the
	dump does run to completion, the mv changes the inode
	modification time of startstamp but not the file modification
	time, so the next incremental dump will pick up all the files
	changed after OR DURING this dump, including those whose modes
	or owners were changed or those renamed.

	I don't know if the System V and Version 8 sources are
	identical, but (except for the MAXPATHLEN change) the
	changes appear to be analogous.  The new features are
	particularly useful in conjunction with the System V
	touch command, which allows one to set the modification
	date of a file to an arbitrary time.  These give
	greater precision and cleaner semantics than the -mtime
	and -atime options (one day since when??).

	John P. Linderman  Department of find bug finders  allegra!jpl

11c11
< char	Pathname[200];
---
> char	Pathname[MAXPATHLEN + 1];
30c30,32
< long	Newer;
---
> #define	NNEW	50
> int	Nnewer;
> time_t	Newer[NNEW];
230c232,234
< 	else if(EQ(a, "-newer")) {
---
> 	else if(strncmp(a, "-newer", 6) == 0) {
> 		char *p = a + 6;
> 		time_t *t1p, *t2p;
235,236c239,278
< 		Newer = Statb.st_mtime;
< 		return mk(newer, (struct anode *)0, (struct anode *)0);
---
> 		if(Nnewer >= NNEW) {
> 			fprintf(stderr, "find: too many -newer constructs\n");
> 			exit(1);
> 		}
> 		t1p = t2p = &(Statb.st_mtime);
> 		switch (*p) {
> 		case 'm':
> 		    p++;
> 		    break;
> 		case '\0':
> 		    break;
> 		case 'a':
> 		    t1p = &(Statb.st_atime);
> 		    p++;
> 		    break;
> 		case 'c':
> 		    t1p = &(Statb.st_ctime);
> 		    p++;
> 		    break;
> 		}
> 		switch (*p) {
> 		case 'm':
> 		    p++;
> 		    break;
> 		case '\0':
> 		    break;
> 		case 'a':
> 		    t2p = &(Statb.st_atime);
> 		    p++;
> 		    break;
> 		case 'c':
> 		    t2p = &(Statb.st_ctime);
> 		    p++;
> 		    break;
> 		}
> 		if (*p == '\0') {
> 		    Newer[Nnewer] = *t2p;
> 		    return mk(newer, (struct anode *)t1p,
> 				     (struct anode *)(&Newer[Nnewer++]));
> 		}
428c470,471
< newer()
---
> newer(p)
> register struct { int f; time_t *t1, *t2; } *p;
430c473
< 	return Statb.st_mtime > Newer;
---
> 	return *(p->t1) > *(p->t2);

mp@allegra.UUCP (Mark Plotnick) (10/21/85)

> From jpl@allegra.UUCP (John P. Linderman)
> The following diffs to the BSD 4.2 source correct the problem,
> and add a dozen options.  (Only a few options are genuinely
> useful, but it was cleaner to add them all than to prune out
> the useless ones.)
Look out, world!  I was just showing Linderman a bug in "ls" this morning...

jpl@allegra.UUCP (John P. Linderman) (10/21/85)

> From: mp@allegra.UUCP (Mark Plotnick)
>> From jpl@allegra.UUCP (John P. Linderman)
>> The following diffs to the BSD 4.2 source correct the problem,
>> and add a dozen options.  (Only a few options are genuinely
>> useful, but it was cleaner to add them all than to prune out
>> the useless ones.)
> Look out, world!  I was just showing Linderman a bug in "ls" this morning...

Hoist with my own petard, eh?  The changes only add two new
``concepts'', the use of access times or inode change times instead of
file modification times.  Then the two files, the backwards-compatible
defaults, and simple combinatorics generate a dozen possibilities.
Those ending with one or more m's are ``useless'' in the sense that the
same effect can be obtained by leaving the m's off, but they are
``useful'' because they provide a consistent mapping to the underlying
concepts.  By this reckoning, adding the n+1'st flag to ls adds 2**n
new options, so a dozen is pretty modest.  But now that you mention it,
it would be nice to have a flag to ls that caused all unprintable
characters in file names to ...

John P. Linderman  Finder of lost options  allegra!jpl