[comp.unix.wizards] Should find traverse symbolic links?

rickert@mp.cs.niu.edu (Neil Rickert) (02/25/91)

In article <1991Feb25.130613.2553@phri.nyu.edu> roy@alanine.phri.nyu.edu (Roy Smith) writes:
>	I was surprised to observe today that if you do "find dir ..." and
>dir is a symbolic link to a directory, the directory isn't entered.  Thus:
>alanine> 
>
>	This seems to me to be The Wrong Thing.  Is it a bug in find, or
>was it really intended to work that way?  I get the same results on MtXinu

 Now just imagine I run a script overnight to remove stale core files.

 Naturally I use something like:

       find . -atime +7 -name core -exec rm \{\} \;

 But, unbeknownst to me, some user has done the following:

	ln -s / root

 If find followed symbolic links, how long do you think it would take this
script to complete its execution?
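
A minimal sketch of the scenario above, in Python rather than find, with
os.walk standing in for the traversal.  The names build_demo_tree and
count_core_files and the max_depth cap are assumptions made for illustration;
the cap exists only so that the link-following run terminates.

```python
import os
import tempfile

def build_demo_tree():
    """Create a small tree in which a user has linked back up to the top."""
    top = tempfile.mkdtemp()
    os.mkdir(os.path.join(top, "user"))
    open(os.path.join(top, "user", "core"), "w").close()
    # Stand-in for `ln -s / root`: a link pointing back at the top of the tree.
    os.symlink(top, os.path.join(top, "user", "root"))
    return top

def count_core_files(top, follow_links, max_depth=6):
    """Count files named 'core' under top, optionally following symlinks.

    max_depth is a safety cap for this demo only.
    """
    hits = 0
    base_depth = top.rstrip(os.sep).count(os.sep)
    for dirpath, dirnames, filenames in os.walk(top, followlinks=follow_links):
        if dirpath.count(os.sep) - base_depth >= max_depth:
            dirnames[:] = []  # stop descending once the demo cap is reached
        hits += filenames.count("core")
    return hits
```

Without following links the single core file is seen once.  With links
followed, the same file is rediscovered through user/root again and again
until the cap stops the walk.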

-- 
=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=*=
  Neil W. Rickert, Computer Science               <rickert@cs.niu.edu>
  Northern Illinois Univ.
  DeKalb, IL 60115                                   +1-815-753-6940

gwyn@smoke.brl.mil (Doug Gwyn) (02/26/91)

In article <1991Feb25.130613.2553@phri.nyu.edu> roy@alanine.phri.nyu.edu (Roy Smith) writes:
>	I was surprised to observe today that if you do "find dir ..." and
>dir is a symbolic link to a directory, the directory isn't entered.

The fundamental problem is that there is no single "right" method of
handling symbolic links.  Sometimes one wants them to be truly
transparent, and other times one wants to notice that they are symlinks.

I could tell you stories about my attempts to decide upon appropriate
default behavior for this in utilities such as "find" that I adapted
to work in environments supporting symlinks, but there isn't much
point to doing so.  The bottom line is that symlinks don't fit very
well into UNIX's idea of hierarchical filesystem structure, and older
utilities were not designed to provide reasonable options for coping
with them.

grr@cbmvax.commodore.com (George Robbins) (02/26/91)

In article <1991Feb25.130613.2553@phri.nyu.edu> roy@alanine.phri.nyu.edu (Roy Smith) writes:
> 
> 	I was surprised to observe today that if you do "find dir ..." and
> dir is a symbolic link to a directory, the directory isn't entered.  Thus:
> 
> 	This seems to me to be The Wrong Thing.  Is it a bug in find, or
> was it really intended to work that way?

It seems questionable but I'd think there'd be a lot more problems if
it did follow links - symlink loops lurk everywhere, not to mention
links in user modifiable directories pointing into system stuff to
trap the unwary root.

For the initial case of the argument being a symlink, you can probably
use the "find /link/" trailing slash kludge, as long as you expect it
to be a directory or a link to a directory.
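
Why the trailing slash helps can be sketched in Python.  On POSIX-style
systems a pathname ending in a slash is resolved through a final symlink, so
even lstat() reports the directory rather than the link.  The helper names and
demo paths below are made up for illustration, and the behaviour is assumed to
match typical Linux and BSD path resolution.

```python
import os
import stat
import tempfile

def make_linked_dir():
    """Create <tmp>/real plus a symlink <tmp>/link to it; return the link path."""
    top = tempfile.mkdtemp()
    os.mkdir(os.path.join(top, "real"))
    link = os.path.join(top, "link")
    os.symlink(os.path.join(top, "real"), link)
    return link

def looks_like_symlink(path):
    """Return True if lstat() on path reports a symbolic link."""
    return stat.S_ISLNK(os.lstat(path).st_mode)
```

Here looks_like_symlink(link) is true, while looks_like_symlink(link + "/")
is false because the slash forces the link to be resolved first.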

-- 
George Robbins - now working for,     uucp:   {uunet|pyramid|rutgers}!cbmvax!grr
but no way officially representing:   domain: grr@cbmvax.commodore.com
Commodore, Engineering Department     phone:  215-431-9349 (only by moonlite)

bvs@light.uucp (Bakul Shah) (02/28/91)

In article <15319@smoke.brl.mil> gwyn@smoke.brl.mil (Doug Gwyn) writes:
>In article <1991Feb25.130613.2553@phri.nyu.edu> roy@alanine.phri.nyu.edu (Roy Smith) writes:
>>	I was surprised to observe today that if you do "find dir ..." and
>>dir is a symbolic link to a directory, the directory isn't entered.
>
>The fundamental problem is that there is no single "right" method of
>handling symbolic links.  Sometimes one wants them to be truly
>transparent, and other times one wants to notice that they are symlinks.

Indeed.  It would be nice if all tree traversal programs used a
common set of options for tree related choices such as crossing
mount points, following sym-links, descending trees, etc.  (with
appropriate defaults).  Not likely to happen for most existing
programs due to compatibility reasons but how about defining such
a set for new programs and extending old ones (if there is no
conflict)?

Suggestion:

	-x      (do/don't) cross mount points
	-L      (do/don't) follow  symLinks
	-R      (do/don't) descend (Recurse) trees

I'd also like a similar set for turning *on* options so that
regardless of defaults one can specify an exact behavior.  If
people don't like options that start with a +, how about -<option>
to turn *on* <option> and -/<option> to turn it *off*?

No illusions though.
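
The -<option> / -/<option> convention suggested above could be parsed along
these lines.  This is only a sketch: the function name, the default values,
and the choice to pass unrecognised arguments through untouched are
assumptions, not part of any existing utility.

```python
# Hypothetical defaults: don't cross mounts, don't follow links, do recurse.
DEFAULTS = {"x": False, "L": False, "R": True}

def parse_traversal_flags(argv, defaults=DEFAULTS):
    """Split argv into (traversal options, remaining arguments).

    "-L" turns option L on and "-/L" turns it off, regardless of the default.
    """
    opts = dict(defaults)
    rest = []
    for arg in argv:
        if arg.startswith("-/") and arg[2:] in opts:
            opts[arg[2:]] = False
        elif arg.startswith("-") and arg[1:] in opts:
            opts[arg[1:]] = True
        else:
            rest.append(arg)
    return opts, rest
```

With this scheme a script can say "-/L -R" and get the same behaviour from
every program, whatever each program's own defaults happen to be.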

-- Bakul Shah
   bvs@BitBlocks.COM
   ..!{ames,att,decwrl,pyramid,sun,uunet}!amdcad!light!bvs

tchrist@cthulhu.convex.COM (Tom Christiansen) (02/28/91)

From the keyboard of bvs@light.UUCP (Bakul Shah):

:I'd also like a similar set for turning *on* options so that
:regardless of defaults one can specify an exact behavior.  If
:people don't like options that start with a +, how about -<option>
:to turn *on* <option> and -/<option> to turn it *off*?

Ug.  Just use -x and +x.  

--tom
--
"UNIX was not designed to stop you from doing stupid things, because
 that would also stop you from doing clever things." -- Doug Gwyn

 Tom Christiansen                tchrist@convex.com      convex!tchrist

rbj@uunet.UU.NET (Root Boy Jim) (02/28/91)

In article <15319@smoke.brl.mil> gwyn@smoke.brl.mil (Doug Gwyn) writes:
>
>The fundamental problem is that there is no single "right" method of
>handling symbolic links.  Sometimes one wants them to be truly
>transparent, and other times one wants to notice that they are symlinks.

Exactly. However, most programs fall into the former category.
When they reference a file or directory, all they want is its contents.

Others are concerned with the filesystem structure itself. They
often treat file specifications as abbreviations for the entire
subtree starting at that node.

There are occasions where one might want to follow links anyway,
such as when making tar tapes to pick up everything logically
below a given directory, or for transport to a machine where
symlinks are not supported. In this case, a special option
(such as -follow or -h) should be provided.
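
As an illustration of such an opt-in switch, Python's tarfile module has a
dereference parameter that plays roughly the role of tar's -h.  The sketch
below archives one symlink both ways; the function name and the file names are
invented for the example.

```python
import io
import os
import tarfile
import tempfile

def archive_link(follow):
    """Archive a single symlink; report whether it was stored as a link or a file."""
    top = tempfile.mkdtemp()
    target = os.path.join(top, "data.txt")
    with open(target, "w") as f:
        f.write("payload\n")
    link = os.path.join(top, "alias")
    os.symlink(target, link)

    buf = io.BytesIO()
    # dereference=True stores the target's contents instead of the link itself.
    with tarfile.open(fileobj=buf, mode="w", dereference=follow) as tar:
        tar.add(link, arcname="alias")

    buf.seek(0)
    with tarfile.open(fileobj=buf, mode="r") as tar:
        member = tar.getmember("alias")
        return "symlink" if member.issym() else "file"
```

The default (follow=False) keeps the link as a link; following has to be
asked for explicitly.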

BUT ANYTHING THAT RECURSES SHOULD NOT FOLLOW SYMLINKS BY DEFAULT! PERIOD.

>I could tell you stories about my attempts to decide upon appropriate
>default behavior for this in utilities such as "find" that I adapted
>to work in environments supporting symlinks, but there isn't much
>point to doing so. 

I wish you would. I think the distinction is quite clear.
It's in capital letters eight lines above.

>The bottom line is that symlinks don't fit very
>well into UNIX's idea of hierarchical filesystem structure, and older
>utilities were not designed to provide reasonable options for coping
>with them.

Neither do mount points. They make hard links harder to do.
-- 
		[rbj@uunet 1] stty sane
		unknown mode: sane

bob@mitisft.Convergent.COM (Bob Lee) (03/01/91)

The problem comes in the basic concept of the Unix file system.
That file system pre-supposes a straight b-tree; however, when you
add unrestricted directory level links or multiple mount points that
assumption is no longer true.  The abstract then becomes a network
and not a b-tree (the b-tree being a degenerate case of the network).
In that environment recursive operation must take great care with
any link or mount point that has the potential to create a loop in
the "tree".  The exclusion of symbolic links and mount points should
be the default action in my opinion, but the means should be there
to include these in say a "find" or "ls" operation.

I worked on a system that allowed multiple mount points etc., and it
can be managed.  However, the user must be aware of this subtle difference
or be guarded by the default but not excluded from making use of it.
I would maintain that the moment you include any primitive that allows
the creation of a loop in the file system you no longer have a real
traditional Unix file system and as such traditional methods will break
down at some point.  The solution, I believe, is education (change the
way we tell people about the file system, no more pure b-tree drawings),
documentation (call a spade a spade, it will no longer be a b-tree but
a super-set), and protection (proper choice of defaults in utilities
that recursively descend (or should the term now be navigate) the file
system).
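
The "protection" point can be sketched as a walker that follows symlinks but
remembers each directory's (device, inode) pair, so a loop is entered at most
once.  This is one possible approach under stated assumptions, not how any
particular find is implemented; walk_once, build_looped_tree, and the
cross_mounts switch are hypothetical names.

```python
import os
import tempfile

def walk_once(top, cross_mounts=True):
    """Yield every directory reachable from top exactly once, following symlinks."""
    top_dev = os.stat(top).st_dev
    seen = set()
    stack = [top]
    while stack:
        path = stack.pop()
        try:
            st = os.stat(path)  # follows symlinks
        except OSError:
            continue  # dangling link or unreadable entry
        if not cross_mounts and st.st_dev != top_dev:
            continue  # a different device means a mount point was crossed
        key = (st.st_dev, st.st_ino)
        if key in seen:
            continue  # already visited: a link or mount led back here
        seen.add(key)
        yield path
        try:
            with os.scandir(path) as entries:
                stack.extend(e.path for e in entries if e.is_dir())
        except OSError:
            continue

def build_looped_tree():
    """Create top/a plus a symlink top/a/back that points at top."""
    top = tempfile.mkdtemp()
    os.mkdir(os.path.join(top, "a"))
    os.symlink(top, os.path.join(top, "a", "back"))
    return top
```

On the looped demo tree the walk reports the two real directories and then
stops, instead of descending through a/back forever.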

-bob

all the standard disclaimers apply, and sorry if I rambled a bit

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (03/02/91)

Side issue: Traversing the entire file system makes sense, but it really
doesn't work for many applications. Any program that returns a list of
all objects satisfying property P, one at a time, had better make two
things true: 1. If an object is returned in the list, it must have had
property P sometime while the program was running. 2. There must be some
period of time so that (a) the program is not running except during that
period (and maybe less); (b) if any object has property P throughout
that period, then it is returned in the list.

find guarantees #1 but not #2. If someone moves a directory while you're
removing all core files or recording all setuid scripts or whatever, you
may miss some files.

(Note that most versions of readdir() don't guarantee #2, and some don't
even guarantee #1. This unreliability is a royal pain.)

It would be much better to have openinodes() and readinodes() calls to
traverse the set of inodes in a given filesystem, one by one. Then both
#1 and #2 would be true, and a lot of file-tree walk code could be done
away with. This could even be made available to normal users who want to
walk through all the files they own.

---Dan

rbp@investor.pgh.pa.us (Bob Peirce #305) (03/08/91)

In article <1991Feb25.143543.4213@mp.cs.niu.edu> rickert@mp.cs.niu.edu (Neil Rickert) writes:
>In article <1991Feb25.130613.2553@phri.nyu.edu> roy@alanine.phri.nyu.edu (Roy Smith) writes:
>>	I was surprised to observe today that if you do "find dir ..." and
>>dir is a symbolic link to a directory, the directory isn't entered.  Thus:
>>
> Now just imagine I run a script overnight to remove stale core files.
> Naturally I use something like:
>
>       find . -atime +7 -name core -exec rm \{\} \;
>
> But, unbeknownst to me, some user has done the following:
>
>	ln -s / root
>
> If find followed symbolic links, how long do you think it would take this
>script to complete its execution?

About seven hours.  At least that's what I recently discovered on my
Altos 3068.  Altos' find does traverse symbolic links, which are
implemented under SysV and, therefore, may be strange.  Altos also
has worknet, where the top level directory is @ and this is linked to
/AT by ln -s @ /AT.  Therefore, when cron says

	find / -type p -print > /FIFOs

as it does each night because tar doesn't back them up, you get into an
endless loop which eventually dies for no apparent reason.  Maybe it
uses all the space on the root partition.  That would stop it.
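
For comparison, a sketch of the same nightly job written so that it cannot
loop: it lists named pipes but never descends through a symbolic link.  It is
in Python with os.walk in place of find, and the function names and demo
layout (a pipe plus an AT link pointing back at the top) are assumptions for
illustration.

```python
import os
import stat
import tempfile

def find_fifos(top):
    """Return paths of named pipes under top without descending through symlinks."""
    found = []
    for dirpath, dirnames, filenames in os.walk(top, followlinks=False):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                mode = os.lstat(path).st_mode
            except OSError:
                continue  # entry vanished while we were walking
            if stat.S_ISFIFO(mode):
                found.append(path)
    return found

def build_fifo_tree():
    """Create a directory holding one FIFO and a symlink back to itself."""
    top = tempfile.mkdtemp()
    os.mkfifo(os.path.join(top, "pipe"))
    os.symlink(top, os.path.join(top, "AT"))
    return top
```

Because the AT link is never entered, the walk finishes after one pass and
reports the pipe once.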
-- 
Bob Peirce, Pittsburgh, PA				  412-471-5320
...!uunet!pitt!investor!rbp			rbp@investor.pgh.pa.us