[comp.unix.shell] Why use find?

kimcm@diku.dk (Kim Christian Madsen) (10/05/90)

george@hls0.hls.oz (George Turczynski) writes:

>Our `find' (SunOS) supports the `-exec' option, and I assume this would be
>fairly common.  So, those of you without xargs, and who | a. can't,  b. don't
>want to | hack `find' use it like this:

>	find {{stuff here}} -exec rm -f {} \;

>Why use `xargs' when you don't need to ?

There are a lot of uses of xargs that are superior to using find with the
exec option. Find is *SLOW*, so if you have the names of files you want to
do something with in a file or pipe use xargs for performance.
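
A minimal sketch of the "names already in a file" case, with no find involved. The scratch directory, names.txt, and the old1..old3 files are made up for illustration, and it assumes the names contain no whitespace or quotes, which plain xargs would split on.

```shell
#!/bin/sh
# Sketch: the file names are already in a list, so find is not needed.
# Every name below (names.txt, old1..old3, keep) is hypothetical.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
: > old1; : > old2; : > old3; : > keep

# The list could just as well arrive on a pipe.
printf '%s\n' old1 old2 old3 > names.txt

# xargs reads the names from stdin and hands them to rm as arguments.
xargs rm -f < names.txt

# What survives: the untouched file and the list itself.
remaining=$(echo *)
echo "left: $remaining"

cd / && rm -rf "$dir"
```

With a list this short, xargs starts rm only once for all three names.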

Find is also complicated to use for non-trivial tasks, and the syntax  is
often confusing to non-wizards, so specialized shell-scripts or programs
are often more intuitive and faster to create and maintain.

All this is not to bash find(1), which is a wonderful, general tool; however,
the multitude of UNIX tools gives you the choice of selecting the one
most appropriate for your task and the one you're most familiar with.

						Kim Chr. Madsen

cpcahil@virtech.uucp (Conor P. Cahill) (10/06/90)

In article <1990Oct5.145825.9454@diku.dk> kimcm@diku.dk (Kim Christian Madsen) writes:
>george@hls0.hls.oz (George Turczynski) writes:
>
>>Why use `xargs' when you don't need to ?
>
>There are a lot of uses of xargs that are superior to using find with the
>exec option. Find is *SLOW*, so if you have the names of files you want to
>do something with in a file or pipe use xargs for performance.

Find, in itself, is not slow.  The exec of a new process (the rm in this
case) for every file found is *SLOW*.  Obviously, if you already have the
list of files you don't need to run find to get them again. 
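
The cost difference can be sketched by counting how often the command gets started. This is only an illustration under assumptions: the five scratch files are invented, `sh -c 'echo started'` stands in for rm, and the names contain no whitespace.

```shell
#!/bin/sh
# Sketch: count process starts for -exec versus xargs.
dir=$(mktemp -d) || exit 1
for i in 1 2 3 4 5; do : > "$dir/file$i"; done

# -exec ... \; starts the command once per file found.
per_file=$(find "$dir" -type f -exec sh -c 'echo started' sh {} \; \
    | wc -l | tr -d ' ')

# xargs collects the names and starts the command once per batch;
# five short names fit into a single batch.
batched=$(find "$dir" -type f -print | xargs sh -c 'echo started' sh \
    | wc -l | tr -d ' ')

echo "per-file: $per_file  batched: $batched"
rm -rf "$dir"
```

This prints "per-file: 5  batched: 1": find walks the same tree both times, and only the number of fork/exec pairs differs.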

>Find is also complicated to use for non-trivial tasks, and the syntax  is
>often confusing to non-wizards, so specialized shell-scripts or programs
>are often more intuitive and faster to create and maintain.

This is getting into a typical religious issue, so I won't say much in
response other than the fact that if someone knows enough about the system
to create shell scripts and/or programs it behooves them to take an
extra few minutes to learn find.

Find in itself is not complicated (just some name selection logic) and 
should be one of the tools that anyone writing shells is intimately aware
of.  (BTW - I am one of the people that believe you shouldn't be allowed
to write any shell scripts until you have made a complete pass through
sections 1 & 1m (or 8, depending upon your OS) AND you should repeat this
on a regular basis).

As an aside, most of the specialized shell scripts that you mentioned will
usually run a find as part of the script anyway.

>All this is not to bash find(1), which is a wonderful, general tool; however,
>the multitude of UNIX tools gives you the choice of selecting the one
>most appropriate for your task and the one you're most familiar with.

But to make the correct selection you should be aware of most, if not
all, of the tools available (nothing is better than an educated decision).


-- 
Conor P. Cahill            (703)430-9247        Virtual Technologies, Inc.,
uunet!virtech!cpcahil                           46030 Manekin Plaza, Suite 160
                                                Sterling, VA 22170 

dce@smsc.sony.com (David Elliott) (10/06/90)

In article <1990Oct5.145825.9454@diku.dk> kimcm@diku.dk (Kim Christian Madsen) writes:
>>Why use `xargs' when you don't need to ?
>
>There are a lot of uses of xargs that are superior to using find with the
>exec option. Find is *SLOW*, so if you have the names of files you want to
>do something with in a file or pipe use xargs for performance.

I think you mean that fork()/exec() is slow.

Find itself is quite reasonable, since it precompiles the predicates
and then runs down the filesystem tree.  Piping to xargs lets find do
the job it does best very quickly.

You're right about it being complicated, though.  Writing interpreted
language syntax on a command line without any newlines can be pretty
messy.

brister@decwrl.dec.com (James Brister) (10/07/90)

On 6 Oct 90 01:14:38 GMT, cpcahil@virtech.uucp (Conor P. Cahill) said:

> Find, in itself, is not slow.  The exec of a new process (the rm in this
> case) for every file found is *SLOW*.  Obviously, if you already have the
> list of files you don't need to run find to get them again. 

This is true, but compared to VMS process creation, it's lightning fast.
(All "VMS is better than UNIX remarks" can go to /dev/null or direct mail
to me).

James "A VMS programmer in a former life"
--
James Brister                                           brister@decwrl.dec.com
DEC Western Software Lab., Palo Alto, CA    {uunet,sun,pyramid}!decwrl!brister

emv@math.lsa.umich.edu (Edward Vielmetti) (10/07/90)

I squirreled away a little program called 'descend' that does the 
moral equivalent of a 'find . -print', except rather fast.  I think
it landed in alt.sources at some point.

--Ed

Edward Vielmetti, U of Michigan math dept <emv@math.lsa.umich.edu>
moderator, comp.archives

kimcm@diku.dk (Kim Christian Madsen) (10/07/90)

cpcahil@virtech.uucp (Conor P. Cahill) writes:

>In article <1990Oct5.145825.9454@diku.dk> kimcm@diku.dk (Kim Christian Madsen) writes:
>>There are a lot of uses of xargs that are superior to using find with the
>>exec option. Find is *SLOW*, so if you have the names of files you want to
>>do something with in a file or pipe use xargs for performance.

>Find, in itself, is not slow.  The exec of a new process (the rm in this
>case) for every file found is *SLOW*.  Obviously, if you already have the
>list of files you don't need to run find to get them again. 

Maybe not on your system, but on my system (a SYSV system), find performs
a getpwd(3C) each time it enters a directory, and getpwd(3) is by
standard implemented by forking a shell to do a pwd(1) in order to
get the result ... This makes it slow.

If this was not enough, find insists on wading through the entire
filesystem (relative to the starting point) even if matches were found, 
just to ensure that no additional matches are found - most often
you don't want this functionality (unless you're making a backup).
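
One common workaround, sketched under assumptions: cut the output off after the first match with head. The directory layout and the name "target" are invented. Note that this does not make find itself stop at once: find only dies when it tries to write another match into the closed pipe, so with no further matches it still walks the whole tree. Some later finds offer a primary that quits after the first hit (GNU find calls it -quit), but that is not something to count on everywhere.

```shell
#!/bin/sh
# Sketch: take only the first match that find reports.
dir=$(mktemp -d) || exit 1
mkdir -p "$dir/a/b" "$dir/c"
: > "$dir/a/b/target"
: > "$dir/c/target"
: > "$dir/c/other"

# head exits after one line; which of the two matches comes first
# depends on the order in which find reads the directories.
first=$(find "$dir" -name target -print | head -n 1)

echo "first match: $first"
rm -rf "$dir"
```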

>>Find is also complicated to use for non-trivial tasks, and the syntax  is
>>often confusing to non-wizards, so specialized shell-scripts or programs
>>are often more intuitive and faster to create and maintain.

>This is getting into a typical religious issue, so I won't say much in
>response other than the fact that if someone knows enough about the system
>to create shell scripts and/or programs it behooves them to take an
>extra few minutes to learn find.

How about grouping selection criterias and the precedence of the -a and
-o (and & or) operators when grouping selection criterias ???

>Find in itself is not complicated (just some name selection logic) and 
>should be one of the tools that anyone writing shells is intimately aware
>of.  (BTW - I am one of the people that believe you shouldn't be allowed
>to write any shell scripts until you have made a complete pass through
>sections 1 & 1m (or 8, depending upon your OS) AND you should repeat this
>on a regular basis).

I'm not an apprentice in UNIX, in fact I have used the system extensively
for the last 8+ years, and use find on a regular basis, also with fairly
complicated expressions. Still, there are times when find is not part of
the solution, and shell scripts are more intuitive.

>As an aside, most of the specialized shell scripts that you mentioned will
>usually run a find as part of the script anyway.

Sometimes, but most often good ole ls(1) will do the job nicely and faster.

>But to make the correct selection you should be aware of most, if not
>all, of the tools available (nothing is better than an educated decision).

There might be occasions when faced with the problem:

	The right way to solve problem X is to use tool Z, but in order
	to use tool Z which is either complicated to use or unfamiliar,
	the amount of time used in studying tool Z takes longer than to
	use another tool, that might not be the best, but is faster to
	employ and does the job.

The moral is optimal solutions are good -- but don't forget common
sense, or you'll end up searching indefinitely for the optimal solution
and get nothing done, when time is critical!!!

					Kim Chr. Madsen

lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) (10/07/90)

In article <1990Oct7.001518.14216@diku.dk> kimcm@diku.dk (Kim Christian Madsen) writes:
: The moral is optimal solutions are good -- but don't forget common
: sense, or you'll end up searching indefinitely for the optimal solution
: and get nothing done, when time is critical!!!

Though sometimes you have to take a hit on the current problem so that you
can be better prepared for the next problem.

But yes, balance is needed.  And it's okay for different people to strike
that balance at different points.  The world needs both doers and dreamers.
(And a few people who pretend to do both.)

Larry

tchrist@convex.COM (Tom Christiansen) (10/07/90)

In article <1990Oct7.001518.14216@diku.dk> kimcm@diku.dk (Kim Christian Madsen) writes:
>cpcahil@virtech.uucp (Conor P. Cahill) writes:
>Maybe not on your system, but on my system (a SYSV system), find performs
>a getpwd(3C) each time it enters a directory, and getpwd(3) is by
>standard implemented by forking a shell to do a pwd(1) in order to
>get the result ... This makes it slow.

What an idiotic way to implement that function.  It's also 
stupid of whoever sent out such a hopelessly slow version 
of find without optimizing that.  Bitch at your vendor.
Are all AT&T versions really this dumb?

>How about grouping selection criterias and the precedence of the -a and
>-o (and & or) operators when grouping selection criterias ???

Ug.  1 criterion.  2 or more criteria.  Never criterias.  

--tom
--
 "UNIX was never designed to keep people from doing stupid things, because 
  that policy would also keep them from doing clever things." [Doug Gwyn]

cpcahil@virtech.uucp (Conor P. Cahill) (10/07/90)

In article <1990Oct7.001518.14216@diku.dk> kimcm@diku.dk (Kim Christian Madsen) writes:
>cpcahil@virtech.uucp (Conor P. Cahill) writes:
>
>>This is getting into a typical religious issue, so I won't say much in
>>response other than the fact that if someone knows enough about the system
>>to create shell scripts and/or programs it behooves them to take an
>>extra few minutes to learn find.
>
>How about grouping selection criterias and the precedence of the -a and
>-o (and & or) operators when grouping selection criterias ???

For the life of me I couldn't tell you the precedence of any of the 
operators, but I still use them all the time.  If there is any question 
about the precedence (and I have questions about it any time I use -o) I
just add "(" and ")" to get what I want.
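
A small sketch of why the parentheses matter. The three scratch files are invented for the example. The implicit -a between primaries binds tighter than -o, so without grouping the -print attaches only to the second -name test.

```shell
#!/bin/sh
# Sketch: -a (implicit) binds tighter than -o in a find expression.
dir=$(mktemp -d) || exit 1
: > "$dir/a.c"; : > "$dir/b.h"; : > "$dir/c.txt"

# Parsed as:  -name '*.c'  -o  ( -name '*.h' -a -print )
# so only b.h is printed; a.c matches but nothing is done with it.
loose=$(find "$dir" -name '*.c' -o -name '*.h' -print | wc -l | tr -d ' ')

# Escaped parentheses group the alternatives, so -print applies to both.
grouped=$(find "$dir" \( -name '*.c' -o -name '*.h' \) -print \
    | wc -l | tr -d ' ')

echo "without parentheses: $loose  with: $grouped"
rm -rf "$dir"
```

This prints "without parentheses: 1  with: 2". The parentheses have to be escaped or quoted so the shell passes them through to find.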


-- 
Conor P. Cahill            (703)430-9247        Virtual Technologies, Inc.,
uunet!virtech!cpcahil                           46030 Manekin Plaza, Suite 160
                                                Sterling, VA 22170 

tif@doorstop.austin.ibm.com (Paul Chamberlain) (10/08/90)

In article <9848@jpl-devvax.JPL.NASA.GOV> lwall@jpl-devvax.JPL.NASA.GOV writes:
>Though sometimes you have to take a hit on the current problem so that you
>can be better prepared for the next problem.

I'll drink to that.  I try very hard to spread this philosophy.

Paul Chamberlain | I do NOT represent IBM.     tif@doorstop, sc30661 at ausvm6
512/838-7008     | ...!cs.utexas.edu!ibmaus!auschs!doorstop.austin.ibm.com!tif

dce@smsc.sony.com (David Elliott) (10/09/90)

In article <106928@convex.convex.com>, tchrist@convex.COM (Tom Christiansen) writes:
|> In article <1990Oct7.001518.14216@diku.dk> kimcm@diku.dk (Kim Christian Madsen) writes:
|> >cpcahil@virtech.uucp (Conor P. Cahill) writes:
|> >Maybe not on your system, but on my system (a SYSV system), find performs
|> >a getpwd(3C) each time it enters a directory, and getpwd(3) is by
|> >standard implemented by forking a shell to do a pwd(1) in order to

|> What an idiotic way to implement that function.  It's also 
|> stupid of whoever sent out such a hopelessly slow version 
|> of find without optimizing that.  Bitch at your vendor.
|> Are all AT&T versions really this dumb?

No, not all.

SVR4 getcwd (not getpwd, which would mean "get print working directory"?)
works just like the BSD version.

So, bitching to some vendors will get you nothing more than "wait for
SVR4".

-- 
...David Elliott
...dce@smsc.sony.com | ...!{uunet,mips}!sonyusa!dce
...(408)944-4073
..."He'll become the Sun.  We must have one you know"  "Oh"

cpcahil@virtech.uucp (Conor P. Cahill) (10/09/90)

In article <106928@convex.convex.com> tchrist@convex.COM (Tom Christiansen) writes:
>In article <1990Oct7.001518.14216@diku.dk> kimcm@diku.dk (Kim Christian Madsen) writes:
>>cpcahil@virtech.uucp (Conor P. Cahill) writes:
>>Maybe not on your system, but on my system (a SYSV system), find performs
>>a getpwd(3C) each time it enters a directory, and getpwd(3) is by
>>standard implemented by forking a shell to do a pwd(1) in order to
>>get the result ... This makes it slow.

Hey, I didn't say that (careful with those inclusions).

>What an idiotic way to implement that function.  It's also 

Getcwd() works that way on most System V systems.  Since it should be 
a rarely used function (normally only once per execution) it really 
shouldn't matter.  It has been this way at least since PWB Unix (in 
the AT&T family line).

>stupid of whoever sent out such a hopelessly slow version 
>of find without optimizing that.  Bitch at your vendor.

Very true.  Most finds will only run a pwd at start up time (so it knows
where it was started from so it can start any -execs there) not each
time it enters a new directory.

-- 
Conor P. Cahill            (703)430-9247        Virtual Technologies, Inc.,
uunet!virtech!cpcahil                           46030 Manekin Plaza, Suite 160
                                                Sterling, VA 22170 

davidsen@sixhub.UUCP (Wm E. Davidsen Jr) (10/09/90)

In article <1990Oct7.001518.14216@diku.dk> kimcm@diku.dk (Kim Christian Madsen) writes:
>cpcahil@virtech.uucp (Conor P. Cahill) writes:
>Maybe not on your system, but on my system (a SYSV system), find performs
>a getpwd(3C) each time it enters a directory, and getpwd(3) is by
>standard implemented by forking a shell to do a pwd(1) in order to
>get the result ... This makes it slow.

  What?? There may be such an implementation somewhere, but I can't
imagine why anyone would do that. Find doesn't need to use an absolute
pathname, it has the starting name on the command line, and if it needed
this info it would only need to do it at most once and keep track of it
from there on.

  Moreover running find on a large directory shows no such info in the
accounting file.

  I guess I'm saying that I doubt that this is (a) needed or (b)
generally true.
-- 
bill davidsen - davidsen@sixhub.uucp (uunet!crdgw1!sixhub!davidsen)
    sysop *IX BBS and Public Access UNIX
    moderator of comp.binaries.ibm.pc and 80386 mailing list
"Stupidity, like virtue, is its own reward" -me

doughert@lafcol.UUCP (Andy Dougherty) (10/10/90)

> find with -exec vs. find -print | xargs
One problem with xargs is that even if it is present on a particular
system, it might not work.  I used one system (SysVR2) where xargs 
occasionally dropped the last argument.  I never had the time to get 
it all sorted out, but I never used xargs for anything critical.
    			--Andy Dougherty

cpcahil@virtech.uucp (Conor P. Cahill) (10/12/90)

In article <2518@lafcol.UUCP> doughera@lafayett.bitnet (Andy Dougherty) writes:
>> find with -exec vs. find -print | xargs
>One problem with xargs is that even if it is present on a particular
>system, it might not work.  I used one system (SysVR2) where xargs 
>occasionally dropped the last argument.  I never had the time to get 
>it all sorted out, but I never used xargs for anything critical.

Does this mean that if you find a system with a bug in the compiler you
will never use a C compiler again?  

When you find a bug, report it to the vendor (yes, you can yell at them)
work around it if possible, or replace it with some suitable substitute.  
However, don't think that since there is one bug in one implementation
of a program that it will always be there in any implementation.


-- 
Conor P. Cahill            (703)430-9247        Virtual Technologies, Inc.,
uunet!virtech!cpcahil                           46030 Manekin Plaza, Suite 160
                                                Sterling, VA 22170 

geoffg@sigma21.oz.au (Geoffrey R Graham) (10/17/90)

doughert@lafcol.UUCP (Andy Dougherty) writes:
   >One problem with xargs is that even if it is present on a particular
   >system, it might not work.  I used one system (SysVR2) where xargs 
   >occasionally dropped the last argument.  I never had the time to get 
   >it all sorted out, but I never used xargs for anything critical.

If my memory is correct that would have been on a UniPlus+ System V.2
system.  If it was then the bug was specific to UniPlus+ and was
introduced by someone trying to fix another bug.  Both bugs should have
been fixed by now.
-- 
Geoff Graham
Sigma Data Corporation                                   geoffg@sigma21.oz.au
Western Australia                 Phone +61 9 321 1116     FAX +61 9 321 9178