[net.unix] Need unix command file HELP!

sutton@aero.ARPA (Stew Sutton) (01/30/86)

We are looking for a utility that, when given an arbitrary string,
can locate all occurrences of that string anywhere on the system. Our
local Un*x gurus can't figure this out, so we are appealing to those out
in Netland to help us out.

We are looking for the command to work like this:

findstring this-is-the-string

The utility would print to the screen every file containing the string
(with its pathname from the root). Of course, if the protections on the file
indicate that the file cannot be read, the program should ignore that file and
keep on going. We think it can be done with a command file using the 'ls'
and 'awk' commands, but we just can't get it right.

Please send source code (or ideas on writing this code) to us and we
will post a summary of working code to the net.

Thanks in advance.

sutton@aerospace.ARPA
{ihnp4!sdcrdcf,randvax,trwrb} ! aero ! sutton
sutton%aerospace.ARPA@WISCVM.BITNET

earle@smeagol.UUCP (Greg Earle) (02/01/86)

> We are looking for a utility that can, when given a arbitrary string,
> can locate all occurences of that string anywhere on the system. Our
> local Un*x gurus can't figure this out, so we are appealing to those out
> in Netland to help us out.

Some Gurus you got there ...

> We are looking for the command to work like this:
> 
> findstring this-is-the-string

find / -exec fgrep this-is-the-string '{}' \; (UGGGHHH!)

Warning! Only execute during hours when no one else is in building!!
Guaranteed to tie up CPU for indefinite periods! :@)

If you only want the file names, this *might* work, I'm not sure ...

find / -exec "fgrep this-is-the-string '{}' | awk -F: '{print $1}'" \;
(DOUBLE UGGGHHH)
-- 

	Greg Earle
	JPL Spacecraft Data Systems group
	sdcrdcf!smeagol!earle (UUCP)
	ia-sun2!smeagol!earle@csvax.caltech.edu (ARPA)

chris@umcp-cs.UUCP (Chris Torek) (02/01/86)

In article <245@aero.ARPA> sutton@aero.UUCP (Stew Sutton) writes:

> We are looking for a utility that can, when given a arbitrary string,
> locate all occurences of that string anywhere on the system.

First:  Since you want to check all files on the machine, you should
immediately think `find'---likewise for `all files within particular
directory trees', which is of course the general case.

Second:  Since you want to match strings within the files, you
should immediately think `grep' (or variants).

> The utility [should] return all the files (and their pathnames
> from the root) to the screen.

If you want matching lines printed, that is easy; if you want the
entire contents of the file printed, that is also easy.  But I will
assume that the names alone are sufficient.

> Of course if the protections on the file indicate that the file
> cannot be read, the program should ignore that file and keep
> on going.

Also easy: just discard error messages from grep.

So, thinking `find' and `grep' (but I will use egrep since it is
usually faster), you consult the man entries, and . . . :

	$ find / -exec egrep "pattern" {} \; -print 2> /dev/null

This is a bit more difficult in the C shell, as it is unable to
redirect stderr without also redirecting stdout.  However, if you
already know where stdout is going, it can be done:

	% (find / -exec egrep "pattern" {} \; -print > /dev/tty) >& /dev/null

(My own solution to the C shell's redirection quirks is to run `sh'
before doing anything complex.)
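
The Bourne-shell command above can be wrapped up as the kind of
"findstring" command the original poster asked for.  What follows is only
a sketch, not part of the original post: the function name
findstring_lines, the optional start-directory argument, the added
'-type f', and the use of 'grep -E' (the current spelling of egrep) are
all assumptions made for illustration.

```sh
#!/bin/sh
# Sketch only: findstring_lines is a hypothetical name.
# Usage: findstring_lines pattern [start-directory]
findstring_lines() {
    pattern=$1
    root=${2:-/}    # assumption: search the whole system by default
    # find runs grep on each regular file; when grep succeeds (it found
    # a match and printed the matching lines), find also -prints the
    # file name.  2>/dev/null discards "Permission denied" and similar
    # complaints from both find and grep.
    find "$root" -type f -exec grep -E -- "$pattern" {} \; -print 2>/dev/null
}
```

Called as `findstring_lines 'pattern' /usr/src`, it should print the
matching lines together with the names of the files they came from.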
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 1415)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@mimsy.umd.edu

mo@wgivax.UUCP (02/02/86)

> From: chris@umcp-cs.UUCP (Chris Torek)
> $ find / -exec egrep "pattern" {} \; -print 2> /dev/null

An excellent answer to the find-strings-in-any-file request.
I have a few suggestions to add:

      * if you only want the file names, use the '-l' flag with
        one of the grep variants and you won't have to worry
        about redirecting stderr and stdout separately
        (you won't need the '-print' flag to find)

      * if you are looking for specific strings, as opposed to
        regular expressions, use fgrep instead of egrep

      * to print context, look into "cgrep", posted to net.source
        a short while back -- it allows you to print m lines before
        and n lines after each match

      * you only want to look at files, so add '-type f' to find

You end up with:

$ find / -type f -exec fgrep -l "string" {} \; > file

Some notes:

      * '/' can be replaced by any directory to search its sub-tree

      * if you are looking for the same strings at periodic times,
        you might want to consider keeping a "current" file listing
        the occurrences, adding '-newer file' to the find command,
        and appending to the file, thus negating the need to pick
        up all occurrences every time.
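
The '-newer' idea in the last note might be sketched as follows.  This is
an illustration under stated assumptions rather than a tested tool: the
function name update_hits and its three arguments are hypothetical,
'grep -F' stands in for fgrep, and the list file is assumed to live
outside the directory tree being searched.

```sh
#!/bin/sh
# Sketch only: update_hits and its arguments are hypothetical.
# Usage: update_hits string start-directory hits-file
# Assumes hits-file is kept outside start-directory.
update_hits() {
    string=$1
    root=$2
    hits=$3
    if [ -f "$hits" ]; then
        # Later runs: examine only files modified more recently than
        # the list itself, and append any new matches to it.
        find "$root" -type f -newer "$hits" \
            -exec grep -F -l -- "$string" {} \; >>"$hits" 2>/dev/null
    else
        # First run: search everything and create the list.
        find "$root" -type f \
            -exec grep -F -l -- "$string" {} \; >"$hits" 2>/dev/null
    fi
    return 0
}
```

A file that is modified and still matches will be listed again on the
next run, so the list may want a pass through 'sort -u' now and then;
entries for files that no longer match are never removed by this sketch.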

Mike O'Shea    decvax!mcnc!unccvax!wgivax!mo

hfavr@mtuxo.UUCP (a.reed) (02/04/86)

> We are looking for a utility that can, when given a arbitrary string,
> can locate all occurences of that string anywhere on the system. Our
> local Un*x gurus can't figure this out, so we are appealing to those out
> in Netland to help us out.
> We are looking for the command to work like this:
> 
> findstring this-is-the-string
> 
> The utility would return all the files (and their pathnames from the
> root) to the screen. Of course if the protections on the file indicate
> that the file cannot be read, the program should ignore that file and keep
> on going. We think it can be done using a command file using the 'ls'
> and 'awk' commands but we just can get it right.
> Please send source code (or ideas on writing this code) to us and we
> will post to net a summary of working code.
> Thanks in advance.
> 
> sutton@aerospace.ARPA
> {ihnp4!sdcrdcf,randvax,trwrb} ! aero ! sutton
> sutton%aerospace.ARPA@WISCVM.BITNET

# In ksh or sh this is a one-liner:
2>/dev/null find / -exec fgrep -l "$1" {} \;
# Please do not post ELEMENTARY shell questions to net.unix-wizards!
# 			Adam Reed (ihnp4!npois!adam)

aglew@ccvaxa.UUCP (02/04/86)

I came up with this csh alias for a recursive grep at the
beginning of this year when I had to do a lot of code reading:

alias rgl find !:2* -name \* -exec grep -l !:1 /dev/null \{\} \; 

It works, but is blessed slow. Our version of find, at least,
will mess up if you have cycles in your directory tree;
it will also find link-aliased files many times.
The /dev/null is to force our version of grep to print out
the filename when there is only one file (wildcards can be
worked in to make it more efficient, but it gets messy
enough to need a shell script (if not, tell me!)).

wescott@sauron.UUCP (Michael Wescott) (02/04/86)

In article <587@smeagol.UUCP> earle@smeagol.UUCP (Greg Earle) writes:
>> We are looking for a utility that can, when given a arbitrary string,
>> can locate all occurences of that string anywhere on the system.
>
>find / -exec fgrep this-is-the-string '{}' \; (UGGGHHH!)
>
>Warning! Only execute during hours when no one else is in building!!
>Guaranteed to tie up CPU for indefinite periods! :@)
>
>If you only want the file names, this *might* work, I'm not sure ...
>
>find / -exec "fgrep this-is-the-string '{}' | awk -F: '{print $1}'" \;
>(DOUBLE UGGGHHH)

I agree, UGGHHH.  I know it's not on most BSD systems, but it has its
uses.  I'm talking about `xargs'.  For much less cpu usage try:

find / -type f -print | grep -v outfile | xargs grep 'pattern'  > outfile

or some reasonable variation with your favorite grep (or bm).
Xargs accumulates arguments from stdin and execs the command and args
given for a reasonable number of collected arguments.  Hence grep
gets executed once per ten or twenty files rather than once per file.
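
The batching described above can be sketched in two ways.  Neither
function is from the original post: the names findstring_xargs and
findstring_batched are hypothetical, 'grep -F -l' stands in for
"your favorite grep", and the second variant relies on the '{} +' form
of -exec that later versions of find provide, which does the batching
without xargs.

```sh
#!/bin/sh
# Sketch only: both function names are hypothetical.
# Usage: <function> string [start-directory]

# Variant 1: the find | xargs pipeline.  Plain xargs splits its input
# on blanks and newlines, so file names containing them are mishandled.
# The /dev/null argument guarantees grep always has a file to read,
# even if xargs runs it with no collected names.
findstring_xargs() {
    {
        find "${2:-/}" -type f -print |
            xargs grep -F -l -- "$1" /dev/null
    } 2>/dev/null
    return 0
}

# Variant 2: let find batch the names itself with "{} +".  grep is
# still run once per group of files, and odd file names are passed
# through intact.
findstring_batched() {
    find "${2:-/}" -type f -exec grep -F -l -- "$1" {} + 2>/dev/null
    return 0
}
```

Because both variants use '-l', the output contains file names only, so
sending it to a file inside the searched tree does not feed the search
with fresh matches unless a file name itself contains the string.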

I think it's reasonable to expect to search only regular files, hence
"-type f" and you better exclude "outfile" unless you use "grep -l".

	-Mike Wescott
	ncrcae!wescott

jso@edison.UUCP (John Owens) (02/04/86)

> We are looking for the command to work like this:
> 
> findstring this-is-the-string
> 
> The utility would return all the files (and their pathnames from the
> root) to the screen. Of course if the protections on the file indicate
> that the file cannot be read, the program should ignore that file and keep
> on going. We think it can be done using a command file using the 'ls'
> and 'awk' commands but we just can get it right.
> 
> sutton@aerospace.ARPA
> {ihnp4!sdcrdcf,randvax,trwrb} ! aero ! sutton

find / -type f -exec fgrep -l "$1" {} \; 2>/dev/null

-- 

			   John Owens
General Electric Company		Phone:	(804) 978-5726
Factory Automated Products Division	Compuserve: 76317,2354
	       houxm!burl!icase!uvacs
...!{	       decvax!mcnc!ncsu!uvacs	}!edison!jso
		 gatech!allegra!uvacs

smithson@calma.UUCP (Brian Smithson) (02/04/86)

In article <587@smeagol.UUCP> earle@smeagol.UUCP (Greg Earle) writes:
>> We are looking for a utility that can, when given a arbitrary string,
>> can locate all occurences of that string anywhere on the system. Our
>> local Un*x gurus can't figure this out, so we are appealing to those out
>> in Netland to help us out.
>
>Some Gurus you got there ...
>
>> We are looking for the command to work like this:
>> 
>> findstring this-is-the-string
>
>find / -exec fgrep this-is-the-string '{}' \; (UGGGHHH!)
>
>Warning! Only execute during hours when no one else is in building!!
>Guaranteed to tie up CPU for indefinite periods! :@)
>[...]
> 
How about:  nice -20 find / -exec fgrep this-is-the-string {} \;  ?
Better pack a lunch, though... :-)

levy@ttrdc.UUCP (Daniel R. Levy) (02/09/86)

<Oh oh here it comes.  Watch out boy, it'll chew you up! \
Oh oh here it comes.  The LINE EATER!  [Line eater]>

In article <2981@umcp-cs.UUCP>, chris@umcp-cs.UUCP (Chris Torek) writes:
>In article <245@aero.ARPA> sutton@aero.UUCP (Stew Sutton) writes:
>Also easy: just discard error messages from grep.

Not just that; you ALSO want to discard the crap that grep will spit at the
tty if the string wanted is found in a BINARY FILE!  (Some terminals get
VERY UPSET at random binary sequences.)  Better yet, use the '-l' option:

	$ find / -exec {e,f,""}grep -l "pattern" {} \;

(fgrep is best for FIXED patterns that contain characters which egrep and
grep would treat as wild cards.)
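
To make the fixed-pattern point concrete, here is a small sketch (not from
the original post).  The helper name demo_fixed_vs_regex and the two
scratch files are made up for illustration, 'grep -F' is used as the
current spelling of fgrep, and the availability of 'mktemp -d' is an
assumption.

```sh
#!/bin/sh
# Sketch only: demo_fixed_vs_regex is a hypothetical helper that builds
# two small files in a scratch directory and searches them both ways.
demo_fixed_vs_regex() {
    dir=$(mktemp -d) || return 1
    printf 'x = a.c\n' >"$dir/literal"     # contains the exact text a.c
    printf 'x = abc\n' >"$dir/lookalike"   # matches only if "." is a wild card

    echo "grep -l (regular expression):"
    find "$dir" -type f -exec grep -l 'a.c' {} \; | sort
    echo "grep -F -l (fixed string, like fgrep):"
    find "$dir" -type f -exec grep -F -l 'a.c' {} \; | sort

    rm -rf "$dir"
}
```

The regular-expression search should list both files, because '.' matches
any character; the fixed-string search should list only the file that
really contains "a.c".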

>--
>In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 1415)
>UUCP:	seismo!umcp-cs!chris
>CSNet:	chris@umcp-cs		ARPA:	chris@mimsy.umd.edu
-- 
 -------------------------------    Disclaimer:  The views contained herein are
|       dan levy | yvel nad      |  my own and are not at all those of my em-
|         an engihacker @        |  ployer or the administrator of any computer
| at&t computer systems division |  upon which I may hack.
|        skokie, illinois        |
 --------------------------------   Path: ..!{akgua,homxb,ihnp4,ltuxa,mvuxa,
						vax135}!ttrdc!levy

jsdy@hadron.UUCP (Joseph S. D. Yao) (02/11/86)

All the folk who are responding that the way to get the file names
of files containing a particular string are kind of forgetting that
the grep family does  n o t  automatically print out file names.
This:

>find / -exec fgrep this-is-the-string '{}' \;

will give a file full of lines containing this-is-the-string.  Try:

find / -exec grep this-is-the-string '{}' /dev/null \;

**OR** (quicker) :

find / -type d -a -exec ksh findstr "this-is-the-string" {} \;

findstr:
#!/bin/ksh
# or /bin/sh

str="$1"
dir="$2"
file=""
text=""

if [ ! -d "$dir" ]; then exit 1; fi

cd "$dir"

for file in *; do
	if [ ! -f "$file" ]; then continue; fi
	text=`file "$file" | grep text`
	if [ "" = "$text" ]; then continue; fi
	# if you want the complete text:
	# grep "$str" "$dir/$file" /dev/null
	# otherwise
	text=`grep "$str" "$file" | line`
	if [ "" != "$text" ]; then
		echo "$dir/$file"
	fi
done
exit 0
-- 

	Joe Yao		hadron!jsdy@seismo.{CSS.GOV,ARPA,UUCP}

mo@wgivax.UUCP (02/15/86)

>From jsdy@hadron.UUCP (Joseph S. D. Yao) Sun Feb  6 01:28:16 206
>Summary: grep won't always print file name!
>
>All the folk who are responding that the way to get the file names
>of files containing a particular string are kind of forgetting that
>the grep family does  n o t  automatically print out file names.
>This:
>
>>find / -exec fgrep this-is-the-string '{}' \;
>
>will give a file full of lines containing this-is-the-string.  Try:
>
>find / -exec grep this-is-the-string '{}' /dev/null \;

WRONG!  fgrep -l WILL print the file name, and WILL NOT print the string.
It will look for ONLY the first occurrence in a file, speeding things up,
AND fgrep is faster than grep.

>**OR** (quicker) :
>
>find / -type d -a -exec ksh findstr "this-is-the-string" {} \;

(-: GREAT,  NOW HOW DO I FIND THE KORN SHELL ? :-)

>findstr:
>#!/bin/ksh
># or /bin/sh
>
>str="$1"
>dir="$2"
>file=""
>text=""
>
>if [ ! -d "$dir" ]; then exit 1; fi
>
>cd "$dir"
>
>for file in *; do
>	if [ ! -f "$file" ]; then continue; fi
>	text=`file "$file" | grep text`
>	if [ "" = "$text" ]; then continue; fi
>	# if you want the complete text:
>	# grep "$str" "$dir/$file" /dev/null
>	# otherwise
>	text=`grep "str" "$file" | line`
>	if [ "" != "$text" ]; then
>		echo "$dir/$file"
>	fi
>done
>exit 0

this is admittedly "safer", since it skips non-text files, but look at all
those sub-processes you're starting up for every used inode on the system!

haven't we heard enough about this, YET?

gkloker@utai.UUCP (Geoff Loker) (02/16/86)

In article <259@hadron.UUCP> jsdy@hadron.UUCP (Joseph S. D. Yao) writes:
>All the folk who are responding that the way to get the file names
>of files containing a particular string are kind of forgetting that
>the grep family does  n o t  automatically print out file names.
>This:
>
>>find / -exec fgrep this-is-the-string '{}' \;
>
>will give a file full of lines containing this-is-the-string.  Try:
>

I don't know if this is any quicker than your script file suggestion
for finding file names with the string (we don't have ksh), but the
grep family does have an option to print out only the name(s) of
file(s) that contain the match-string.  Try:

find / -exec fgrep -l this-is-the-string '{}' \;
-- 
Geoff Loker
Department of Computer Science
University of Toronto
Toronto, ON
M5S 1A4

USENET:	{ihnp4 decwrl utzoo uw-beaver}!utcsri!utai!gkloker
CSNET:		gkloker@toronto
ARPANET:	gkloker.toronto@csnet-relay

geoff@desint.UUCP (Geoff Kuenning) (02/17/86)

In article <259@hadron.UUCP> jsdy@hadron.UUCP (Joseph S. D. Yao) writes:

> 	text=`file "$file" | grep text`

Actually, there's still a gotcha here:  it will pick up files named
'text.o', 'context.o', etc. (I know this from painful experience!).  A
better way to detect text files is:

	text=`file "$file" | grep ':.* text'`

or even
	text=`file "$file" | grep 'text$'`

but I'd use the last only after checking the source of /bin/file to make
sure it always puts 'text' last on the line.
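
The difference between the two filters can be seen without running
file(1) at all.  In the sketch below (an illustration only) the three
sample lines merely imitate the "name: description" layout of file
output; the descriptions and all three function names are made up.

```sh
#!/bin/sh
# Sketch only.  The sample lines imitate the "name: description" layout
# of file(1) output; real descriptions vary from system to system.
sample_file_output() {
    printf '%s\n' \
        'notes:     ASCII text' \
        'text.o:    object file' \
        'context.o: object file'
}

# Naive filter: also matches "text" inside the file names themselves.
naive_text_filter() { grep text; }

# Anchored filter: " text" must appear after the colon, that is, in the
# description rather than in the name.
anchored_text_filter() { grep ':.* text'; }
```

Piping sample_file_output through naive_text_filter passes all three
lines, object files included; through anchored_text_filter only the
'notes' line survives.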

Anyway, here is a TESTED (what a radical idea) shell script that:
(1) Searches only text files (2) Accepts the -l and -n switches of 'grep'
(3) remembers to put in /dev/null if -l is not specified, and (4) uses
xargs if it is available in /bin or /usr/bin.  Both the BSD and the
System V variants have been tested (it adapts dynamically).  It is an
improved version of a script that I sent to the original question-asker
more than a month ago.

Now, can we *please* move on to a new subject?

	Geoff Kuenning
	{hplabs,ihnp4}!trwrb!desint!geoff
------------------------cut here--------------------------
: Use /bin/sh
#!/bin/sh
#
#	Locate a string in any (text) file, anywhere in the system.
#
#	Usage:
#
#	findstring [-l] [-n] [-g grep-program] [root-directory] search-string
#
#	If the search-string contains semicolons, they should be backslashed,
#	thus:
#
#	findstring "break\;"
#
#	The -l and -n switches are passed to the 'grep' program.
#
#	The -g switch selects a different grep program;  the default is 'grep'.
#
#	WARNING:  this command is very slow, and loads down the system
#	quite heavily.  The System V version opens every file in the
#	system;  the BSD version does that and also spawns at least one
#	process for every file in the system and another for every
#	text file.  You can reduce the system load by a little bit by
#	initiating the command from the root directory.
#
#	Note:  this is written to be portable to System V and BSD.  It
#	has only been tested on system V, though the BSD code was also
#	tested there.
#
PATH=/bin:/usr/bin
ROOTDIR=/
grepargs=
nullfile=/dev/null
grep=grep
while :
do
    case "X$1" in
	X-l)
	    grepargs="$grepargs -l"
	    nullfile=
	    shift
	    ;;
	X-n)
	    grepargs="$grepargs -n"
	    shift
	    ;;
	X-g)
	    grep=$2
	    shift; shift
	    ;;
	X-*)
	    set illegal arguments - this will cause a message
	    break
	    ;;
	*)
	    break
	    ;;
    esac
done
if [ $# -gt 1 ]
then
    ROOTDIR=$1
    shift
fi
if [ $# -ne 1 ]
then
    echo 'Usage:  findstring [-l] [-n] [-g grep-program]' \
      '[root-directory] search-string' 1>&2
    exit 2
fi
#
#	If you have UniSoft System V, test xargs to see if it has a bug by
#	typing:
#
#	    echo a b c | xargs echo
#
#	If you get nothing back, you have the bug.  If you get "a b c",
#	you don't.
#
#	If you have the bug, you will have to disable the xargs variant
#	below and make it run the non-xargs version.  This is unfortunately
#	*much* slower.
#
if [ -x /bin/xargs -o -x /usr/bin/xargs ]
then
    #
    #	The system has xargs; use it
    #
    find $ROOTDIR -type f -print \
      | xargs file \
      | sed -n '/:	.* text/s/:	.*$//p' \
      | xargs $grep $grepargs "$1" $nullfile
else
    #
    #	Too bad, there's no xargs.  We'll have to do it the hard way.
    #
    find $ROOTDIR -type f -exec file {} \; \
      | sed -n '/:	.* text/{
	  s/:	.*$//
	  s;^;'"$grep $grepargs '$1' $nullfile"' ;p
	  }' \
      | sh
fi
#
# Unfortunately, the grep's will return a nonzero status if they find
# nothing, so there isn't much point in returning their status.
#
exit 0

lew@gsg.UUCP (Paul Lew) (02/17/86)

>All the folk who are responding that the way to get the file names
>of files containing a particular string are kind of forgetting that
>the grep family does  n o t  automatically print out file names.
>This:
>
>find / -exec fgrep this-is-the-string '{}' \;
>
>will give a file full of lines containing this-is-the-string.  Try:
>

	Notice that if you run grep on more than one file, the file
	names will be displayed.  A simple solution to the problem
	is to use:

	 find / -exec fgrep this-is-the-string /dev/null '{}' \;

	and YOU DO NOT HAVE TO WRITE ANY SCRIPT to do so.
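
A quick way to see the effect of the extra /dev/null argument is the
sketch below.  The helper name demo_devnull_trick is hypothetical and
'grep -F' is used as the current spelling of fgrep.

```sh
#!/bin/sh
# Sketch only: demo_devnull_trick is a hypothetical helper.
# Usage: demo_devnull_trick string file
demo_devnull_trick() {
    echo "one file argument:"
    grep -F -- "$1" "$2"              # matching lines only, no file name
    echo "with /dev/null as an extra argument:"
    grep -F -- "$1" /dev/null "$2"    # each line prefixed with "file:"
    return 0
}
```

With a single file argument grep prints bare matching lines; with two
file arguments it prefixes each line with the name of the file it came
from, and /dev/null itself can never contribute a match.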
-- 
----------------------------------------------------------------------
Paul S. Lew				decvax!gsg!lew		(UUCP)

General Systems Group
51 Main Street, Salem, NH  03079	(603) 893-1000
----------------------------------------------------------------------

mjs@sfsup.UUCP (M.J.Shannon) (02/18/86)

> >This:
> >
> >>find / -exec fgrep this-is-the-string '{}' \;
> >
> >will give a file full of lines containing this-is-the-string.  Try:
> >
> 
> find / -exec fgrep -l this-is-the-string '{}' \;
> -- 
> Geoff Loker

If your grep/egrep/fgrep doesn't support -l, then try the following:

	find / -exec fgrep string '{}' /dev/null ';' |
		sed -e 's/:.*//' |
		sort -t/ -u

Note that this fails miserably if you have files whose names include a ':'.
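
As a sketch only, the pipeline above could be packaged like this.  The
function name names_without_l and the optional start-directory argument
are assumptions, and 'grep -F' stands in for fgrep.

```sh
#!/bin/sh
# Sketch only: names_without_l is a hypothetical name.
# Usage: names_without_l string [start-directory]
names_without_l() {
    # /dev/null makes grep prefix every matching line with "name:";
    # sed strips everything from the first colon onward; sort -u
    # collapses the repeats produced when a file matches on several
    # lines.
    find "${2:-/}" -type f -exec grep -F -- "$1" {} /dev/null \; 2>/dev/null |
        sed -e 's/:.*//' |
        sort -u
}
```

A file called 'd:e' that contains the string would be reported as 'd',
which is exactly the failure with ':' in file names noted above.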
-- 
	Marty Shannon
UUCP:	ihnp4!attunix!mjs
Phone:	+1 (201) 522 6063

Disclaimer: I speak for no one.

"If I never loved, I never would have cried." -- Simon & Garfunkel

stevesu@copper.UUCP (Steve Summit) (02/18/86)

In article <245@aero.ARPA>, sutton@aero.ARPA (Stew Sutton) writes:
> We are looking for a utility that can, when given a arbitrary string,
> can locate all occurences of that string anywhere on the system. Our
> local Un*x gurus can't figure this out, so we are appealing to those out
> in Netland to help us out.

Stew's question has basically been answered, but I've got two
cents to add:

	1. Since such a command is probably going to generate
	   voluminous output, it is tempting to redirect it to a
	   file for later perusal.  If you do so, be extremely
	   careful: if your program is searching the entire
	   filesystem, it is likely to find your output file,
	   each of whose lines contains the string you're looking
	   for, and which will therefore get re-appended to the
	   file, ad infinitum...

	   I make this mistake every few years, filling up a disk
	   every time.  If you don't need to search the entire
	   filesystem, just make sure you put the output file
	   somewhere where it won't get found, like /tmp.  The
	   general solution would be an exclusion option on the
	   find command, which would be generally useful.
	   (Another trick would be to make the output file
	   unreadable.)

	2. Joe Yao pointed out the problem of the grep family not
	   printing the filename if given a single argument.  My
	   solution, which is a bit wasteful, but probably more
	   efficient than Joe's shell script, goes like this:

		find / -exec grep 'little dog' {} /dev/null \;

	   grep notices two arguments, so cheerfully prints the
	   filename if it finds the string, although it's
	   virtually guaranteed never to occur in the second one
	   (unless /dev/null accidentally got replaced with a
	   real file, but that's another story).
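
The exclusion wished for in point 1 can be approximated on versions of
find that support the '-path' primary.  The following is a sketch under
that assumption, not something from the original post; the function name
search_to_file and its arguments are hypothetical, and 'grep -F' stands
in for fgrep.

```sh
#!/bin/sh
# Sketch only: search_to_file is a hypothetical name.
# Usage: search_to_file string start-directory output-file
# output-file must be spelled the way find will print it, that is,
# beginning with start-directory, for the exclusion to take effect.
search_to_file() {
    string=$1
    root=$2
    out=$3
    # "! -path" skips the output file itself, so grep never reads the
    # lines it is in the middle of writing.  -path takes a shell
    # pattern, so an output name containing *, ? or [ would need
    # escaping.
    find "$root" -type f ! -path "$out" \
        -exec grep -F -- "$string" /dev/null {} \; >"$out" 2>/dev/null
    return 0
}
```

For example, `search_to_file 'little dog' /usr/stevesu /usr/stevesu/results`
(the paths are made up) searches everything under the directory except
the results file, even on a second run when that file is already full of
matching lines.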

                                         Steve Summit
                                         tektronix!copper!stevesu

stevesu@copper.UUCP (Steve Summit) (02/18/86)

A thousand apologies.  Joe Yao suggested the exact same /dev/null
trick I did; I missed it and got distracted by the complicated-
looking Korn shell script at the bottom of his article.

                                         Steve Summit
                                         tektronix!copper!stevesu

dv@well.UUCP (David W. Vezie) (02/24/86)

In article <144@wgivax.UUCP> mo@wgivax.UUCP writes:
>WRONG!  fgrep -l WILL print the file name, and WILL NOT print the string
>        it will look for ONLY the first occurrence in a file, speeding
>		things up, AND fgrep is faster than grep
>

Ummm...  I don't know about your machine, but I just did an informal
benchmark comparing {e,,f}grep for speed, and found out that of the
three, egrep is fastest, followed by grep, and slowest was fgrep.
(this is on 4.2BSD)
--- 
David W. Vezie
	    {dual|hplabs}!well!dv - Whole Earth 'Lectronics Link, Sausalito, CA

mo@wgivax.UUCP (02/27/86)

>Reply-To: dv@well.UUCP (David W. Vezie)

>In article <144@wgivax.UUCP> mo@wgivax.UUCP writes:
>>WRONG!  fgrep -l WILL print the file name, and WILL NOT print the string
>>        it will look for ONLY the first occurrence in a file, speeding
>>		things up, AND fgrep is faster than grep
>>

>Ummm...  I don't know about your machine, but I just did an informal
>benchmark comparing {e,,f}grep for speed, and found out that of the
>three, egrep is fastest, followed by grep, and slowest was fgrep.
>(this is on 4.2BSD)

I haven't used egrep very much, but having worked with UNIX for 5 years
on Vax 11/780s, Suns, Masscomps, and various other 68k machines (mostly
4.[12]BSD), I have always observed (-: yes, this is a subjective observation :-)
that fgrep runs faster than grep when searching for a specific string.

Anyway, the point is that the grep family of commands has an option which
will print out the file name, and not the lines in which the pattern occurs.

Let's avoid a holy war.  The point in this entire back and forth discussion
is RTFM!!!!  There have been many mistakes in the postings responding to
the original article.  It's great to want to help, but be sure that you have
all the facts before giving advice.