[comp.unix.shell] does a zgrep exist?

brister@decwrl.dec.com (James Brister) (02/15/91)

I guess this is marginally under comp.unix.shell...

It would be nice to grep through compressed files. Sure, you can do zcat |
grep regexp, but then you lose the ability of grep to tell you the
filename and/or line number of a match.  Uncompressing the files before
grepping isn't really wanted, because you may not have the space on disk.

Does such a thing exist?

Thanks

James
--
James Brister                                           brister@decwrl.dec.com
DEC Western Software Lab., Palo Alto, CA    {uunet,sun,pyramid}!decwrl!brister
"Cogito cogito, ergo cogito sum"

tchrist@convex.COM (Tom Christiansen) (02/16/91)

From the keyboard of brister@decwrl.dec.com (James Brister):
:It would be nice to grep through compressed files. Sure, you can do zcat |
:grep regexp, but then you lose the ability of grep to tell you the
:filename and/or line number of a match.  Uncompressing the files before
:grepping isn't really wanted, because you may not have the space on disk.
:
:Does such a thing exist?

You might look to lwall's pipegrep program.  It's on page 265 of
the perl book, and available via anon ftp from uunet inside the
compressed tarchive nutshell/perl/perl.tar.Z in ch6/pipegrep.

You say:
    $ pipegrep 'some_pattern' cmd files

so you could say:

    $ pipegrep 'foo.*bar$' zcat *.Z

It would only zcat one file at a time, and grep through the pipe, 
and then prepend the (command and) filename it found it on.  Quoting
from the book:

    The pipegrep program greps the output of a series of commands.  The
    difficulty with doing this using the normal grep program is that you
    lose track of which file was being processed.  This program prints
    out the command it was executing at the time, including the filename.
    The command, which is a single argument, will be executed once for
    each file in the list.


It's a pretty quick little program, faster than stock greps, though not
quite so fast as GNU grep.
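
For readers without the book to hand, the behaviour described above can be
approximated in plain sh. This is only a rough sketch and not the pipegrep
from the perl book: the function name pipegrep_sketch and the exact
"command file:" prefix format are assumptions made for illustration.

```shell
#!/bin/sh
# pipegrep_sketch PATTERN CMD FILE...
# Illustrative only: runs CMD once per FILE and prints each matching
# output line prefixed with "CMD FILE:".  The name and the output format
# are hypothetical, not taken from the real pipegrep.
pipegrep_sketch() {
    pattern=$1
    cmd=$2
    shift 2
    for file
    do
        # $cmd is deliberately unquoted so a command with flags splits into words.
        # The read loop adds the prefix without handing the file name to sed or awk.
        $cmd "$file" | grep -e "$pattern" |
        while IFS= read -r line
        do
            printf '%s %s:%s\n' "$cmd" "$file" "$line"
        done
    done
}

# Example (commented out): pipegrep_sketch 'foo.*bar$' zcat *.Z
```

Only one file is decompressed at a time, so nothing is written to disk,
and grep -n could be added to get per-file line numbers.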


--tom
--
Tom Christiansen		tchrist@convex.com	convex!tchrist
 "All things are possible, but not all expedient."  (in life, UNIX, and perl)

Dan_Jacobson@ATT.COM (02/16/91)

Following this trend you'll have a z<everything> ... sort of like
r<everything> (rsh, rcp ...).  Sort of a bad road to follow too far I
guess.
-- 
Dan_Jacobson@ATT.COM  Naperville IL USA  +1 708-979-6364

brister@decwrl.dec.com (James Brister) (02/16/91)

On 15 Feb 91 22:21:13 GMT, Dan_Jacobson@ATT.COM said:

> Following this trend you'll have a z<everything> ... sort of like
> r<everything> (rsh, rcp ...).  Sort of a bad road to follow too far I
> guess.

Given that one of the laws of computer operations states that data will
expand to fill all available space on a disk, I'd say that this road is a
necessary path to take.

James
--
James Brister                                           brister@decwrl.dec.com
DEC Western Software Lab., Palo Alto, CA    {uunet,sun,pyramid}!decwrl!brister
"Cogito cogito, ergo cogito sum"

ronald@robobar.co.uk (Ronald S H Khoo) (02/16/91)

brister@decwrl.dec.com (James Brister) writes:

> It would be nice to grep through compressed files. Sure, you can do zcat |
> grep regexp, but then you lose the ability of grep to tell you the
> filename and/or line number of a match.

Well, if you have it, zcat | grep regexp /dev/stdin would do what you wanted.
If you can't hack a /dev/stdin driver into your kernel, then as an alternative
you could modify grep to understand "-" as a filename.
-- 
Ronald Khoo <ronald@robobar.co.uk> +44 81 991 1142 (O) +44 71 229 7741 (H)

mike (02/18/91)

In an article, robobar.co.uk!ronald (Ronald S H Khoo) writes:
|Well, if you have it, zcat | grep regexp /dev/stdin would do what you wanted.
|If you can't hack a /dev/stdin driver into your kernel, then as an alternative
|you could modify grep to understand "-" as a filename.

Why, when grep assumes stdin if you only supply the regex?

	zcat file.Z | grep regex

-- 
Michael Stefanik, MGI Inc., Los Angeles| Opinions stated are not even my own.
Title of the week: Systems Engineer    | UUCP: ...!uunet!bria!mike
-------------------------------------------------------------------------------
Remember folks: If you can't flame MS-DOS, then what _can_ you flame?

tchrist@convex.COM (Tom Christiansen) (02/18/91)

From the keyboard of uunet!bria!mike:
:Why, when grep assumes stdin if you only supply the regex?
:
:	zcat file.Z | grep regex

Guys, the original problem was that 

	zcat *.Z | grep regexp

didn't help him identify in which file the match was found.  Witness:

From brister@decwrl.dec.com (James Brister) in
<BRISTER.91Feb14204903@saratoga.decwrl.dec.com>:

>It would be nice to grep through compressed files. Sure, you can do zcat |
>grep regexp, but then you lose the ability of grep to tell you the
>filename and/or line number of a match.  Uncompressing the files before
>grepping isn't really wanted, because you may not have the space on disk.

How quickly we forget!

--tom
--
Tom Christiansen		tchrist@convex.com	convex!tchrist
 "All things are possible, but not all expedient."  (in life, UNIX, and perl)

sleepy@wybbs.mi.org (Mike Faber) (02/19/91)

In article <BRISTER.91Feb14204903@saratoga.decwrl.dec.com> brister@decwrl.dec.com (James Brister) writes:
>I guess this is marginally under comp.unix.shell...
Well, here's a comp.unix.shell solution...


>It would be nice to grep through compressed files. Sure, you can do zcat |
>grep regexp, but then you lose the ability of grep to tell you the
>filename and/or line number of a match.  Uncompressing the files before
>grepping isn't really wanted, because you may not have the space on disk.
>James

for i in (list of files you want to zgrep or $*)
do
zcat $i | grep (whatever) | awk fil=$i '{printf("%s:%s\n",fil,$0)}'
done

Ok, it's ugly, but it should work...

--
sleepy@wybbs.UUCP                     He who uses curses often...
Michael Faber                         curses often.

kjetilho@ifi.uio.no (Kjetil Torgrim Homme) (02/19/91)

A suggestion for a zgrep-script:
It has the slight problem that options have to be specified in quotes, like

	zgrep ^foo "-n -v" *.Z
	zgrep 'bar[f-o]' "" *.c.*

Note: the absence of options has to be denoted by empty quotes.
An oddity: The name of the file comes after grep's output.


Just my 2 xre. (Does the eighth bit survive the Atlantic Ocean?)
Kjetil T.

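#! /bin/sh
# usage: zgrep regexp "options" file...
regexp=$1; options=$2; shift 2
for file in "$@"; do
        # $options stays unquoted on purpose so "-n -v" splits into two flags
        zcat "$file" | grep $options -e "$regexp"
        if test $? = 0; then
                echo "-- $file --"
        fi
done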
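
The oddity mentioned above (the file name coming after grep's output) could
be avoided by capturing the matches before printing anything. The following
is only a sketch under assumptions: the function name zgrep_header and the
ZCAT override variable are hypothetical and not part of the script above.

```shell
#!/bin/sh
# zgrep_header REGEXP "OPTIONS" FILE...
# Hypothetical variant: prints "-- file --" BEFORE the matching lines.
# ZCAT is an assumed override for the decompressor (defaults to zcat).
zgrep_header() {
    regexp=$1; options=$2; shift 2
    for file
    do
        # Capture the matches first so the header can precede them.
        # $options is unquoted on purpose so "-n -v" splits into two flags.
        if matches=$(${ZCAT:-zcat} "$file" | grep $options -e "$regexp")
        then
            echo "-- $file --"
            printf '%s\n' "$matches"
        fi
    done
}

# Example (commented out): zgrep_header '^foo' "-n" *.Z
```

The trade-off is that all matches from one file are held in memory until
grep finishes, which may matter for very large files.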

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (02/19/91)

In article <1991Feb18.075330.15536@convex.com> tchrist@convex.COM (Tom Christiansen) writes:
> Guys, the original problem was that 
> 	zcat *.Z | grep regexp
> didn't help him identify in which file the match was found.

Yep. The right solution is to have a single-stream format that can
encode multiple files and various information about the files, including
filenames. Then give programs a -f flag to accept input (and produce
output) in that format.

tar and cpio formats are too verbose in some ways and too restricted in
others. In a coming message I will propose an alternative.

---Dan

sleepy@wybbs.mi.org (Mike Faber) (02/19/91)

a simple shell invoked with arguments is easy enough...

$ sh zgrep string file [file ...]

zgrep:
if [ $# -lt 2]
then
	echo usage:
	exit 1
fi

vgrep=$1
shift

while [ $# -gt 0 ]
do
	zcat $i | grep $vgrep | awk fil=$1 '{printf("%s:%s\n",fil,$0)}'
	shift
done


Now, was that so painful...

--
sleepy@wybbs.UUCP
Michael Faber

clewis@ferret.ocunix.on.ca (Chris Lewis) (02/20/91)

In article <1991Feb15.232854.13378@robobar.co.uk> ronald@robobar.co.uk (Ronald S H Khoo) writes:
>brister@decwrl.dec.com (James Brister) writes:
>
>> It would be nice to grep through compressed files. Sure, you can do zcat |
>> grep regexp, but then you lose the ability of grep to tell you the
>> filename and/or line number of a match.

>Well, if you have it, zcat | grep regexp /dev/stdin would do what you wanted.
>If you can't hack a /dev/stdin driver into your kernel, then as an alternative
>you could modify grep to understand "-" as a filename.

Another approach is some variation (as in, this is untested, but
the algorithm's okay) on:
	pat=$1
	shift
	for i
	do
	    # sed uses '|' as its delimiter so that file names containing '/' work
	    case $i in
		*.Z)
		    zcat "$i" | grep -n "$pat"
		    ;;
		*)
		    grep -n "$pat" "$i"
		    ;;
	    esac | sed -e "s|^|$i: |"
	done
		    
-- 
Chris Lewis, Phone: (613) 832-0541, Internet: clewis@ferret.ocunix.on.ca
UUCP: uunet!mitel!cunews!latour!ecicrl!clewis; Ferret Mailing List:
(ferret-request@eci386); Psroff (not Adobe Transcript) enquiries:
psroff-request@eci386, current patchlevel is *7*.

msb@sq.sq.com (Mark Brader) (02/20/91)

Here's what I use.  I place it in the public domain; use at your own risk.
You should insert a PATH= line to suit your system.  This program runs any
of grep, egrep, or fgrep, if it is linked as zgrep, zegrep, and zfgrep
respectively.  Unlike zcat, it will *not* accept "foo" as a synonym for
"foo.Z"; nor will it allow compressed and non-compressed files to be mixed.

#! /bin/sh
# This is a shell archive.  Remove anything before this line, then unpack
# it by saving it into a file and typing "sh file".  To overwrite existing
# files, type "sh file -c".  You can also feed this as standard input via
# unshar, or by typing "sh <file", e.g..  If this archive is complete, you
# will see the following message at the end:
#		"End of shell archive."
# Contents:  zgrep
# Wrapped by msb@sq.com on Tue Feb 19 22:10:54 1991
PATH=/bin:/usr/bin:/usr/ucb ; export PATH
if test -f 'zgrep' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'zgrep'\"
else
echo shar: Extracting \"'zgrep'\" \(541 characters\)
sed "s/^X//" <<'END_OF_FILE' | tr '&' '\030' >zgrep
X#!/bin/sh
X: zg - "zcat | grep (or egrep or fgrep)"
X
grep=`basename $0 | sed 's/z//'`
X
dashl=""
opts=""
for arg
do
X	case "x$arg" in
X	x-l)		dashl=y; shift;;
X	x-[a-zA-Z])	opts="$opts $arg"; shift;;
X	*)		pat="$arg"; shift; break;;
X	esac
done
X
case "x$dashl" in
xy)	for file
X	do
X		if zcat <"$file" | $grep $opts "$pat" >/dev/null
X		then
X			echo "$file"
X		fi
X	done
X	exit
esac
X
case $# in
X0)	zcat | $grep $opts "$pat"; exit;;
X1)	zcat <"$1" | $grep $opts "$pat"; exit;;
esac
X
for file
do
X	zcat <"$file" | $grep $opts "$pat" | sed "s&^&$file:&"
done
END_OF_FILE
if test 541 -ne `wc -c <'zgrep'`; then
    echo shar: \"'zgrep'\" unpacked with wrong size!
fi
chmod +x 'zgrep'
# end of 'zgrep'
fi
echo shar: End of shell archive.
exit 0
-- 
Mark Brader		    "Howeb45 9 qad no5 und8ly diturvrd v7 7jis dince
SoftQuad Inc., Toronto	     9 qas 8mtillihemt mot ikkfavpur4d 5esoyrdeful
utzoo!sq!msb, msb@sq.com     abd fill if condif3nce on myd3lf."      -- Cica

This article is in the public domain.

melby@daffy.yk.Fujitsu.CO.JP (John B. Melby) (02/21/91)

>> Following this trend you'll have a z<everything> ... sort of like
>> r<everything> (rsh, rcp ...).
>
>Why rule out some nifty programs such as zsh, ...

You forgot zftp (just don't forget to set binary mode, or nothing will
work... :-) )

(Disclaimer:  If zftp happens to exist, ignore the above comment.)

-----
John B. Melby
Fujitsu Limited, Machida, Japan
melby%yk.fujitsu.co.jp@uunet

tchrist@convex.COM (Tom Christiansen) (02/21/91)

From the keyboard of sleepy@wybbs.UUCP (Mike Faber):
:a simple shell invoked with arguments is easy enough...
:
:$ sh zgrep string file [file ...]
:
:zgrep:
:if [ $# -lt 2]
:then
:	echo usage:
:	exit 1
:fi
:
:vgrep=$1
:shift
:
:while [ $# -gt 0 ]
:do
:	zcat $i | grep $vgrep | awk fil=$1 '{printf("%s:%s\n",fil,$0)}'
:	shift
:done
:
:Now, was that so painful...

Apparently.  First, it's buggy (see below).  Second, it's not extensible:
you should make the "zcat" part vary, because someday someone will want to
use "pcat" instead of "zcat" or "nm" or "strings".  Third, grep's regexps
are weak; use egrep at the very least.  Fourth, it's slow.

As for the bugs:

    if [ $# -lt 2]
should be
    if [ $# -lt 2 ]

And this line:

    zcat $i | grep $vgrep | awk fil=$1 '{printf("%s:%s\n",fil,$0)}'

has three (3) bugs in it: 

    1) the $i should be $1
    2) The $vgrep should be "$vgrep" so it doesn't retokenize.
    3) Neither awk, gawk, nor nawk will swallow that file=$1 code.

The corrected line reads:

       zcat $1 | grep "$vgrep" | awk "{printf(\"$1:%s\n\",\$0)}"
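
For reference, the three fixes could be folded back into the whole script
roughly as follows. This is only a sketch: it differs from the corrected
line above in that the file name is handed to awk as an assignment operand
placed after the program text rather than interpolated into it, and the
names zgrep_sketch and ZCAT are hypothetical additions for illustration.

```shell
#!/bin/sh
# zgrep_sketch PATTERN FILE...
# Illustrative rewrite of the quoted script.  ZCAT is an assumed override
# for the decompressor (defaults to zcat) so that pcat etc. can be swapped in.
zgrep_sketch() {
    if [ $# -lt 2 ]
    then
        echo "usage: zgrep_sketch pattern file [file ...]" >&2
        return 1
    fi

    vgrep=$1
    shift

    while [ $# -gt 0 ]
    do
        # "$vgrep" is quoted so the pattern is not retokenized.
        # fil="$1" comes AFTER the awk program; awk treats it as a
        # variable assignment made before standard input is read.
        ${ZCAT:-zcat} "$1" | grep "$vgrep" |
            awk '{ printf("%s:%s\n", fil, $0) }' fil="$1"
        shift
    done
}

# Example (commented out): zgrep_sketch 'file *system' /usr/man/man1/a*.Z
```

One caveat: awk interprets backslash escapes in assignment operands, so
file names containing backslashes would be printed incorrectly.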

Now let's consider timing.  I have 38 compressed files totalling around
80k in /usr/man/man1/a*.Z; watch:  (and yes, I used GNU grep.)

% time sh zgrep 'file* system' /usr/man/man1/a*.Z > /dev/null
2.1u 7.7s 0:14 67% 0+0k 28+0io 2245pf+0w

% time perl pipegrep 'file *system' zcat /usr/man/man1/a*.Z > /dev/null
1.1u 2.1s 0:05 66% 0+0k 29+0io 658pf+0w

'Nuff said?

The pipegrep program, as I mentioned earlier, is described in the
O'Reilly Camel Book on perl, and available via anon FTP inside of
nutshell/perl/perl.tar.Z on uunet.

--tom
-- 
"UNIX was not designed to stop you from doing stupid things, because
 that would also stop you from doing clever things" -- Doug Gwyn

 Tom Christiansen                tchrist@convex.com      convex!tchrist

bill@camco.Celestial.COM (Bill Campbell) (02/21/91)

>>It would be nice to grep through compressed files. Sure, you can do zcat |
>>grep regexp, but then you lose the ability of grep to tell you the
>>filename and/or line number of a match.  Uncompressing the files before
>>grepping isn't really wanted, because you may not have the space on disk.

>How quickly we forget!

A perl solution might well be:

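$pat = shift;	# first argument is the pattern to search for
while($file = shift) {
	$zcat = ($file =~ /\.Z$/ ? 'zcat' : $file =~ /\.z$/ ? 'pcat' : 'cat' );
	# the trailing "|" makes open() read from the command's output
	open(INPUT, "$zcat $file |") || die "can't run $zcat: $!\n";
	while(<INPUT>) {
		# $_ still ends in a newline, so none is added here
		print "$file:$_" if /$pat/o;
	}
	close(INPUT);
}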
The one I actually use is:
#! /bin/sh
# This is a shell archive.  Remove anything before this line, then unpack
# it by saving it into a file and typing "sh file".  To overwrite existing
# files, type "sh file -c".  You can also feed this as standard input via
# unshar, or by typing "sh <file", e.g..  If this archive is complete, you
# will see the following message at the end:
#		"End of shell archive."
# Contents:  zgrep
# Wrapped by bill@camco on Wed Feb 20 23:34:27 1991
PATH=/bin:/usr/bin:/usr/ucb ; export PATH
if test -f 'zgrep' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'zgrep'\"
else
echo shar: Extracting \"'zgrep'\" \(857 characters\)
sed "s/^X//" >'zgrep' <<'END_OF_FILE'
X:
X# Header: %Z% %M%	Version %I% %H% %T% %Z%
X# exec [e|f]grep on a series of .Z files
X
Xprogname=`basename $0`
XUSAGE="Usage: $progname [-v] [-e environment]"
Xset -- `getopt FEvxclhnbsye: $*`
Xif [ $? != 0 ]
Xthen
X	echo $USAGE
X	exit 2
Xfi
Xsuffix=$$
Xtmpenv=/tmp/tmpenv$suffix
X> $tmpenv
Xopts=''
Xfor i in $*
Xdo
X	case $i in
X		-F)	prefix=f
X			shift;;
X		-E)	prefix=e
X			shift;;
X		-[vxclhnbsy])
X			opts="$opts $1"
X			shift 1;;
X		-e)
X			opts="$opts -e $2"
X			shift 2 ;;
X		--)	shift;
X			break
X			;;
X	esac
Xdone
X. $tmpenv	# set any environment variables
Xrm $tmpenv
X
Xopts="$opts '$1'"
Xshift
X
Xif [ $# != 1 ]
X	then multi=true
X	else multi=''
Xfi
Xfor file do
X	case "$file" in
X		*.z)	cat=pcat;;
X		*.Z)	cat=zcat;;
X		*)		cat=cat;;
X	esac
X	if [ "$multi" != "" ]
X		then eval "$cat $file | ${prefix}grep $opts" | sed "s;^;$file:;"
X		else eval "$cat $file | ${prefix}grep $opts"
X	fi
Xdone
END_OF_FILE
if test 857 -ne `wc -c <'zgrep'`; then
    echo shar: \"'zgrep'\" unpacked with wrong size!
fi
chmod +x 'zgrep'
# end of 'zgrep'
fi
echo shar: End of shell archive.
exit 0
-- 
INTERNET:  bill@Celestial.COM   Bill Campbell; Celestial Software
UUCP:   ...!thebes!camco!bill   6641 East Mercer Way
             uunet!camco!bill   Mercer Island, WA 98040; (206) 947-5591

ronald@robobar.co.uk (Ronald S H Khoo) (02/21/91)

lucrezi@univaq.sublink.org (Gino Lucrezi) writes:

> In article <DANJ1.91Feb15162042@cbnewse.ATT.COM>, Dan_Jacobson@ATT.COM writes:
> > Following this trend you'll have a z<everything> ... sort of like
> > r<everything> (rsh, rcp ...).  

Sounds like what's called for is a filesystem type which supports
compressed files internally, + a new flag bit in st_mode to say
that a file should be compressed by the filesystem code if possible.

Of course, since compression shouldn't be in the kernel, your OS should
support User Mode filesystems ....

BSD 4.5 anyone ? :-)

--> Followup-To: comp.unix.wishful.thinking

> Why rule out some nifty programs such as zsh, which allows the user to type
> compressed commands (damn useful on those slow lines).

Naa, that's just Yet Another TELNET Option, which I would suggest only
to take effect if telnet's in line mode.

--> Followup-To: comp.protocols.tcp-ip

> And of course there are still zcc, zgetty or /dev/znull...

The latter gives me an idea, why not /dev/z/* as an excuse to put
compression into the kernel and really annoy one or two people in this
newsgroup ?
-- 
Ronald Khoo <ronald@robobar.co.uk> +44 81 991 1142 (O) +44 71 229 7741 (H)

meissner@osf.org (Michael Meissner) (02/22/91)

In article <1991Feb21.074647.6115@robobar.co.uk> ronald@robobar.co.uk (Ronald S H Khoo) writes:

| lucrezi@univaq.sublink.org (Gino Lucrezi) writes:
| 
| > In article <DANJ1.91Feb15162042@cbnewse.ATT.COM>, Dan_Jacobson@ATT.COM writes:
| > > Following this trend you'll have a z<everything> ... sort of like
| > > r<everything> (rsh, rcp ...).  
| 
| Sounds like what's called for is a filesystem type which supports
| compressed files internally, + a new flag bit in st_mode to say
| that a file should be compressed by the filesystem code if possible.
| 
| Of course, since compression shouldn't be in the kernel, your OS should
| support User Mode filesystems ....

You can already do this by writing your own NFS server.  It's only a
SMOP.... (small matter of programming).
--
Michael Meissner	email: meissner@osf.org		phone: 617-621-8861
Open Software Foundation, 11 Cambridge Center, Cambridge, MA, 02142

Considering the flames and intolerance, shouldn't USENET be spelled ABUSENET?

saddler@bcstec.boeing.com (Ray Saddler) (02/23/91)

--- Following this trend you'll have a z<everything> ... sort of like
--- r<everything> (rsh, rcp ...).
--
--Why rule out some nifty programs such as zsh, ...
-
-You forgot zftp (just don't forget to set binary mode, or nothing will
-work... :-) )
-
-(Disclaimer:  If zftp happens to exist, ignore the above comment.)
-

I prefer to use zlogin, entering /usr/local/bin/zsh.  Then, EVERYTHING is run
in a compressed mode.  Saves on disk space, cpu time, and screen fill.  This is
available from the fine staff at Z University (zfolks@znode.z.zdu), and gives
you full source to ZNIX.

-- 
Ray E. Saddler III        saddler@bcstec.boeing.com  ___ ___ ___ ___     ___
CAD System/Network Admin   ..!uunet!bcstec!saddler  /__//  //__  /  /\ //  _
P.O. Box 3999  M.S. 3R-05    =-=-=-=-=-=-=-=-=-=   /__//__//__ _/_ /  //__/
Seattle, WA.  98124  U.S.A     +1 206 657 2824     Missile Systems Division

klaus@cnix.uucp (klaus u schallhorn) (02/27/91)

In article <708@bcstec.boeing.com> saddler@bcstec.boeing.com (Ray Saddler) writes:
>--- Following this trend you'll have a z<everything> ... sort of like
>--- r<everything> (rsh, rcp ...).
>--
>--Why rule out some nifty programs such as zsh, ...
>-
>-You forgot zftp (just don't forget to set binary mode, or nothing will
>-work... :-) )
>-
>-(Disclaimer:  If zftp happens to exist, ignore the above comment.)
>-
>
>I prefer to use zlogin, entering /usr/local/bin/zsh.  Then, EVERYTHING is run
>in a compressed mode.

Aren't you overlooking the risk of zburying your zkeyboard under 
a pile of conventional desk clutter? You'd never zfind that again. 
You'd be truly buggered. No 'Z'.

klaus schallhorn
-- 
George Orwell was an Optimist