[comp.unix.wizards] grep replacement

andrew@alice.UUCP (05/23/88)

	Al Aho and I are designing a replacement for grep, egrep and fgrep.
The question is what flags should it support and what kind of patterns
should it handle? (Assume the existence of flags to make it compatible
with grep, egrep and fgrep.)
	The proposed flags are the V9 flags:
-f file	pattern is (`cat file`)
-v	print nonmatching
-i	ignore alphabetic case
-n	print line number
-x	the pattern used is ^pattern$
-c	print count only
-l	print filenames only
-b	print block numbers
-h	do not print filenames in front of matching lines
-H	always print filenames in front of matching lines
-s	no output; just status
-e expr	use expr as the pattern

The patterns are as for egrep, supplemented by back-referencing
as in \{pattern\}\1.
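For comparison, the back-referencing already available in editor-style BREs uses \(...\)\1 rather than the proposed \{...\}\1; a sketch with a grep that supports BRE back-references:

```shell
# Back-referencing with the \(...\)\1 syntax used by ed and sed:
# match lines containing any doubled character, e.g. the "aa" in "aab".
printf 'abc\naab\nxyz\n' | grep '\(.\)\1'
```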

please send your comments about flags or patterns to research!andrew

papowell@attila.uucp (Patrick Powell) (05/25/88)

In article <7882@alice.UUCP> andrew@alice.UUCP writes:
>
>	Al Aho and I are designing a replacement for grep, egrep and fgrep.
>The question is what flags should it support and what kind of patterns
>should it handle? (Assume the existence of flags to make it compatible
>with grep, egrep and fgrep.)
>
>please send your comments about flags or patterns to research!andrew

The one thing I miss about grep families is the ability to have
a named search pattern. For example:

DIGIT=\{[0-9]\}
ALPHA=\{[a-zA-Z]\}
\${ALPHA}\${DIGIT}

This would sort of make sense.
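Named sub-patterns can be approximated today with shell variables, which the shell expands before grep ever sees the pattern (a workaround, not the proposed feature):

```shell
# Shell variables as poor man's named sub-patterns; the shell expands
# them first, so grep just sees the pattern "[a-zA-Z][0-9]".
DIGIT='[0-9]'
ALPHA='[a-zA-Z]'
printf 'a1\nbb\n2c\n' | grep "${ALPHA}${DIGIT}"
```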

The other facility is to find multiple line patterns, as in:
find the pair of lines that have pattern1 in the first line and
pattern2 in the second, etc.

This I have needed sooo many times;  I have ended up using AWK
and a clumsy set of searches.

For example:
\#{1 p}Pattern
\#{2}Pattern
This could print out lines that match,  or only the first line
(1p->print this one only).

Patrick Powell
Prof. Patrick Powell, Dept. Computer Science, 136 Lind Hall, 207 Church St. SE,
University of Minnesota,  Minneapolis, MN 55455 (612)625-3543/625-4002

ljz@fxgrp.UUCP (Lloyd Zusman) (05/26/88)

In article <5630@umn-cs.cs.umn.edu> papowell@attila.UUCP (Patrick Powell) writes:
  In article <7882@alice.UUCP> andrew@alice.UUCP writes:
  >
  >	Al Aho and I are designing a replacement for grep, egrep and fgrep.
  >The question is what flags should it support and what kind of patterns
  >should it handle? ...

  ...

  The other facility is to find multiple line patterns, as in:
  find the pair of lines that have pattern1 in the first line
  pattern2 in the second, etc.
  
  This I have needed sooo many times;  I have ended up using AWK
  and a clumsy set of searches.
  
  For example:
  \#{1 p}Pattern
  \#{2}Pattern
  This could print out lines that match,  or only the first line
  (1p->print this one only).

  ...

Or another way to get this functionality would be for this new greplike
thing to allow matches on the newline character.  For example:

    ^.*foo\nbar.*$
          ^^
    	newline

--
  Lloyd Zusman                          UUCP:   ...!ames!fxgrp!ljz
  Master Byte Software              Internet:   ljz%fx.com@ames.arc.nasa.gov
  Los Gatos, California               or try:   fxgrp!ljz@ames.arc.nasa.gov
  "We take things well in hand."

alan@cogswell.Jpl.Nasa.Gov (Alan S. Mazer) (05/26/88)

One thing I would _love_ is to be able to find the context of what I've
found, for example, to find the two (n?) surrounding lines.  I have wanted
to do this many times and there is no good way.

	-- Alan		..!cit-vax!elroy!alan		* "But seriously, what
			elroy!alan@csvax.caltech.edu	   could go wrong?"

ben@idsnh.UUCP (Ben Smith) (05/26/88)

I also would like to see more of the lex capabilities in grep.
-- 
Integrated Decision Systems, Inc.    | Benjamin Smith - East Coast Tech. Office
The fitting solution in professional | Peterborough, NH
portfolio management software.       | UUCP: uunet!idsnh!ben

kutz@bgsuvax.UUCP (Kenneth Kutz) (05/26/88)

In article <6866@elroy.Jpl.Nasa.Gov>, alan@cogswell.Jpl.Nasa.Gov (Alan S. Mazer) writes:
  
> One thing I would _love_ is to be able to find the context of what I've
> found, for example, to find the two (n?) surrounding lines.  I have wanted
> to do this many times and there is no good way.
  
There is a program on the Usenix tape under .../Utilities/Telephone
called 'tele'.  If you call the program using the name 'g', it
supports displaying of context.  E-mail me if you want more info.



-- 
--------------------------------------------------------------------
      Kenneth J. Kutz         	CSNET kutz@bgsu.edu
				UUCP  ...!osu-cis!bgsuvax!kutz
 Disclaimer: Opinions expressed are my own and not of my employer's
--------------------------------------------------------------------

dcon@ihlpe.ATT.COM (452is-Connet) (05/26/88)

In article <6866@elroy.Jpl.Nasa.Gov> alan@cogswell.Jpl.Nasa.Gov (Alan S. Mazer) writes:
>
>One thing I would _love_ is to be able to find the context of what I've
>found, for example, to find the two (n?) surrounding lines.  I have wanted
>to do this many times and there is no good way.

Also, what line number it was found on.

David Connet
ihnp4!ihlpe!dcon

david@elroy.Jpl.Nasa.Gov (David Robinson) (05/27/88)

In article <2978@ihlpe.ATT.COM>, dcon@ihlpe.ATT.COM (452is-Connet) writes:
> In article <6866@elroy.Jpl.Nasa.Gov> alan@cogswell.Jpl.Nasa.Gov (Alan S. Mazer) writes:

> >One thing I would _love_ is to be able to find the context of what I've
> >found, for example, to find the two (n?) surrounding lines.  I have wanted
> >to do this many times and there is no good way.
 
> Also, what line number it was found on.
 


How about "grep -n"?



-- 
	David Robinson		elroy!david@csvax.caltech.edu     ARPA
				david@elroy.jpl.nasa.gov	  ARPA
				{cit-vax,ames}!elroy!david	  UUCP
Disclaimer: No one listens to me anyway!

daveb@laidbak.UUCP (Dave Burton) (05/27/88)

In article <2978@ihlpe.ATT.COM> dcon@ihlpe.UUCP (David Connet) writes:
|Also, what line number it was found on.

Already there: grep -n.

In article <6866@elroy.Jpl.Nasa.Gov> alan@cogswell.Jpl.Nasa.Gov (Alan S. Mazer) writes:
|One thing I would _love_ is to be able to find the context of what I've
|found, for example, to find the two (n?) surrounding lines.  I have wanted
|to do this many times and there is no good way.

Please. Maybe "grep -k" where k is any integer giving the number of lines
of context on each side of the match; default is 0. Oh, but hey, _you're_ designing
it! :-)
-- 
--------------------"Well, it looked good when I wrote it"---------------------
 Verbal: Dave Burton                        Net: ...!ihnp4!laidbak!daveb
 V-MAIL: (312) 505-9100 x325            USSnail: 1901 N. Naper Blvd.
#include <disclaimer.h>                          Naperville, IL  60540

dcon@ihlpe.ATT.COM (452is-Connet) (05/27/88)

In article <6877@elroy.Jpl.Nasa.Gov> david@elroy.Jpl.Nasa.Gov (David Robinson) writes:
>In article <2978@ihlpe.ATT.COM>, dcon@ihlpe.ATT.COM (452is-Connet) writes:
>> Also, what line number it was found on.
>How about "grep -n"?
>


Embarrassed and red-faced he goes away to read the man page...

stan@sdba.UUCP (Stan Brown) (05/27/88)

> 
> One thing I would _love_ is to be able to find the context of what I've
> found, for example, to find the two (n?) surrounding lines.  I have wanted
> to do this many times and there is no good way.
> 
> 	-- Alan		..!cit-vax!elroy!alan		* "But seriously, what
> 			elroy!alan@csvax.caltech.edu	   could go wrong?"


	Along this same general line it would be nice to be able to
	look for patterns that span lines.  But perhaps this would be
	too complete a change in the philosophy of grep ?

	stan


-- 
Stan Brown	S. D. Brown & Associates	404-292-9497
(uunet gatech)!sdba!stan				"vi forever"

jas@rain.rtech.UUCP (Jim Shankland) (05/27/88)

In article <2978@ihlpe.ATT.COM> dcon@ihlpe.UUCP (David Connet) writes:
>In article <6866@elroy.Jpl.Nasa.Gov> alan@cogswell.Jpl.Nasa.Gov (Alan S. Mazer) writes:
>>One thing I would _love_ is to be able to find the context of what I've
>>found, for example, to find the two (n?) surrounding lines....
>
>Also, what line number it was found on.

You've already got the line number with the "-n" option.  Note that that makes
it easy to write a little wrapper script that gives you context grep.
Whether that's preferable to adding the context option to grep is, I suppose,
debatable; but I can already see the USENIX paper:

	"newgrep -[whatever] Considered Harmful"

Jim Shankland
  ..!ihnp4!cpsc6a!\
               sun!rtech!jas
 ..!ucbvax!mtxinu!/

aperez@cvbnet2.UUCP (Arturo Perez Ext.) (05/27/88)

From article <662@fxgrp.UUCP>, by ljz@fxgrp.UUCP (Lloyd Zusman):
> In article <5630@umn-cs.cs.umn.edu> papowell@attila.UUCP (Patrick Powell) writes:
>   In article <7882@alice.UUCP> andrew@alice.UUCP writes:
>   >
>   >	Al Aho and I are designing a replacement for grep, egrep and fgrep.
>   >The question is what flags should it support and what kind of patterns
>   >should it handle? ...

Actually, I agree with the guy who posted a request shortly before this
came out.

The most useful feature that is currently lacking is the ability to
do context greps, i.e. greps with a window.  There are two ways this could be
handled.   One is to allow awk-like constructs specifying beginning and 
ending points for a window.  Sort of like, e.g.

	grep -w '/:/,/^$/' file

which would find the lines between each ':'-containing line and the
next following blank line.  The other way would be to have a simple
"number of lines around match" parameter, possibly with collapse of overlapping
windows.  Then you could say

	grep -w 5 foo file

which would print 2 lines above and below the matching line.  Either way
it's done would be nice.  I have made one attempt to implement this
with a script and it wasn't too much fun...
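For what it's worth, the fixed-size window can be sketched in awk today; a minimal version assuming one line of context on each side and a literal pattern "foo":

```shell
# Print each matching line with one line of context on either side.
# A sketch only: adjacent matches share the stale "prev" buffer.
printf 'a\nfoo\nb\nc\n' | awk '
    /foo/ { if (prev != "") print prev; print; ctx = 1; next }
    ctx   { print; ctx = 0 }
    { prev = $0 }'
```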

Arturo Perez
ComputerVision, a division of Prime

bzs@bu-cs.BU.EDU (Barry Shein) (05/28/88)

Re: grep with N context lines shown...

Interesting, that's very close to a concept of a multi-line record
grep where I treat N lines as one and any occurrence results in a
listing. The difference is the line chosen to count from (in a context
the match would probably be middle and +-N, in a record you'd just
list the record.)

Just wondering if a generalization is being missed here somewhere,
also consider grepping something like a termcap file, maybe what I
really want is a generalized method to supply pattern matchers for
what to list on a hit:

	grep -P .+3,.-3 pattern		# print +-3 lines centered on match
	grep -P ?^[^ \t]?,.+1 pattern	# print from previous line not
					# beginning with white space to
					# one past current line

Of course, that destroys the stream nature of grep, it has to be able
to arbitrarily back up, ugh, although "last candidate for a start"
could be saved on the fly. The nice thing is that it can use
(essentially) the same pattern machinery for choosing printing (I
know, have to add in the notion of dot etc.)

I dunno, food for thought, like I said, maybe there's a generalization
here somewhere. Or maybe grep should just emit line numbers in a form
which could be post-processed by sed for fancier output (grep in
backquotes on sed line.) Therefore none of this is necessary :-)

	-Barry Shein, Boston University

barnett@vdsvax.steinmetz.ge.com (Bruce G. Barnett) (05/28/88)

[mail bounced]

There have been times when I wanted a grep that would print out the
first occurrence and then stop.

-- 
	Bruce G. Barnett 	<barnett@ge-crd.ARPA> <barnett@steinmetz.UUCP>
				uunet!steinmetz!barnett

rbj@icst-cmr.arpa (Root Boy Jim) (05/28/88)

	   Al Aho and I are designing a replacement for grep, egrep and fgrep.
   The question is what flags should it support and what kind of patterns

I have always thought it would be nice to print only the first match.

	(Root Boy) Jim Cottrell	<rbj@icst-cmr.arpa>
	National Bureau of Standards
	Flamer's Hotline: (301) 975-5688
	The opinions expressed are solely my own
	and do not reflect NBS policy or agreement
	My name is in /usr/dict/words. Is yours?

wyatt@cfa.harvard.EDU (Bill Wyatt) (05/28/88)

> There have been times when I wanted a grep that would print out the
> first occurrence and then stop.

grep '(your_pattern_here)' | head -1
-- 

Bill    UUCP:  {husc6,ihnp4,cmcl2,mit-eddie}!harvard!cfa!wyatt
Wyatt   ARPA:  wyatt@cfa.harvard.edu
         (or)  wyatt%cfa@harvard.harvard.edu
      BITNET:  wyatt@cfa2
        SPAN:  cfairt::wyatt 

dieter@nmtsun.nmt.edu (Dieter Muller) (05/28/88)

In article <22969@bu-cs.BU.EDU> bzs@bu-cs.BU.EDU (Barry Shein) writes:
>
 [introductory comments deleted]
>Just wondering if a generalization is being missed here somewhere,
>also consider grepping something like a termcap file, maybe what I
>really want is a generalized method to supply pattern matchers for
>what to list on a hit:
>
>	grep -P .+3,.-3 pattern		# print +-3 lines centered on match
>	grep -P ?^[^ \t]?,.+1 pattern	# print from previous line not
>					# beginning with white space to
>					# one past current line
>
 [various drawbacks deleted]
>	-Barry Shein, Boston University

Many's the time I would have been willing to make a blood sacrifice for
this kind of capability.  Firing up emacs for /etc/termcap can be a real
pain, when you're A) on a low-speed terminal line (300/1200 baud), B)
looking for something near the end of the file, and C) many things between
the beginning of the file and what you want will match.  Even using
gnumacs in batch mode & writing some lisp to do it strikes me as inelegant.
Starting a 32K search program doesn't hurt nearly as much as starting up
a 1253K search program.

Also, I always use egrep instead of grep, since it is almost always faster.
I don't understand how it is also faster than fgrep, but that's what "time"
says.  Please consider this when picking algorithms.

Dieter (Gnumacs is nice, egrep is better) Muller
-- 
You want coherency, cogency, and literacy all in one posting?  Be real.
...{cmcl2, ihnp4}!lanl!unm-la!unmvax!nmtsun!dieter
dieter@nmtsun.nmt.edu

ado@elsie.UUCP (Arthur David Olson) (05/28/88)

> > There have been times when I wanted a grep that would print out the
> > first occurrence and then stop.
> 
> grep '(your_pattern_here)' | head -1

Doesn't cut it for

	grep '(your_pattern_here)' firstfile secondfile thirdfile ...
-- 
	ado@ncifcrf.gov			ADO is a trademark of Ampex.

roy@phri.UUCP (Roy Smith) (05/28/88)

wyatt@cfa.harvard.EDU (Bill Wyatt) writes:
[as a way to get just the first occurrence of pattern]
> grep '(your_pattern_here)' | head -1

	Yes, it'll certainly work, but I think it bypasses the original
intention; to save CPU time.  If I had a 1000 line file with pattern on
line 7, I want grep to read the first 7 lines, print out line 7, and exit.
grep|head, on the other hand, will read and search all 1000 lines of the
file; it won't exit (with an EPIPE) until it writes another line to stdout
and finds that head has already exited.  In fact, if grep block-buffers its
output, it may never do more than a single write(2) and never notice that
head has exited.

	Anyway, I agree with the "find first match" flag being a good idea.
It would certainly speed up things like

	grep "^Subject: " /usr/spool/news/comp/sources/unix/*

where I know that the pattern is going to be matched in the first few lines
and don't want to bother searching the rest of the multi-kiloline file.
-- 
Roy Smith, System Administrator
Public Health Research Institute
455 First Avenue, New York, NY 10016
{allegra,philabs,cmcl2,rutgers}!phri!roy -or- phri!roy@uunet.uu.net

chip@vector.UUCP (Chip Rosenthal) (05/29/88)

In article <8077@elsie.UUCP> ado@elsie.UUCP (Arthur David Olson) writes:
>> grep '(your_pattern_here)' | head -1
>Doesn't cut it for
>	grep '(your_pattern_here)' firstfile secondfile thirdfile ...

nor if you want to see if a match was found by testing the exit status
-- 
Chip Rosenthal /// chip@vector.UUCP /// Dallas Semiconductor /// 214-450-0400
{uunet!warble,sun!texsun!rpp386,killer}!vector!chip
I won't sing for politicians.  Ain't singing for Spuds.  This note's for you.

russ@groucho.ucar.edu (Russ Rew) (05/30/88)

I also recently had a need for printing multi-line "records" in which a
specified pattern appeared somewhere in the record.  The following
short csh script uses the awk capability to treat whole lines as fields
and empty lines as record separators to print all the records from
standard input that contain a line matching a regular expression
specified as an argument:

#!/bin/csh -f
awk 'BEGIN {RS = ""; FS = "\n"; OFS = "\n"; ORS = "\n\n"} /'"$1"'/ {print} '
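Fed paragraph-style input, the script acts as a record grep (a literal pattern "foo" assumed here for illustration):

```shell
# Two blank-line-separated records; only the record containing "foo"
# is printed, followed by a blank line (because ORS = "\n\n").
printf 'a\nfoo\n\nb\nc\n' |
awk 'BEGIN {RS = ""; FS = "\n"; OFS = "\n"; ORS = "\n\n"} /foo/ {print}'
```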


     Russ Rew * UCAR (University Corp. for Atmospheric Research)
	 PO Box 3000 * Boulder, CO  80307-3000 * 303-497-8845
	     russ@unidata.ucar.edu * ...!hao!unidata!russ

rcodi@yabbie.rmit.oz (Ian Donaldson) (05/30/88)

From article <3324@phri.UUCP>, by roy@phri.UUCP (Roy Smith):
> It would certainly speed up things like
> 
> 	grep "^Subject: " /usr/spool/news/comp/sources/unix/*

> where I know that the pattern is going to be matched in the first few lines
> and don't want to bother searching the rest of the multi-killoline file.

A simple permutation:

 	head -60 /usr/spool/news/comp/sources/unix/* | grep "^Subject: "

works fairly close to the mark, and doesn't waste much time at all.

Ian D

frei@rubmez.UUCP (Matthias Frei ) (05/30/88)

> 	Al Aho and I are designing a replacement for grep, egrep and fgrep.
> The question is what flags should it support and what kind of patterns
> should it handle? (Assume the existence of flags to make it compatible
> with grep, egrep and fgrep.)

Hi,
some applications need to divide a file into two parts.
One should contain all lines matching any of the patterns, the other
one all lines not matching any of the patterns.
So I want the following flags:

	- d	divert the file
		"matches" to stdout
		"nomatches" to stderr
	-r	exchange stdout and stderr, if -d is given  

Will you post your new grep to the net ?  (I hope so.)

Thanks in advance for a nice new tool.

	Matthias Frei

--------------------------------------------------------------------
Snail-mail:                    |  E-Mail address:
Microelectronics Center        |                 UUCP  frei@rubmez.uucp        
University of Bochum           |                (...uunet!unido!rubmez!frei)
4630 Bochum 1, P.O.-Box 102143 |
West Germany                   |

joey@tessi.UUCP (Joe Pruett) (05/31/88)

>
>> There have been times when I wanted a grep that would print out the
>> first occurrence and then stop.
>
>grep '(your_pattern_here)' | head -1

This works, but is quite slow if the input to grep is large.  A hack
I've made to egrep is a switch of the form -<number>.  This causes only
the first <number> matches to be printed, and then the next file is
searched.  This is great for:

egrep -1 ^Subject *

in a news directory to get a list of Subject lines.

jqj@uoregon.uoregon.edu (JQ Johnson) (05/31/88)

In article <1036@cfa.cfa.harvard.EDU> wyatt@cfa.harvard.EDU (Bill Wyatt) writes:
>
>> There have been times when I wanted a grep that would print out the
>> first occurrence and then stop.
>grep '(your_pattern_here)' | head -1
This is, of course, unacceptable if you are searching a very long file
(say, a census database) and have LOTS of pipe buffering.

Too bad it isn't feasible to have a shell that can optimize pipelines.

dan@maccs.UUCP (Dan Trottier) (05/31/88)

In article <8077@elsie.UUCP> ado@elsie.UUCP (Arthur David Olson) writes:
>> > There have been times when I wanted a grep that would print out the
>> > first occurrence and then stop.
>> 
>> grep '(your_pattern_here)' | head -1
>
>Doesn't cut it for
>
>	grep '(your_pattern_here)' firstfile secondfile thirdfile ...

This is getting ridiculous and can be taken to just about any level...

	foreach i (file1 file2 ...)
	   grep 'pattern' $i | head -1
	end

-- 
       A.I. - is a three toed sloth!        | ...!uunet!mnetor!maccs!dan
-- Official scrabble players dictionary --  | dan@mcmaster.BITNET

leo@philmds.UUCP (Leo de Wit) (05/31/88)

In article <292@ncar.ucar.edu> russ@groucho.UCAR.EDU (Russ Rew) writes:
>I also recently had a need for printing multi-line "records" in which a
>specified pattern appeared somewhere in the record.  The following
>short csh script uses the awk capability to treat whole lines as fields
>and empty lines as record separators to print all the records from
>standard input that contain a line matching a regular specified as an
>argument:
>
>#!/bin/csh -f
>awk 'BEGIN {RS = ""; FS = "\n"; OFS = "\n"; ORS = "\n\n"} /'"$1"'/ {print} '
>
>

Awk is a nice solution, but sed is a much faster one. I've been following 
the 'grep' discussion for some time now, and have seen much demand for
features that are already available in sed. Here are some; I have left out
the discussion of what this or that sed command does: there is a sed article
and a man page...

Patrick Powell writes:
>The other facility is to find multiple line patterns, as in:
>find the pair of lines that have pattern1 in the first line
>pattern2 in the second, etc.

Try this one:

        sed -n -e '/PATTERN1/,/PATTERN2/p' file

It prints all lines between PATTERN1 and PATTERN2 matches. Of course you can
have subcommands to do special things (with '{' I mean).


Alan (..!cit-vax!elroy!alan) writes:
>One thing I would _love_ is to be able to find the context of what I've
>found, for example, to find the two (n?) surrounding lines.  I have wanted
>to do this many times and there is no good way.

There is. Try this one:

        sed -n -e '
/PATTERN/{
x
p
x
p
n
p
}
h' file

It prints the line before, the line containing the PATTERN, and the line after.
Of course you can make the output fancier and the number of lines printed
larger.


David Connet writes:
>>
>>One thing I would _love_ is to be able to find the context of what I've
>>found, for example, to find the two (n?) surrounding lines.  I have wanted
>>to do this many times and there is no good way.
>Also, what line number it was found on.

Sed can also handle this one:

        sed -n -e '/PATTERN/=' file
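For instance (a literal pattern assumed):

```shell
# "=" prints the input line number of each matching line; -n suppresses
# the matched lines themselves.
printf 'a\nfoo\nb\nfoo\n' | sed -n '/foo/='
```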


Lloyd Zusman writes:
>Or another way to get this functionality would be for this new greplike
>thing to allow matches on the newline character.  For example:
>    ^.*foo\nbar.*$
>          ^^
>    	newline

Sed can match on embedded newline characters in the substitute command 
(it is indeed \n here!). The trailing newline is matched by $.


Barry Shein writes [story about relative addressing]:
>I dunno, food for thought, like I said, maybe there's a generalization
>here somewhere. Or maybe grep should just emit line numbers in a form
>which could be post-processed by sed for fancier output (grep in
>backquotes on sed line.) Therefore none of this is necessary :-)

Quite right. I think most of the times you want to see the context are in
interactive use. In that case you can write a simple sed-script that does
just what is needed, i.e. display the [/pattern/-N] through [/pattern/+N]
lines, where N is a constant. The example I gave for N == 1 can be extended
for larger N, with fancy output etc.


Bill Wyatt writes: 
>> There have been times when I wanted a grep that would print out the
>> first occurrence and then stop.
>
>grep '(your_pattern_here)' | head -1

Much simpler, and faster:

        sed -n -e '/PATTERN/{
p
q
}' file

Sed quits immediately after finding the first match. You could even create an 
alias for something like that.


Michael Morrell writes:
>>Also, what line number it was found on.
>grep -n does this, but I'd like to see an option which ONLY prints the line
>numbers where the pattern was found.

The sed trick does this:

        sed -n -e '/PATTERN/=' file

Or you could even:

        sed -n -e '/PATTERN/{
=
q
}' file

which prints the first matched line number and exits.


Roy Smith writes:
>wyatt@cfa.harvard.EDU (Bill Wyatt) writes:
>[as a way to get just the first occurance of pattern]
>> grep '(your_pattern_here)' | head -1
>	Yes, it'll certainly work, but I think it bypasses the original
>intention; to save CPU time.  If I had a 1000 line file with pattern on
>line 7, I want grep to read the first 7 lines, print out line 7, and exit.
>grep|head, on the other hand, will read and search all 1000 lines of the
>file; it won't exit (with a EPIPE) until it writes another line to stdout
>and finds that head has already exited.  In fact, if grep block-buffers its
>output, it may never do more than a single write(2) and never notice that
>head has exited.

Quite right. The sed-solution I mentioned before is fast and neat. In fact, 
who needs head:

        sed 10q

does the job, as you can find in the book by Kernighan and Pike; I believe
the title is 'The Unix Programming Environment'.


Stan Brown writes:
>	Along this same general line it would be nice to be abble to
>	look for paterns that span lines.  But perhaps this would be
>	tom complete a change in the philosophy of grep ?

As I mentioned before, embedded newlines can be matched by sed in the
substitute command.


What I also see often is things like

        grep 'pattern' file | sed 'expression'

A pity a lot of people don't know that sed can do the pattern matching itself.

S. E. D. (Sic Erat Demonstrandum)


As far as options for a new grep are concerned, I suggest using the options
proposed (and no more). Let other tools handle other problems - that's in the
Un*x spirit. What I would appreciate most in a new grep is:
no more grep, egrep, fgrep, just one tool that can be both fast (for fixed
strings) and elaborate (for pattern matching like egrep). The 'bm' tool that
was on the net (author Peter Bain) is very fast for fixed strings, using the
Boyer-Moore algorithm. Maybe this knowledge could be 'joined in'...?


        Leo.

barnett@vdsvax.steinmetz.ge.com (Bruce G. Barnett) (05/31/88)

In article <1036@cfa.cfa.harvard.EDU> wyatt@cfa.harvard.EDU (Bill Wyatt) writes:
|
|> There have been times when I wanted a grep that would print out the
|> first occurrence and then stop.
|
|grep '(your_pattern_here)' | head -1

Yes I have tried that. You are missing the point.

Have you ever waited for a computer?  

There are times when I want the first occurrence of a pattern without
reading the entire (i.e. HUGE) file.

Or there are times when I want the first occurrence of a pattern from
hundreds of files, but I don't want to see the pattern more than once.

And yes I know how to write a shell script that does this.

IMHO (sarcasm mode on), it is more efficient to call grep 
once for one hundred files, than to call (grep $* /dev/null|head -1) 
one hundred times. 
-- 
	Bruce G. Barnett 	<barnett@ge-crd.ARPA> <barnett@steinmetz.UUCP>
				uunet!steinmetz!barnett

gwc@root.co.uk (Geoff Clare) (05/31/88)

Most of the useful things people have been saying they would like to be
able to do with 'grep' can already be done very simply with 'sed'.
For example:

    Stop after first match:   sed -n '/pattern/{p;q;}'

    Match over two lines:     sed -n 'N;/pat1\npat2/p;D'

It should also be possible to get a small number of context lines by
judicious use of the 'hold space' commands (g, G, h, H, x), but I haven't
tried it.  Anyway, this can be done with a normal line editor (if the data
to be searched aren't coming from a pipe) with 'g/pattern/-,+p'.
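A quick demonstration of the two-line form (a sed whose N command simply stops at end of input is assumed):

```shell
# N joins each line with its successor; the pair matching "pat1\npat2"
# is printed, and D slides the two-line window forward one line.
printf 'x\npat1\npat2\ny\n' | sed -n 'N;/pat1\npat2/p;D'
```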

I was rather alarmed to see the proposal for 'pattern repeat' in the original
article was '\{pattern\}\1' rather than '\(pattern\)\1', as the latter is
already used for this purpose in the standard editors (ed, ex/vi, sed).
Or was it a typo?

By the way, does anyone know why the ';' command terminator in 'sed' is
not documented?  It works on all the systems I've tried it on, but I
have never found it in any manuals.  It's so much nicer than putting
the commands on separate lines, or using multiple '-e' options.
-- 

Geoff Clare    UniSoft Limited, Saunderson House, Hayne Street, London EC1A 9HH
gwc@root.co.uk   ...!mcvax!ukc!root44!gwc   +44-1-606-7799  FAX: +44-1-726-2750

andyc@omepd (T. Andrew Crump) (05/31/88)

In article <1036@cfa.cfa.harvard.EDU> wyatt@cfa.harvard.EDU (Bill Wyatt) writes:

>> There have been times when I wanted a grep that would print out the
>> first occurrence and then stop.

   >
   >grep '(your_pattern_here)' | head -1

Yes, but it forces grep to search a whole file, when what you may have wanted
was at the beginning.  This is inefficient if the "file" is large.

A more general version of this request would be a parameter restricting
grep to n or fewer occurrences, maybe 'grep -N #'.

-- Andy Crump

rbj@icst-cmr.arpa (Root Boy Jim) (05/31/88)

   From: Bill Wyatt <wyatt@cfa.harvard.edu>

   > There have been times when I wanted a grep that would print out the
   > first occurrence and then stop.

   grep '(your_pattern_here)' | head -1

Well, that *prints* what I want to see, but takes longer than I want to
wait. I want it to quit looking in the file. Besides, there is no way
you can do `grep -1 pattern *.[ch]' as trivially.

   Bill    UUCP:  {husc6,ihnp4,cmcl2,mit-eddie}!harvard!cfa!wyatt
   Wyatt   ARPA:  wyatt@cfa.harvard.edu
	    (or)  wyatt%cfa@harvard.harvard.edu
	 BITNET:  wyatt@cfa2
	   SPAN:  cfairt::wyatt 

	(Root Boy) Jim Cottrell	<rbj@icst-cmr.arpa>
	National Bureau of Standards
	Flamer's Hotline: (301) 975-5688
	The opinions expressed are solely my own
	and do not reflect NBS policy or agreement
	My name is in /usr/dict/words. Is yours?

glennr@holin.ATT.COM (Glenn Robitaille) (06/01/88)

> > > There have been times when I wanted a grep that would print out the
> > > first occurrence and then stop.
> > 
> > grep '(your_pattern_here)' | head -1
> 
> Doesn't cut it for
> 
> 	grep '(your_pattern_here)' firstfile secondfile thirdfile ...

Well, if you have a shell command like

	#
	# save the search pattern
	#
	pattern=$1
	#
	# remove the search pattern from the argument list
	#
	shift
	for i in "$@"
	do
		#
		# grep for the search pattern
		#
		line=`grep "$pattern" "$i" | head -1`
		#
		# if found, print file name and string
		#
		test -n "$line" && echo "${i}:\t${line}"
	done

It'll work fine.  If you want to use other options, have them in
quotes as part of the first argument.


Glenn Robitaille
AT&T, HO 2J-207
ihnp4!holin!glennr
Phone (201) 949-7811

aeb@cwi.nl (Andries Brouwer) (06/01/88)

In article <1036@cfa.cfa.harvard.EDU> wyatt@cfa.harvard.EDU (Bill Wyatt) writes:
>
>> There have been times when I wanted a grep that would print out the
>> first occurrence and then stop.
>
>grep '(your_pattern_here)' | head -1

A fast way of searching for the first occurrence is really useful.
I have a version of grep called `contains', and a shell script
for formatting that says: if the input contains .[ then use refer;
if it contains .IS then ideal; if .PS then pic; if .TS then tbl, etc.

-- 
      Andries Brouwer -- CWI, Amsterdam -- uunet!mcvax!aeb -- aeb@cwi.nl

booter@deimos.ads.com (Elaine Richards) (06/01/88)

In article <15030@brl-adm.ARPA> rbj@icst-cmr.arpa (Root Boy Jim) writes:
>	   Al Aho and I are designing a replacement for grep, egrep and fgrep.
>   The question is what flags should it support and what kind of patterns
>I have always thought it would be nice to print only the first match.


grep string filename |head -1

Sorry, I could not resist. Why not do an alias instead?

ER

hasch@gypsy.siemens-rtl (Harald Schaefer) (06/01/88)

If you are only interested in the first occurrence of a pattern, you can use
something like
	sed -n '/<pattern>/ {
		p
		q
		}' file
Harald Schaefer
Siemens Corp. - RTL
Bus. Phone (609) 734 3389
Home Phone (609) 275 1356

uucp:	...!princeton!gypsy!hasch
	hasch@gypsy.uucp
ARPA:	hasch@siemems.com
	hasch%siemens@princeton.EDU

aburt@isis.UUCP (Andrew Burt) (06/01/88)

I'd like to see the following enhancements in a grepper:

	-  \< and \> to match word start/end as in vi, with -w option
		as in BSD grep to match pattern as a word.

	- \w in pattern to match whitespace (generalization: define
		\unused-letter as a pattern; or allow full lex capability).

	- way to invert a piece of a pattern, such as: grep foo.*\^bar\^xyzzy
		with meaning as in: grep foo | grep -v bar | grep -v xyzzy
		(or could be written grep foo.*\^(bar|xyzzy) of course).

	-  Select Nth occurrence of match (generalization: list of
		matches to show: grep -N -2,5-7,10- ... to grab up to the 2nd,
		5th through 7th, and from the 10th onward).

	- option to show lines between matches (not just matching lines)
		as in: grep -from foo -to bar ... meaning akin to
		sed/ed's /foo/,/bar/p.  (But much more useful with other
		extensions).

	- Allow matching newlines in a "binary" (or non-text) sort of mode:
		grep -B 'foo.*bar'  finds foo...bar even if they are
		not on the same line.  (But printing the "line" that
		matches wouldn't be useful anymore, so just printing the
		matched text would be better.  Someone wanting lines could
		look for \n[^\n]*foo.*bar[^\n]*\n, though a syntax to
		make this easier might be in order.  Perhaps this wouldn't
		be an example of a binary case -- but a new character
		with meaning like '.' but matching ANY character would work:
		if @ is such a character then "grep foo@*bar".   Perhaps
		a better example, assuming the \^ for inversion syntax
		above would be "grep foo@*(\^bar)bar -- otherwise it would
		match from first foo to last bar, while I might want from
		first foo to first bar.)

	- provide byte offset of start of match (like block number or
		line number) useful for searching non-text files.

	- Provide a lib func that has the RE code in it.

	- Install RE code in other programs: awk/sed/ed/vi etc.
		Oh for a standardized RE algorithm!
-- 

Andrew Burt 				   			isis!aburt

              Fight Denver's pollution:  Don't Breathe and Drive.

jjg@linus.UUCP (Jeff Glass) (06/01/88)

In article <470@q7.tessi.UUCP> joey@tessi.UUCP (Joe Pruett) writes:
> >grep '(your_pattern_here)' | head -1
> 
> This works, but is quite slow if the input to grep is large.  A hack
> I've made to egrep is a switch of the form -<number>.  This causes only
> the first <number> matches to be printed, and then the next file is
> searched.  This is great for:
> 
> egrep -1 ^Subject *
> 
> in a news directory to get a list of Subject lines.

Try:

	sed -n -e '/pattern/{' -e p -e q -e '}' filename

This prints the first occurrence of the pattern and then stops searching
the file.  The generalizations for printing the first <n> matches and
searching <m> files (where n,m > 1) are more awkward (no pun intended)
but are possible.
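The first-<n> generalization Jeff alludes to can be sketched in awk, counting
matches across the whole run rather than per file:

```shell
# Print the first 3 matches, then stop reading (a sketch of the
# generalization; 3 and the pattern are placeholders).
cat > /tmp/log.$$ <<'EOF'
hit 1
miss
hit 2
hit 3
hit 4
EOF
awk '/hit/ { print; if (++n == 3) exit }' /tmp/log.$$
rm -f /tmp/log.$$
```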

/jeff

brianm@sco.COM (Brian Moffet) (06/01/88)

In article <4537@vdsvax.steinmetz.ge.com> barnett@vdsvax.steinmetz.ge.com (Bruce G. Barnett) writes:
>In article <1036@cfa.cfa.harvard.EDU> wyatt@cfa.harvard.EDU (Bill Wyatt) writes:
>|grep '(your_pattern_here)' | head -1
>
>Or there are times when I want the first occurrence of a pattern from
>hundreds of files, but I don't want to see the pattern more than once.
>

Have you tried sed?  How about 

$ sed -n '/pattern/p;/pattern/q' file

???



-- 
Brian Moffet		brianm@sco.com  {uunet,decvax!microsof}!sco!brianm
The opinions expressed are not quite clear and have no relation to my employer.
'Evil Geniuses for a Better Tomorrow!'

anw@nott-cs.UUCP (06/01/88)

In article <6866@elroy.Jpl.Nasa.Gov> alan@cogswell.Jpl.Nasa.Gov (Alan S. Mazer)
writes:
> One thing I would _love_ is to be able to find the context of what I've
> found, for example, to find the two (n?) surrounding lines.  I have wanted
> to do this many times and there is no good way.

	See below.  Does n == 4, but easily changed.

In article <590@root44.co.uk> gwc@root.co.uk (Geoff Clare) writes:
>
> Most of the useful things people have been saying they would like to be
> able to do with 'grep' can already be done very simply with 'sed'.

	Which is not to say that they shouldn't also be in "*grep"!

>	[ good examples omitted ]
>
> It should also be possible to get a small number of context lines by
> judicious use of the 'hold space' commands (g, G, h, H, x), but I haven't
> tried it.  [ ... ]

	The following is "/usr/bin/kwic" on this machine (PDP 11/44 running
V7).  I wrote it about three years ago in response to a challenge from some
AWK zealots;  it runs *much* faster than the equivalent AWK script.  That
is, it is sloooww rather than ssllloooooowwww.  I have a manual entry for
it which is too trivial to send.  Bourne shell, of course.  Use at whim
and discretion.  Several minor bugs, mainly (I hope!) limitations of or
between "sh" and "sed".  (Note that the various occurrences of multiple
spaces in "s..." commands are all TABs, in case mailers/editors/typists
have mangled things.)

> By the way, does anyone know why the ';' command terminator in 'sed' is
> not documented?  It works on all the systems I've tried it on, but I
> have never found it in any manuals.  It's so much nicer than putting
> the commands on separate lines, or using multiple '-e' options.

	No, I don't know why, but it isn't the only example in Unix of a
facility most easily discovered by looking in the source.  I've occasionally
used it, but I tried re-writing the following that way, and it *didn't* look
so much nicer;  in fact it looked 'orrible.

--------------------------------- [cut here] -----------------------------
[ $# -eq 0 ] && { echo "Usage: $0 pattern [file] ..." 1>&2; exit 1; }

l='[^\n]*\n'; pat="$1"; shift

exec sed -n   "/$pat"'/ b found
			s/^/	/
			H
			g
      /^'"$l$l$l$l$l"'/ s/\n[^\n]*//
			h
			b
	: found
			s/^/++	/
			H
			g
			s/.//p
			s/.*//
			h
	: loop
		      $ b out
			n
	     /'"$pat"'/ b found
			s/^/	/
			H
			g
	/^'"$l$l$l$l"'/ !b loop
	: out
			s/.//p
			s/.*/-----------------/
			h
	    ' ${1+"$@"}

-- 
Andy Walker, Maths Dept., Nott'm Univ., UK.
anw@maths.nott.ac.uk

andrew@alice.UUCP (06/01/88)

in my naivety, i had not been following netnews closely
after i posted the original ``grep replacement'' article.
I assumed that people would reply to me, not the net.
That is the reason i have not been participating in the discussion.
i will be posting my resolution of the suggestions shortly.

many people have written about patterns matching multiple lines.
grep will not do this. if you really need this, use sam by rob pike
as described in the nov 1987 software practice and experience.
the code is available for a plausible fee from the at&t toolchest.

jfh@rpp386.UUCP (John F. Haugh II) (06/02/88)

In article <2117@uoregon.uoregon.edu> jqj@drizzle.UUCP (JQ Johnson) writes:
>In article <1036@cfa.cfa.harvard.EDU> wyatt@cfa.harvard.EDU (Bill Wyatt) writes:
>>> There have been times when I wanted a grep that would print out the
>>> first occurrence and then stop.
>>grep '(your_pattern_here)' | head -1
>This is, of course, unacceptable if you are searching a very long file
>(say, a census database) and have LOTS of pipe buffering.
>
>Too bad it isn't feasible to have a shell that can optimize pipelines.

there is a boyer/moore based fast grep in the archives.  adding an
additional option (say '-f' for first in each file?) should be quite
simple.

perhaps i'll post the diff's if i remember to go hack on the sucker
any time soon.

- john.
-- 
John F. Haugh II                 | "If you aren't part of the solution,
River Parishes Programming       |  you are part of the precipitate."
UUCP:   ihnp4!killer!rpp386!jfh  | 		-- long since forgot who
DOMAIN: jfh@rpp386.uucp          | 

kent@happym.UUCP (Kent Forschmiedt) (06/02/88)

In article <136@rubmez.UUCP> frei@rubmez.UUCP (Matthias Frei ) writes:
>I want following flags:
>
>	- d	divert the file
>		"matches" to stdout
>		"nomatches" to stderr
>	-r	exchange stdout and stderr, if -d is given  

I second the vote - just today I did one of these:

grep $PATTERN file > afile
grep -v $PATTERN file > anotherfile

Note, however, that -v will serve for the suggested -r.

>Will you post Your new grep to the net ? (I hope so)

From alice.UUCP??  Ha ha!  That's Bell Labs!  It will be in V10 
Unix, and none of us humans will see it until sysVr6, and only then 
if we are lucky!! 
-- 
--
	Kent Forschmiedt -- kent@happym.UUCP, tikal!camco!happym!kent
	Happy Man Corporation  206-282-9598

john@frog.UUCP (John Woods) (06/02/88)

In article <590@root44.co.uk>, gwc@root.co.uk (Geoff Clare) writes:
> Most of the useful things people have been saying they would like to be
> able to do with 'grep' can already be done very simply with 'sed'.
> For example:
>     Stop after first match:   sed -n '/pattern/{p;q;}'

Close, but no cigar.  It does not work for multiple input files.
(And, of course, spawning off a new sed for each file defeats the basic desire
of most of the people who've asked for it:  speed)

However,

	awk '/^Subject: /	{ print FILENAME ":" $0; next }' *

does (just about) work.  And it's probably not _obscenely_ slow.
(it doesn't behave for no input files, and you might prefer no FILENAME: for
just a single input file)
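For an arbitrary pattern, the per-file trick needs a flag that resets at each
new file; a sketch (unlike a real first-match-only flag, it still reads every
line of every file):

```shell
# First match per file: FNR restarts at 1 on each new input file, so
# use it to clear the "seen" flag (a sketch; pattern is a placeholder).
echo 'Subject: one'   > /tmp/a1.$$
echo 'Subject: two'  >> /tmp/a1.$$
echo 'Subject: three' > /tmp/a2.$$
awk 'FNR == 1 { seen = 0 }
     /^Subject: / && !seen { print FILENAME ":" $0; seen = 1 }' /tmp/a1.$$ /tmp/a2.$$
rm -f /tmp/a1.$$ /tmp/a2.$$
```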
-- 
John Woods, Charles River Data Systems, Framingham MA, (617) 626-1101
...!decvax!frog!john, john@frog.UUCP, ...!mit-eddie!jfw, jfw@eddie.mit.edu

No amount of "Scotch-Guard" can repel the ugly stains left by REALITY...
		- Griffy

peter@ficc.UUCP (Peter da Silva) (06/02/88)

In article <136@rubmez.UUCP>, frei@rubmez.UUCP (Matthias Frei ) writes:
> So I want following flags:

> 	- d	divert the file "matches" to stdout "nomatches" to stderr

Good, but...

> 	-r	exchange stdout and stderr, if -d is given  

Shouldn't this case (-r) be handled by the existing '-v' flag?
-- 
-- Peter da Silva, Ferranti International Controls Corporation.
-- Phone: 713-274-5180. Remote UUCP: uunet!nuchat!sugar!peter.

mdorion@cmtl01.UUCP (Mario Dorion) (06/03/88)

In article <2978@ihlpe.ATT.COM>, dcon@ihlpe.ATT.COM (452is-Connet) writes:
> In article <6866@elroy.Jpl.Nasa.Gov> alan@cogswell.Jpl.Nasa.Gov (Alan S. Mazer) writes:
> >
> >One thing I would _love_ is to be able to find the context of what I've
> >found, for example, to find the two (n?) surrounding lines.  I have wanted
> >to do this many times and there is no good way.
> 
> Also, what line number it was found on.
> 
> David Connet
> ihnp4!ihlpe!dcon

Ever tried grep -n ?????

There are three features I would like to see in a grep-like program:

1- Be able to use a newline character in the regular expression
       grep 'this\nthat' file 

2- Be able to grep more than one regular expression with one call. This would
   be faster than issuing many calls since the file would be read only once.

3- To have an option to search only for the first occurrence of the pattern.
   Sometimes you KNOW that the pattern is there only once (for example if you
   grep '^Subject:' on news files) and there's just no need to scan the rest of
   the file. When 'grepping' into many files it would return the first occurrence
   for each file.
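The second wish can already be approximated in one pass with a pattern file
via grep -f (a sketch; egrep alternation like 'this|that' works too):

```shell
# Several patterns, one pass over the input: grep -f reads one
# expression per line from a pattern file (file names are placeholders).
cat > /tmp/pats.$$ <<'EOF'
^From:
^Subject:
EOF
printf 'From: a\nBody\nSubject: b\n' > /tmp/art.$$
grep -f /tmp/pats.$$ /tmp/art.$$
rm -f /tmp/pats.$$ /tmp/art.$$
```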

-- 
     Mario Dorion              | ...!{rutgers,uunet,ihnp4}!     
     Frisco Bay Industries     |            philabs!micomvax!cmtl01!mdorion
     Montreal, Canada          |
     1 (514) 738-7300          | I thought this planet was in public domain!

andrew@alice.UUCP (06/03/88)

In article <449@happym.UUCP>, kent@happym.UUCP writes:
> From alice.UUCP??  Ha ha!  That's Bell Labs!  It will be in V10 
> Unix, and none of us humans will see it until sysVr6, and only then 
> if we are lucky!! 


Context:
	the right thing to do is to write a context program that takes
input looking like "filename:linenumber:goo" and prints whatever context you like.
we can then take this crap out of grep and diff and make it generally available
for use with programs like the C compiler and eqn and so on. It can also do
the right thing with folding together nearby lines. At least one good first
cut has been put on the net but a C program sounds easy enough to do.

Source:
	the software i write is publicly available because it matters to me.
it was a hassle but mk and fio are available to everybody for reasonable cost
(< $125 commercial, nearly free educational). i am trying hard to do the
same for the new grep. it will be in V10, it will be in plan9, and should be
in SVR4 (the joint sun-at&t release).

allbery@ncoast.UUCP (Brandon S. Allbery) (06/05/88)

As quoted from <2312@bgsuvax.UUCP> by kutz@bgsuvax.UUCP (Kenneth Kutz):
+---------------
| In article <6866@elroy.Jpl.Nasa.Gov>, alan@cogswell.Jpl.Nasa.Gov (Alan S. Mazer) writes:
| > One thing I would _love_ is to be able to find the context of what I've
| > found, for example, to find the two (n?) surrounding lines.  I have wanted
| > to do this many times and there is no good way.
+---------------

	grep -n foo ./bar | context 2

I posted context to net.sources back when it existed; someone may still have
archives from that time, if not I'll retrieve my sources and repost it.  It
takes lines of the basic form

	filename ... linenumber : ...

and displays context around the specified lines.  I use this with grep quite
often; it also works with cc (pcc, not Xenix cc) error messages.
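A minimal sketch of such a context filter (hypothetical, not Brandon's posted
program): it reads grep -n style "file:line:text" on stdin and shows n lines
around each hit.

```shell
# Context filter sketch: n lines before and after each reported line.
# The demo file and hit line below are placeholders.
n=2
printf 'l1\nl2\nl3\nl4\nl5\nl6\n' > /tmp/src.$$
echo "/tmp/src.$$:4:l4" |
while IFS=: read file line rest
do
    start=`expr $line - $n`
    test "$start" -lt 1 && start=1
    sed -n "${start},`expr $line + $n`p" "$file"
    echo '---'
done
rm -f /tmp/src.$$
```

For the demo hit on line 4 this prints lines 2 through 6 and a separator.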
-- 
Brandon S. Allbery			  | "Given its constituency, the only
uunet!marque,sun!mandrill}!ncoast!allbery | thing I expect to be "open" about
Delphi: ALLBERY	       MCI Mail: BALLBERY | [the Open Software Foundation] is
comp.sources.misc: ncoast!sources-misc    | its mouth."  --John Gilmore

gwyn@brl-smoke.UUCP (06/05/88)

In article <7944@alice.UUCP> andrew@alice.UUCP writes:
>	the right thing to do is to write a context program that takes
>input looking like "filename:linenumber:goo" and prints whatever context ...

Heavens -- a tool user.  I thought that only Neanderthals were still alive.
I guess Bell Labs escaped the plague.

hutch@net1.ucsd.edu (Jim Hutchison) (06/05/88)

In article <4537@vdsvax.steinmetz.ge.com> barnett@vdsvax.steinmetz.ge.com (Bruce G. Barnett) writes:
>In <1036@cfa.cfa.harvard.EDU> wyatt@cfa.harvard.EDU (Bill Wyatt) writes:
>|
>|> There have been times when I wanted a grep that would print out the
>|> first occurrence and then stop.
>|
>|grep '(your_pattern_here)' | head -1
>
[...]
>
>Have you ever waited for a computer?  

No, never. :-)

>There are times when I want the first occurrence of a pattern without
>reading the entire (i.e. HUGE) file.

I realize this is dependent on the way in which processes sharing a
pipe act, but this is a point worth considering before we get yet
another annoying burst of "cat -v" type programs.

grep pattern file1 ... fileN | head -1

This should send grep a SIGPIPE as soon as the first line of output
trickles through the pipe.  This would result in relatively little
of the file actually being read under most Unix implementations.
I would agree that it is a bad thing to rely on the granularity of
a pipe.  Here is a sample program which can be used to show you what
I mean.

Name it grep, and use it thus wise:

% ./grep pattern * | head -1

/* ------------- Cut here --------------- */
#include <stdio.h>
#include <signal.h>

sighandler(sig)
    int sig;
{
    if (sig == SIGPIPE)
	fprintf(stderr,"Died from a SIGPIPE\n");
    else
	fprintf(stderr,"Died from signal #%d\n", sig);
    exit(0);
}

main()
{
    signal(SIGPIPE,sighandler);
    for (;;)
	printf("pattern\n");
}
/*    Jim Hutchison   		UUCP:	{dcdwest,ucbvax}!cs!net1!hutch
		    		ARPA:	Hutch@net1.ucsd.edu
Disclaimer:  The cat agreed that it would be o.k. to say these things.  */

hutch@net1.ucsd.edu (Jim Hutchison) (06/05/88)

I can think of a few nasty ways to do this one, I am hoping to get
a better answer.

A grep with a window of context around it.  A few lines preceding and
following the pattern I am looking for.  The VMS search command sported
this as an option/qualifier.  I miss it sometimes (not VMS, just a few
of the more wacky utilities, like the editor option for creation of
multi-key data base files :-).

/*    Jim Hutchison   		UUCP:	{dcdwest,ucbvax}!cs!net1!hutch
		    		ARPA:	Hutch@net1.ucsd.edu
Disclaimer:  The cat agreed that it would be o.k. to say these things.  */

tbray@watsol.waterloo.edu (Tim Bray) (06/05/88)

Grep should, where reasonable, not be bound by the notion of a 'line'.
As a concrete expression of this, the useful grep -l (prints the names of
the files that contain the string) should work on any kind of file.  More
than one existing 'grep -l' will fail, for example, to tell you which of a 
bunch of .o files contain a given string.  Scenario - you're trying to
link 55 .o's together to build a program you don't know that well.  You're
on berklix.  ld sez: "undefined: _memcpy".  You say: "who's doing that?".
The source is scattered inconveniently.  The obvious thing to do is: 
grep -l _memcpy *.o
That this often will not work is irritating.
Tim Bray, New Oxford English Dictionary Project, U of Waterloo

bzs@bu-cs.BU.EDU (Barry Shein) (06/05/88)

From: gwyn@brl-smoke.ARPA (Doug Gwyn )
>In article <7944@alice.UUCP> andrew@alice.UUCP writes:
>>	the right thing to do is to write a context program that takes
>>input looking like "filename:linenumber:goo" and prints whatever context ...
>
>Heavens -- a tool user.  I thought that only Neanderthals were still alive.
>I guess Bell Labs escaped the plague.

Almost, unless the original input was produced by a pipeline, in which
case this (putative) post-processor can't help unless you tee the mess
to a temp file, yup, mess is the right word.

Or maybe only us Neanderthals are interested in tools which work on
pipes? Have they gone out of style?

	-Barry "Ulak of Org" Shein, Boston University

gwyn@brl-smoke.ARPA (Doug Gwyn ) (06/05/88)

In article <23133@bu-cs.BU.EDU> bzs@bu-cs.BU.EDU (Barry Shein) writes:
>Almost, unless the original input was produced by a pipeline, in which
>case this (putative) post-processor can't help unless you tee the mess
>to a temp file, yup, mess is the right word.

The proposed tool would be very handy on ordinary text files,
but it is hard to see a use for it on pipes.  Or, getting back
to context-grep, what good would it do to show context from a
pipe?  To do anything with the information (other than stare
at it), you'd need to produce it again.  There might be some
use for context-{grep,diff,...} on a stream, but if a separate
context tool will satisfy 99% of the need, as I think it would,
as well as provide this capability for other commands "for free",
it would be a better approach than hacking context into other
commands.

By the way, I hope the new grep when asked to always produce
the filename will use "-" for stdin's name, and the context
tool would also follow the same convention.  Even though the
Research systems have /dev/stdin, other sites may not, and
anyway (as we've just seen) stdin isn't really a definite
object.

nelson@sun.soe.clarkson.edu (Russ Nelson) (06/05/88)

In article <23133@bu-cs.BU.EDU> bzs@bu-cs.BU.EDU (Barry Shein) writes:
>In article <7944@alice.UUCP> andrew@alice.UUCP writes:
>>	the right thing to do is to write a context program that takes
>>input looking like "filename:linenumber:goo" and prints whatever context ...
>
>Almost, unless the original input was produced by a pipeline, in which
>case this (putative) post-processor can't help unless you tee the mess
>to a temp file, yup, mess is the right word.

How about:

alias with_context tee >/tmp/$$ | $* | context -f/tmp/$$

or something like that?  Does that offend tool-users sensibilities?
*Do* Neanderthals have any sensibilities?
-- 
signed char *reply-to-russ(int network) {	/* Why can't BITNET go	*/
if(network == BITNET) return "NELSON@CLUTX";	/* domainish?		*/
else return "nelson@clutx.clarkson.edu"; }

bzs@bu-cs.BU.EDU (Barry Shein) (06/05/88)

From: gwyn@brl-smoke.ARPA (Doug Gwyn )
>In article <23133@bu-cs.BU.EDU> bzs@bu-cs.BU.EDU (Barry Shein) writes:
>>Almost, unless the original input was produced by a pipeline, in which
>>case this (putative) post-processor can't help unless you tee the mess
>>to a temp file, yup, mess is the right word.
>
>The proposed tool would be very handy on ordinary text files,
>but it is hard to see a use for it on pipes.  Or, getting back
>to context-grep, what good would it do to show context from a
>pipe?  To do anything with the information (other than stare
>at it), you'd need to produce it again.

What else are context displays for except to stare at (or save in a
file for later staring)?

Are the resultant contexts often the input to other programs? (I know
that 'patch' can take a context input but that's irrelevant, it hardly
needs nor prefers a context diff to my knowledge, it's just being
accommodating so humans can look at the context diff if something
botches.)

Actually, I can answer that in the context of the original suggestion.

The motivation for a context comes in two major flavors:

	A) To stare at (the surrounding context gives a human some
	hint of the context in which the text appeared)

	B) Because the context really represents a multi-line (eg)
	record, such as pulling out every termcap or terminfo entry
	which contains some property but desiring the result to contain
	the entire multiline entry so it could be re-used to create a
	new file.

In either case it's independent of whether the data is coming from a
pipe (as it should be.) Its pipeness may be caused by something as
simple as the data being grabbed across the network (rsh HOST cat foo | ...).

Anyhow, I think it's bad in general to demand the reasoning of why a
selection operator should work in a pipe, it just should (although I
have presented a reasonable argument.) That's what tools are all about.

>There might be some
>use for context-{grep,diff,...} on a stream, but if a separate
>context tool will satisfy 99% of the need, as I think it would,
>as well as provide this capability for other commands "for free",
>it would be a better approach than hacking context into other
>commands.

I think claiming that 99% of the use won't need pipes is unsound, it
should just work with a pipe and any tool which requires passing the
file name and then re-positioning the file just won't, it's violating
a fundamental design concept by doing this (not that in rare cases
this might not be necessary, but I don't see where this is one of them
unless you use the circular argument of it "must be a separate
program".)

The reasoning for adding it to grep would be:

	a) Grep already has its finger on the context, it's right
	there (or could be), why re-process the entire stream/file
	just to get it printed? Grep found the context, why find it
	again?

	b) The context suggestions are merely logical generalizations
	of what grep already does: print the context of a match
	(it just happens to now limit that to exactly one line.) Nothing
	new conceptually is being added, only generalized.

In fact, if I were to write this context-display tool my first thought
would be to just use grep and try to emit unique patterns (a la TAGS
files) which grep can then re-scan. But grep doesn't quite cut it w/o
this little generalization. I think we're going in circles and this
post-processor is nothing more than a special case of grep or perhaps
cat or sed the way it was proposed (why not just generate sed commands
to list the lines if that's all you want?)

Anyhow, at least we're back to the technical issues and away from
calling anyone who disagrees Neanderthals...

	-Barry Shein, Boston University

bzs@bu-cs.BU.EDU (Barry Shein) (06/05/88)

From: nelson@sun.soe.clarkson.edu (Russ Nelson) [responding to me]
>>Almost, unless the original input was produced by a pipeline, in which
>>case this (putative) post-processor can't help unless you tee the mess
>>to a temp file, yup, mess is the right word.
>
>How about:
>
>alias with_context tee >/tmp/$$ | $* | context -f/tmp/$$
>
>or something like that?  Does that offend tool-users sensibilities?
>*Do* Neanderthals have any sensibilities?

I don't understand, the way to avoid having to tee it into temp
files is to tee it into temp files?

Given that sort of solution we can eliminate pipes entirely from unix,
was that your point? That pipes are fundamentally useless and can
always be eliminated via use of intermediate temp files?

It begs the question, burying it in a little syntactic sugar with an
alias command doesn't solve the problem.

	-Barry Shein, Boston University

andrew@alice.UUCP (06/06/88)

> Almost, unless the original input was produced by a pipeline, in which
> case this (putative) post-processor can't help unless you tee the mess
> to a temp file, yup, mess is the right word.
> Or maybe only us Neanderthals are interested in tools which work on
> pipes? Have they gone out of style?


the problem is in the limited plumbing available in the current flock of shells.
i mean, the context stuff is exactly the same issue as glob expansion;
you either build it into an ad hoc set of programs or let one program do it.

gwyn@brl-smoke.ARPA (Doug Gwyn ) (06/06/88)

In article <23142@bu-cs.BU.EDU> bzs@bu-cs.BU.EDU (Barry Shein) writes:
>Anyhow, at least we're back to the technical issues and away from
>calling anyone who disagrees Neanderthals...

Oh, but the latter is much more fun!

Anyway, the fundamental issue seems to be that there are (at least)
two types of external data objects:
	streams -- transient data, takes special effort to capture
	files -- permanent data with an attached name
UNIX nicely makes these appear much the same, but they do have some
inherent differences, and this one-pass versus multi-pass context
discussion has brought out one of them.

There is nothing particularly wrong with the "tee" approach to
turn a stream into a file long enough for whatever work is being
done.  The converse is often done; for example many of my shell
scripts, after parsing arguments, exec a pipeline that starts
	cat $* | ...
in order to ensure a stream input to the rest of the pipeline.

garyo@masscomp.UUCP (Gary Oberbrunner) (06/06/88)

The only change I've ever had to make to the source for grep to make it do
what I want was to make it work with arbitrary-length lines.
I consider not handling long lines (and not complaining about them either)
to be extremely antisocial.  All this other stuff is just window-dressing.
Not that it's bad; one integrated grep with B-M strings, alternation and
inversion operators, and nifty feeping creaturism is great by me.

I usually handle the multi-line-record case by tr'ing all the intermediate
line ends into some unused character, doing my database hackery (grep, awk,
sed, what have you) and then tr'ing back at the end.  This is one reason for
having grep support very long lines.
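When the multi-line records happen to be separated by blank lines, awk's
paragraph mode gets the same effect as the tr round-trip in one step (a
sketch; the tr hack is still needed for records with other separators):

```shell
# Multi-line record grep: RS = "" makes awk read blank-line-separated
# paragraphs as single records (demo data and pattern are placeholders).
cat > /tmp/recs.$$ <<'EOF'
name: alpha
color: red

name: beta
color: blue
EOF
awk 'BEGIN { RS = ""; ORS = "\n\n" } /blue/' /tmp/recs.$$
rm -f /tmp/recs.$$
```

For the demo it prints the whole second record, both lines.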

				As always,

				Gary

----------------------------------------------------------------------------
Remember,			Truth is not beauty;      (617)692-6200x2445
Information is not knowledge;	Beauty is not love;	  Gary   Oberbrunner
Knowledge is not wisdom;	Love is not music;	  ...!masscomp!garyo
Wisdom is not truth;		Music is the best. - FZ   ....garyo@masscomp

bzs@bu-cs.BU.EDU (Barry Shein) (06/06/88)

From: gwyn@brl-smoke.ARPA (Doug Gwyn )
>There is nothing particularly wrong with the "tee" approach to
>turn a stream into a file long enough for whatever work is being
>done.  The converse is often done; for example many of my shell
>scripts, after parsing arguments, exec a pipeline that starts
>	cat $* | ...
>in order to ensure a stream input to the rest of the pipeline.

Nothing wrong with it unless you happen to be on a parallel machine as
I am a lot of the time and pipes can run in parallel nicely.

Nyah Nyah, got ya there! PHFZZZZT! I win! I win!

You're right, this is getting ridiculous, we made our points...

Ok everyone, back to arguing which flags should be maintained in cat
and Unix Standardization AKA "West Coast Story" (snap fingers.)

	-Barry Shein, Boston University

preece%fang@xenurus.gould.com (Scott E. Preece) (06/06/88)

From: Doug Gwyn  <gwyn@brl-smoke.arpa>
> The proposed tool would be very handy on ordinary text files, but it is
> hard to see a use for it on pipes.  Or, getting back to context-grep,
> what good would it do to show context from a pipe?  To do anything with
> the information (other than stare at it), you'd need to produce it
> again.
----------
Well, actually, I often produce output for which I have no other use
than staring at.  Sometimes one uses the system to find an answer,
rather than to create grist for some future use...

-- 
scott preece
gould/csd - urbana
uucp:	ihnp4!uiucuxc!urbsdc!preece
arpa:	preece@Gould.com

nelson@sun.soe.clarkson.edu (Russ Nelson) (06/06/88)

In article <23143@bu-cs.BU.EDU> bzs@bu-cs.BU.EDU (Barry Shein) writes:
>From: nelson@sun.soe.clarkson.edu (Russ Nelson) [responding to me]
>>alias with_context tee >/tmp/$$ | $* | context -f/tmp/$$
>I don't understand, the way to avoid having to tee it into temp
>files is to tee it into temp files?

No.  There is no way to avoid teeing it into a temp file.  Such is
life with pipes.  If you want context then you need to save it.  My
alias is perfectly consistent with the tool-using philosophy.  Yes,
it's a kludge, but that's the only way to save context in a single-stream
pipe philosophy.  I remember reading a paper in which multiple streams
going hither and yon were proposed, but the syntax was gothic at best.
I like being able to say this:

bsd:	sort | with_context grep rfoo | more
sysv:	sort | with_context grep foo | more
	Because sysv doesn't have the r* utilities, of course :-)
-- 
signed char *reply-to-russ(int network) {	/* Why can't BITNET go	*/
if(network == BITNET) return "NELSON@CLUTX";	/* domainish?		*/
else return "nelson@clutx.clarkson.edu"; }

tower@bu-cs.BU.EDU (Leonard H. Tower Jr.) (06/07/88)

In article <6866@elroy.Jpl.Nasa.Gov> alan@cogswell.Jpl.Nasa.Gov (Alan S. Mazer) writes:
|
|One thing I would _love_ is to be able to find the context of what I've
|found, for example, to find the two (n?) surrounding lines.  I have wanted
|to do this many times and there is no good way.

GNU Emacs has a command that will walk you through each match of a
grep run and show you the context around it:

   grep:
   Run grep, with user-specified args, and collect output in a buffer.
   While grep runs asynchronously, you can use the C-x ` command
   to find the text that grep hits refer to.

M-x grep RET to invoke it.  I suspect other Unix Emacs have a similar
feature.

Information on how to obtain GNU Emacs, other GNU software, or the GNU
project itself is available from:

	gnu@prep.ai.mit.edu

enjoy -len

brianc@cognos.uucp (Brian Campbell) (06/07/88)

In article <4524@vdsvax.steinmetz.ge.com> Bruce G. Barnett writes:
> There have been times when I wanted a grep that would print out the
> first occurrence and then stop.

In article <1036@cfa.cfa.harvard.EDU> Bill Wyatt suggests:
> grep '(your_pattern_here)' | head -1

In article <4537@vdsvax.steinmetz.ge.com> Bruce G. Barnett replies:
> There are times when I want the first occurrence of a pattern without
> reading the entire (i.e. HUGE) file.

If we're talking about finding subject lines in news articles:
	head -20 file1 file2 ... | grep ^Subject:

> Or there are times when I want the first occurrence of a pattern from
> hundreds of files, but I don't want to see the pattern more than once.

In this case, the original suggestion seems appropriate:
	grep pattern file1 file2 ... | head -1
-- 
Brian Campbell        uucp: decvax!utzoo!dciem!nrcaer!cognos!brianc
Cognos Incorporated   mail: POB 9707, 3755 Riverside Drive, Ottawa, K1G 3Z4
(613) 738-1440        fido: (613) 731-2945 300/1200/2400, sysop@1:163/8

oz@yunexus.UUCP (Ozan Yigit) (06/08/88)

In article <7939@alice.UUCP> andrew@alice.UUCP writes:
>
>many people have written about patterns matching multiple lines.
>grep will not do this. if you really need this, use sam by rob pike
>as described in the nov 1987 software practice and experience.
>
	Why should this not be done by grep ??? I think Rob Pike's
	"Structured Expressions" is the way to go for a modern grep,
	where newline spanning is supported, and the program does
	not die unexpectedly just because a file contains a line too
	long for a stupid internal "line size". (For an insightful
	discussion of this, interested readers could check out Rob's
	paper in EUUG proceedings.)

oz
-- 
The deathstar rotated slowly,	      |  Usenet: ...!utzoo!yunexus!oz
towards its target, and sparked       |  ....!uunet!mnetor!yunexus!oz
an intense sunbeam. The green world   |  Bitnet: oz@[yulibra|yuyetti]
of unics evaporated instantly...      |  Phonet: +1 416 736-5257x3976

jgreely@dimetrodon.cis.ohio-state.edu (J Greely) (06/08/88)

In article <1998@u1100a.UUCP> krohn@u1100a.UUCP (Eric Krohn) writes:
>To put in a plug for Larry Wall's perl language (Release 2.0 due soon at a
>comp.sources.unix near you):

>[suggests the following script for grep-first-only]
>perl -n -e 'if(/Subject/){print $ARGV,":",$_;close(ARGV);}' * >/dev/null

This works, and is indeed faster.  However, it shares one problem with
all of the others: '*' expansion.  As an (uncomfortable) example,
/usr/spool/news/talk/bizarre has over 2500 articles in it at our site,
and the shell can't expand that properly (SunOS 3.4, if it matters).
So, the following perl script accomplishes the same thing, no matter
how many files need to be searched:

#!/usr/local/bin/perl
while ($File = <*>) {
  open(file,$File);
  while (<file>) {
    if (/^Subject/){
      print $File,":",$_;
      last;
    }
  }
  close(file);
}

It's about as fast as the one-liner, and more robust.
-=-
       (jgreely@cis.ohio-state.edu; ...!att!cis.ohio-state.edu!jgreely)
		  Team Wheaties says: "Just say NO to rexd!"
	       /^Newsgroups: .*\,.*\,.*\,/h:j   /[Ww]ebber/h:j
	       /[Bb]irthright [Pp]arty/j        /[Pp]ortal/h:j

guy@gorodish.Sun.COM (Guy Harris) (06/09/88)

> No, the obvious thing to do is:
> 
> nm -o _memcpy *.o

"Obvious" under which version of UNIX?  From the 4.3BSD manual:

	-o	Prepend file or archive element name to each output line
		rather than only once.

The SunOS manual page says the same thing.

From the S5R3 manual:

	-o	Print the value and size of a symbol in octal instead of
		decimal.

With the 4.3BSD version you can do

	nm -o *.o | egrep _memcpy

and get the result you want.  For any version of "nm" that I know of, you can
do the "egrep" trick mentioned in another posting; you may have to use a flag
such as "-p" with the S5 version to get "easily parsable, terse output."

john@frog.UUCP (John Woods) (06/09/88)

Hypothesize for the moment that I would like to have the Subject: lines for
each article in /usr/spool/news/comp/sources/unix.  Many people have proposed
a new flag for the "new grep" (one that functions just like the -one flag does
on "match", the matching program I use (a flag I implemented long ago)).

In article<5007@sdcsvax.UCSD.EDU>,hutch@net1.ucsd.edu(Jim Hutchison) suggests:
> grep pattern file1 ... fileN | head -1
> This should send grep a SIGPIPE as soon as the first line of output
> trickles through the pipe.  This would result in relatively little
> of the file actually being read under most Unix implementations.

Yes, it would result in relatively little of the file being read.  It would
also result in relatively little of the desired output.  Check the problem
space before posting solutions, folks.

As I pointed out in another message, you can get awk to solve the problem
almost exactly, with some irregularity in the NFILES={0,1} cases.  However,
the "tool-using" approach is a two-edged sword, it seems to me:  a matching
problem should be solvable by using the matching tool, not by a special case
of an editor tool (the purported "sed" solution) or by having to reach for
a full-blown programming language (awk); just as one should not paginate
a text file by using the /PAGINATE /NOPRINT features of a line-printer
program...  Sometimes you need to EN-feature a program in order to avoid
having to turn to (other) inappropriate tools.  "Oh, you can't ADD text
with this editor, only change existing text.  You add text by using
'cat >> filename' ..."

I like the "context" tool suggested elsewhere, but it has one problem (as
stated) for replacing context diffs:  context diffs are both context and
_differences_, and are generally clearly marked as such (i.e., the !+-
convention); while I guess you could turn an ed-script style diff listing
into a context diff (given both input files and the diff marks), that is
a radically different input language than that proposed for eliminating
context grep.  This just means, however, that two context tools are needed,
not just one.

To paraphrase Einstein, "Programs should be as simple as possible, and no
simpler."
-- 
John Woods, Charles River Data Systems, Framingham MA, (617) 626-1101
...!decvax!frog!john, john@frog.UUCP, ...!mit-eddie!jfw, jfw@eddie.mit.edu

No amount of "Scotch-Guard" can repel the ugly stains left by REALITY...
		- Griffy

john@frog.UUCP (John Woods) (06/09/88)

In article <1998@u1100a.UUCP>, krohn@u1100a.UUCP (Eric Krohn) writes:
> In article <1112@X.UUCP> john@frog.UUCP (some clown :-) writes:
> ] 	awk '/^Subject: /	{ print FILENAME ":" $0; next }' *
> 
> This will print Subject: lines more than once per file if a file happens to
> have more than one Subject: line.  `Next' goes to the next input line, not
> the next input file, so you are still left with an exhaustive search of all
> the files.
> 
Oops.  I blew it.  Working on GNU awk seems to have permanently damaged my
brain (there are a couple of differences between "real" awk and GNU awk which
I couldn't convince the author were worth changing, specifically in 'exit'
(not next); GNU exit actually does what I thought next would do, instead of
exiting entirely).  

-- 
John Woods, Charles River Data Systems, Framingham MA, (617) 626-1101
...!decvax!frog!john, john@frog.UUCP, ...!mit-eddie!jfw, jfw@eddie.mit.edu

No amount of "Scotch-Guard" can repel the ugly stains left by REALITY...
		- Griffy

leo@philmds.UUCP (Leo de Wit) (06/09/88)

In article <449@happym.UUCP> kent@happym.UUCP (Kent Forschmiedt) writes:
>In article <136@rubmez.UUCP> frei@rubmez.UUCP (Matthias Frei ) writes:
>>I want following flags:
>>
>>	- d	divert the file
>>		"matches" to stdout
>>		"nomatches" to stderr
>>	-r	exchange stdout and stderr, if -d is given  
>I second the vote - just today I did one of these:
>
>grep $PATTERN file > afile
>grep -v $PATTERN file > anotherfile
>
>Note, however, that -v will serve for the suggested -r.
>>Will you post Your new grep to the net ? (I hope so)
>From alice.UUCP??  Ha ha!  That's Bell Labs!  It will be in V10 
>Unix, and none of us humans will see it until sysVr6, and only then 
>if we are lucky!! 

You are lucky, because here's your_new_grep:

---------------------- S T A R T   H E R E ---------------
#!/bin/sh
# Usage: yngrep pattern matches nomatches [file ...]

case $# in
0|1|2) echo "Usage: $0 <pattern> <matchfile> <nomatchfile> [file ...]"; exit 1;;
*) pattern=$1 matches=$2 nomatches=$3; shift; shift; shift;;
esac

exec sed -n -e "
/$pattern/w $matches
/$pattern/!w $nomatches" $*
---------------------- S T O P     H E R E ---------------

Use the p command of sed to write to stdout. I don't know how to write to the
stderr from within sed. Don't think exec 2>outfile beforehand works, because
sed does not open for append. But you could use w /dev/tty, that's often what
you want for stderr anyway 8-).
Hope it works right away, didn't test it.

	Leo.

jad@insyte.UUCP (Jill Diewald) (06/09/88)

A missing feature in UNIX is the ability to deal with files with
very long lines - the kind of file you get from a data base tape like
Compustat.  The standard data base tape contains very long lines:
instead of separating each record with a newline, all the records may
be on the same line.  A defined record size, rather than a newline,
determines where each record ends.

There are two specific things that it would be nice to do with UNIX, 
instead of having to write a c program:

1) To be able to give grep (also awk, etc) a record size 
which it would use instead of newlines.  

2) To be able to specify a field range (ie columns 20-30) for the 
program to search - instead of the entire line/record.  In addition it 
should be possible to specify several fields in one grep.  For 
example: to search for records which have "1000" in one field or
"2000" in another.  Sort uses fields so they aren't totally foreign
to UNIX.

vanam@pttesac.UUCP (Marnix van Ammers) (06/10/88)

In article <4524@vdsvax.steinmetz.ge.com> barnett@steinmetz.ge.com (Bruce G. Barnett) writes:

>There have been times when I wanted a grep that would print out the
>first occurrence and then stop.

sed -n -e "/<pattern>/ { p" -e q -e "}"

mouse@mcgill-vision.UUCP (der Mouse) (06/10/88)

In article <779@yabbie.rmit.oz>, rcodi@yabbie.rmit.oz (Ian Donaldson) writes:
> From article <3324@phri.UUCP>, by roy@phri.UUCP (Roy Smith):
>> [A grep option to stop after one match] would certainly speed up
>> things like
>> 	grep "^Subject: " /usr/spool/news/comp/sources/unix/*

> A simple permutation:
>  	head -60 /usr/spool/news/comp/sources/unix/* | grep "^Subject: "
> works fairly close to the mark, and doesn't waste much time at all.

Except that it doesn't list the filename along with the match.

					der Mouse

			uucp: mouse@mcgill-vision.uucp
			arpa: mouse@larry.mcrcim.mcgill.edu

mouse@mcgill-vision.UUCP (der Mouse) (06/10/88)

In article <8012@brl-smoke.ARPA>, gwyn@brl-smoke.ARPA (Doug Gwyn ) writes:
> In article <7944@alice.UUCP> andrew@alice.UUCP writes:
>> the right thing to do is to write a context program that takes input
>> looking like "filename:linenumber:goo" and prints whatever context ...

> Heavens -- a tool user.  I thought that only Neanderthals were still
> alive.  I guess Bell Labs escaped the plague.

A real useful `tool', this, that works only on files.  And only when
you grep more than one file, so you get filenames (or happen to be able
to remember which flag it is to make grep print filenames always,
assuming of course that your grep has it).

Besides, grep has the context, or could have if it wanted to bother
saving it.  Why read all two hundred thousand lines of the file
*again*?  Wasn't it bad enough the first time?

					der Mouse

			uucp: mouse@mcgill-vision.uucp
			arpa: mouse@larry.mcrcim.mcgill.edu

mouse@mcgill-vision.UUCP (der Mouse) (06/10/88)

In article <1030@sun.soe.clarkson.edu>, nelson@sun.soe.clarkson.edu (Russ Nelson) writes:
> In article <23133@bu-cs.BU.EDU> bzs@bu-cs.BU.EDU (Barry Shein) writes:
>> In article <7944@alice.UUCP> andrew@alice.UUCP writes:
>>> the right thing to do is to write a context program that takes
>>> input looking like "filename:linenumber:goo" and prints whatever
>>> context ...
>> Almost, unless the original input was produced by a pipeline, [...]
>> unless you tee the mess to a temp file, yup, mess is the right word.
> How about:
> alias with_context tee >/tmp/$$ | $* | context -f/tmp/$$

This assumes that (a) there's room on /tmp to save the whole thing and
(b) that you don't mind rereading it all to find the appropriate line.

Both assumptions are commonly violated, in my experience.

					der Mouse

			uucp: mouse@mcgill-vision.uucp
			arpa: mouse@larry.mcrcim.mcgill.edu

mouse@mcgill-vision.UUCP (der Mouse) (06/10/88)

In article <8022@brl-smoke.ARPA>, gwyn@brl-smoke.ARPA (Doug Gwyn ) writes:
> Or, getting back to context-grep, what good would it do to show
> context from a pipe?  To do anything with the information (other than
> stare at it), you'd need to produce it again.

Why do we have diff -c?  Generally, to stare at.  (The only other use I
know of is producing diffs for Larry Wall's patch program.)

					der Mouse

			uucp: mouse@mcgill-vision.uucp
			arpa: mouse@larry.mcrcim.mcgill.edu

mouse@mcgill-vision.UUCP (der Mouse) (06/10/88)

In article <5007@sdcsvax.UCSD.EDU>, hutch@net1.ucsd.edu (Jim Hutchison) writes:
> 4537@vdsvax.steinmetz.ge.com, barnett@vdsvax.steinmetz.ge.com (Bruce G. Barnett)
>> In <1036@cfa.cfa.harvard.EDU> wyatt@cfa.harvard.EDU (Bill Wyatt) writes:
[attribution(s) lost]
>>>> There have been times when I wanted a grep that would print out
>>>> the first occurrence and then stop.
>>> grep '(your_pattern_here)' | head -1
>> Have you ever waited for a computer?  There are times when I want
>> the first occurrence of a pattern without reading the [whole file].

> grep pattern file1 ... fileN | head -1

> This should send grep a SIGPIPE as soon as the first line of output
> trickles through the pipe.

No.  It should not send the SIGPIPE until grep writes the second line.
And because grep is likely to use stdio for its output, nothing at all
may be written to the pipe until grep has 1K or 2K or whatever size its
stdio uses for the output buffer.  This may be an enormous waste of
time, both cpu and real.

Besides which, it's wrong.  It prints just the first match, whereas
what's wanted is the first match *from each file*.
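(A minimal sketch of first-match-per-file in plain sh and sed, in the
spirit of this thread; the "Subject:" pattern and the demo files are
illustrative.  sed quits as soon as it prints, so no file is read past
its first match:)

```shell
# Print the first matching line from each file, reading no further
# once it is found.  The demo directory stands in for real articles.
dir=${TMPDIR-/tmp}/fm$$
mkdir "$dir"
printf 'From: a\nSubject: one\nSubject: dup\n' > "$dir/art1"
printf 'no subject here\n' > "$dir/art2"

for f in "$dir"/*
do
	line=`sed -n -e '/^Subject:/{' -e p -e q -e '}' "$f"`
	case "$line" in
	'') ;;				# no match in this file
	*)  echo "$f: $line" ;;
	esac
done

rm -r "$dir"
```

Unlike the grep | head pipelines above, this prints one line per
matching file and never touches the rest of a file once it has hit.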

					der Mouse

			uucp: mouse@mcgill-vision.uucp
			arpa: mouse@larry.mcrcim.mcgill.edu

mouse@mcgill-vision.UUCP (der Mouse) (06/10/88)

In article <7207@watdragon.waterloo.edu>, tbray@watsol.waterloo.edu (Tim Bray) writes:
> Scenario - you're trying to link 55 .o's together to build a program
> you don't know that well.  You're on berklix.  ld sez: "undefined:
> _memcpy".  You say: "who's doing that?".  The source is scattered
> inconveniently.  The obvious thing to do is:  grep -l _memcpy *.o

Doesn't anybody read the man pages any more?  The obvious thing is to
use the supplied facility: the -y option to ld.

% cc -o program *.o -y_memcpy
Undefined:
_memcpy
buildstruct.o: reference to external undefined _memcpy
copytree.o: reference to external undefined _memcpy
%

(I don't know how generally available this is.  You did say "berklix",
and I know this is in 4.3, but I don't know about other Berklices.)

					der Mouse

			uucp: mouse@mcgill-vision.uucp
			arpa: mouse@larry.mcrcim.mcgill.edu

mouse@mcgill-vision.UUCP (der Mouse) (06/10/88)

In article <1037@sun.soe.clarkson.edu>, nelson@sun.soe.clarkson.edu (Russ Nelson) writes:
> In article <23143@bu-cs.BU.EDU> bzs@bu-cs.BU.EDU (Barry Shein) writes:
>> From: nelson@sun.soe.clarkson.edu (Russ Nelson) [responding to me]
>>> alias with_context tee >/tmp/$$ | $* | context -f/tmp/$$
>> I don't understand, the way to avoid having to tee it into temp
>> files is to tee it into temp files?
> No.  There is no way to avoid teeing it into a temp file.

Sure there is.

> If you want context then you need to save it.

True.  But you don't necessarily need to save it in a file.

> [the alias above is] the only way to save context in a single-stream
> pipe philosophy.

Grep can save it in memory.  Unless you want so much context that it
overflows the available memory, which I find difficult to see
happening, this is a perfectly good place to put it.

In fact, I wrote a grep variant which starts by snarfing the whole file
into (virtual) memory.  Makes for extreme speed when it's usable, which
is often enough to make it worthwhile (for me, at least).  And of
course it means that I could get as much context as I cared to.  (I've
never had it fail because it couldn't get enough memory to hold the
whole file.)

					der Mouse

			uucp: mouse@mcgill-vision.uucp
			arpa: mouse@larry.mcrcim.mcgill.edu

andrew@alice.UUCP (06/11/88)

	The following is a summary of the somewhat plausible ideas
suggested for the new grep. I thank Leo de Wit particularly and others
for clearing up misconceptions and pointing out (correctly) that
existing tools like sed already do (or at least nearly do) what some people
asked for. The following points are in no particular order and no slight is
intended by my presentation. After that, I summarise the current flags.

1) named character classes, e.g. \alpha, \digit.
	i think this is a hokey idea and dismissed it as unnecessary crud
	but then found out it is part of the proposed regular expression
	stuff for posix. it may creep in but i hope not.

2) matching multi-line patterns (\n as part of pattern)
	this actually requires a lot of infrastructure support and thought.
	i prefer to leave that to other more powerful programs such as sam.

3) print lines with context.
	the second most requested feature but i'm not doing it. this is
	just the job for sed. to be consistent, we just took the context
	crap out of diff too. this is actually reasonable; showing context
	is the job for a separate tool (pipeline difficulties apart).

4) print one(first matching) line and go onto the next file.
	most of the justification for this seemed to be scanning
	mail and/or netnews articles for the subject line; neither
	of which gets any sympathy from me. but it is easy to do
	and doesn't add to the flag count; we add a new option (say -1)
	and remove -s. -1 is just like -s except it prints the matching line.
	then the old grep -s pattern is now grep -1 pattern > /dev/null
	and is within epsilon of being as efficient.

5) divert matching lines onto one fd, nonmatching onto another.
	sorry, run grep twice.

6) print the Nth occurrence of the pattern (N is number or list).
	it may be possible to think of a real reason for this (i couldn't)
	but the answer is no.

7) -w (pattern matches only words)
	the most requested feature. well, it turns out that -x (exact)
	is there because doug mcilroy wanted to match words against a dictionary.
	it seems to have no other use. Therefore, -x is being dropped
	(after all, it only costs a quick edit to do it yourself) and is
	replaced by -w == (^|[^_a-zA-Z0-9])pattern($|[^_a-zA-Z0-9]).

8) grep should work on binary files and kanji.
	that it should work on kanji or any character set is a given
	(at least, any character set supported by the system V international
	character set stuff). binary files will work too modulo the
	following constraint: lines (between \n's) have to fit in a
	buffer (current size 64K). violations are an error (exit 2).

9) -b has bogus units.
	agreed. -b now is in bytes.

10) -B (add an ^ to the front of the given pattern, analogous to -x and -w)
	-x (and -w) is enough. sorry.

11) recursively descend through argument lists
	no. find | xargs is going to have to do.

12) read filenames on standard input
	no. xargs will have to do.

13) should be as fast as bm.
	no worries. in fact, our egrep is 3x faster than bm. i intend to be
	competitive with woods' egrep. it should also be as fast as fgrep for
	multiple keywords. the new grep incorporates boyer-moore
	as a degenerate case of Commentz-Walter, a faster replacement
	for the fgrep algorithm.

14) -lv (files that don't have any matching lines)
	-lv means print names of files that have any nonmatching lines
	(useful, say, for checking input syntax). -L will mean print
	names of files without selected lines.

15) print the part of the line that matched.
	no. that is available at the subroutine level.

16) compatibility with old grep/fgrep/egrep.
	the current name for the new command is gre (aho chose it).
	after a while, it will become our grep. there will be a -G
	flag to take patterns a la old grep and a -F to take
	patterns a la fgrep (that is, no metacharacters except \n == |).
	gre is close enough to egrep to not matter.

17) fewer limits.
	so far, gre will have only one limit, a line length of 64K.
	(NO, i am not supporting arbitrary length lines (yet)!)
	we foresee no need for any other limit. for example, the
	current gre acts like fgrep. it is 4 times faster than
	fgrep and has no limits; we can gre -f /usr/dict/words
	(72K words, 600KB).

18) recognise file types (ignore binaries, unpack packed files etc).
	get real. go back to your macintosh or pyramid. gre will just grep
	files, not understand them.

19) handle patterns occurring multiple times per line
	this is ill-defined (how many times does aaaa occur in a line of 20 'a's?
	in order of decreasing correctness, the answers are >=1, 17, 5).
	For the cases people mentioned (words), pipe it thru
	tr to put the words one per line.

20) why use \{\} instead of \(\)?
	this is not yet resolved (mcilroy&ritchie vs aho&pike&me).
	grouping is an orthogonal issue to subexpressions so why
	use the same parentheses? the latest suggestion (by ritchie)
	is to allow both \(\) and \{\} as grouping operators but
	the \3 would only count one type (say \(\)). this would be much
	better for complicated patterns with much grouping.

21) subroutine versions of the pattern matching stuff.
	in a deep sense, the new grep will have no pattern matching code in it.
	all the pattern matching code will be in libc with a uniform
	interface. the boyer-moore and commentz-walter routines have been
	done. the other two are egrep and back-referencing egrep.
	lastly, regexp will be reimplemented.

22) support a filename of - to mean standard input.
	a unix without /dev/stdin is largely bogus but as a sop to the poor
	bastards having to work on BSD, gre will support -
	as stdin (at least for a while).
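(In the meantime, the -w behaviour proposed in item 7 can be
approximated with today's egrep; a minimal sketch, with the bracket
expressions copied verbatim from item 7 and "word" as an illustrative
pattern:)

```shell
# Emulate the proposed -w flag: match "word" only when it is not
# embedded in a larger identifier.  "swordfish" and "my_word" are
# rejected because the adjacent character is a word character.
printf 'a word here\nswordfish\nword\nmy_word\n' |
egrep '(^|[^_a-zA-Z0-9])word($|[^_a-zA-Z0-9])'
# prints "a word here" and "word"
```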

Thus, the current proposal is the following flags. it would take a GOOD
argument to change my mind on this list (unless it is to get rid of a flag).

-f file	pattern is (`cat file`)
-v	nonmatching lines are 'selected'
-i	ignore alphabetic case
-n	print line number
-c	print count of selected lines only
-l	print filenames which have a selected line
-L	print filenames who do not have a selected line
-b	print byte offset of line begin
-h	do not print filenames in front of matching lines
-H	always print filenames in front of matching lines
-w	pattern is (^|[^_a-zA-Z0-9])pattern($|[^_a-zA-Z0-9])
-1	print only first selected line per file
-e expr	use expr as the pattern

Andrew Hume
research!andrew

wswietse@eutrc3.UUCP (Wietse Venema) (06/11/88)

In article <7207@watdragon.waterloo.edu> tbray@watsol.waterloo.edu (Tim Bray) writes:
}Grep should, where reasonable, not be bound by the notion of a 'line'.
}As a concrete expression of this, the useful grep -l (prints the names of
}the files that contain the string) should work on any kind of file.  More
}than one existing 'grep -l' will fail, for example, to tell you which of a 
}bunch of .o files contain a given string.  Scenario - you're trying to
}link 55 .o's together to build a program you don't know that well.  You're
}on berklix.  ld sez: "undefined: _memcpy".  You say: "who's doing that?".
}The source is scattered inconveniently.  The obvious thing to do is: 
}grep -l _memcpy *.o
}That this often will not work is irritating.
}Tim Bray, New Oxford English Dictionary Project, U of Waterloo

	nm -op *.o | grep memcpy

will work just fine, both with bsd and att unix.

	Wietse
-- 
uucp:	mcvax!eutrc3!wswietse	| Eindhoven University of Technology
bitnet:	wswietse@heithe5	| Dept. of Mathematics and Computer Science
surf:	tuerc5::wswietse	| Eindhoven, The Netherlands.

randy@umn-cs.cs.umn.edu (Randy Orrison) (06/12/88)

In article <7962@alice.UUCP> andrew@alice.UUCP writes:
|3) print lines with context.
|	the second most requested feature but i'm not doing it. this is
|	just the job for sed. to be consistent, we just took the context
							^^^^^^^^^^^^^^^^
|	crap out of diff too. this is actually reasonable; showing context
	^^^^^^^^^^^^^^^^
|	is the job for a separate tool (pipeline difficulties apart).


What?!?!?   Ok, i would like context in grep, but i'll live without it.
Context diffs, however are a different matter.  There isn't an easy way
to generate them with diff/context (the first character of every line is
produced as part of the diff).  Context diffs are useful for patches, and
having a tool to generate them is necessary.  They're a logical improvement
to diff that is more than just context around the changes.

If you're fixing grep fine, but don't break diff while you're at it.

	-randy
-- 
Randy Orrison, Control Data, Arden Hills, MN		randy@ux.acss.umn.edu
8-(OSF/Mumblix: Just say NO!)-8	    {ihnp4, seismo!rutgers, sun}!umn-cs!randy
	"I consulted all the sages I could find in Yellow Pages,
	but there aren't many of them."			-APP

bd@hpsemc.HP.COM (bob desinger) (06/12/88)

> Along this same general line it would be nice to be able to
> look for patterns that span lines.

Here's a script called `phrase' from Dougherty and O'Reilly's
_Unix_Text_Processing_ book.  It finds patterns that are possibly
split across lines.  Its usage is:

	phrase "phrase to find" files ...

It doesn't have all those grep options, but at least it gets you
halfway there.

-- bd

#! /bin/sh
# This is a shell archive.  Remove anything before this line,
# then unwrap it by saving it in a file and typing "sh file".
#
# Wrapped by bd at hpsemc on Sat Jun 11 23:26:46 1988
# Contents:
#	phrase 	

PATH=/bin:/usr/bin:/usr/ucb:/usr/local/bin:$PATH; export PATH
echo 'At the end, you should see the message "End of shell archive."'

echo Extracting phrase
cat >phrase <<'@//E*O*F phrase//'
: find phrases, perhaps split across lines
# From _Unix_Text_Processing_ by Dougherty & O'Reilly, p. 378

if [ $# -lt 2 ]
then	echo "Usage:  `basename $0`" '"phrase to find" file ...'
	exit 1
else
	search="$1"	# pattern
	shift
fi

for file
do
	sed '
	/'"$search"'/b
	N
	h
	s/.*\n//
	/'"$search"'/b
	g
	s/ *\n/ /
	/'"$search"'/{
	g
	b
	}
	g
	D' $file
done
@//E*O*F phrase//

set `wc -lwc <phrase`
if test $1 -ne 28 -o $2 -ne 62 -o $3 -ne 355
then	echo ! phrase should have 28 lines, 62 words, and 355 characters
	echo ! but has $1 lines, $2 words, and $3 characters
fi
chmod 775 phrase

echo "End of shell archive."
exit 0

wesommer@athena.mit.edu (William Sommerfeld) (06/12/88)

In article <144@insyte.UUCP> jad@insyte.UUCP writes:
>A missing feature in UNIX is the ability to deal with files with
>very long lines - the kind of file you get from a data base tape like
>Compustat.  The standard data base tape contains very long lines.
>Instead of separating each record with a newline, all the records may
>be on the same line.  There is a defined record size which is used to
>determine when a record ends - instead of a newline.
>
>There are two specific things that it would be nice to do with UNIX, 
>instead of having to write a c program:

As usual with UNIX, you _don't_ have to write a C program.  Use `dd'
instead.  If we're talking about the canonical IBM "80 column card
image", then the following should work just fine to convert it to a
"normal" file:

dd conv=unblock cbs=80 <in >out

Adding conv=ascii will also convert EBCDIC into ascii.

>2) To be able to specify a field range (ie columns 20-30) for the 
>program to search - instead of the entire line/record.  

grep '^........foo' 

picks up any line which has `foo' in columns 9-11..

					- Bill

chris@mimsy.UUCP (Chris Torek) (06/12/88)

In article <144@insyte.UUCP> jad@insyte.UUCP (Jill Diewald) writes:
-... data base tape contains very long lines, instead of separating
-each record with a newline, all the records may be on the same line.
-There is a defined record size which is used to determine when a
-record ends - instead of a newline.

Think `tools'.

-1) To be able to give grep (also awk, etc) a record size 
-which it would use instead of newlines.  

Use `dd | grep'.

-2) To be able to specify a field range (ie columns 20-30) for the 
-program to search - instead of the entire line/record.  In addition it 
-should be posible to specify several fields in one grep.

Use `dd | cut | grep'.
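(A concrete sketch of that `dd | cut | grep' pipeline, assuming
80-byte fixed-length records and a search of columns 20-30 for
"1000"; the generated demo data stands in for a real tape file:)

```shell
# Build two 80-byte records with no newlines; the first carries
# "1000" in columns 20-23, the second does not.
f=${TMPDIR-/tmp}/rec$$
pad=`printf '%019d' 0 | tr 0 a`		# 19 filler bytes
{
	printf '%-80s' "${pad}1000"
	printf '%-80s' 'bbbb'
} > "$f"

dd conv=unblock cbs=80 < "$f" 2>/dev/null |	# one record per line
cut -c20-30 |					# keep columns 20-30 only
grep -n '1000'					# -n gives the record number

rm -f "$f"
```

Run as-is, this prints `1:1000': only the first record matches in the
selected columns.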
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

roy@phri.UUCP (Roy Smith) (06/12/88)

jad@insyte.UUCP writes:
> A missing feature in UNIX is the ability to deal with files with very
> long lines.

	Unless I'm misunderstanding jad, he's talking about fixed length
records.  Can't you just do: "dd conv=unblock cbs=80 (or whatever)" to
convert the file to standard Unix \n-terminated lines?  Hasn't this been
part of Unix since at least v6?

	Also, I agree with whoever said that taking context diffs out of
diff is a bad idea.  Context diffs are what make patch work so well, and
patch is what makes the world go 'round.  Please don't take my context
diffs away-ay-ay.. (with apologies to Paul Simon).
-- 
Roy Smith, System Administrator
Public Health Research Institute
455 First Avenue, New York, NY 10016
{allegra,philabs,cmcl2,rutgers}!phri!roy -or- phri!roy@uunet.uu.net

allbery@ncoast.UUCP (Brandon S. Allbery) (06/13/88)

As quoted from <7944@alice.UUCP> by andrew@alice.UUCP:
+---------------
| 	the right thing to do is to write a context program that takes
| input looking like "filename:linenumber:goo" and prints whatever context you like.
| we can then take this crap out of grep and diff and make it generally available
| for use with programs like the C compiler and eqn and so on. It can also do
| the right thing with folding together nearby lines. At least one good first
| cut has been put on the net but a C program sounds easy enough to do.
+---------------

A C version has been done; it handles pcc, grep -n, and cpp messages.  I
posted it 2 1/2 years ago.

It does *not* handle diff, since diff's messages are slightly different and
lack filename information; also, since it passes lines it doesn't understand
you'd end up with both regular and context diffs in the same output.  Now if
diff had an option to output in the format

		<filename>:<lineno>[-<lineno>]:<action>

we'd be all set -- I could modify it to handle ranges easily.  (Changes
would be output as "file1:n-m:file was\nfile2:n-m:now is", or something
similar.)

Note that it'd be nice if lint output messages this way as well.  I have a
postprocessor for lint which does this -- even with System V's lint that
can have lint1 and lint2 run separately via .ln files.
-- 
Brandon S. Allbery			  | "Given its constituency, the only
{uunet!marque,sun!mandrill}!ncoast!allbery | thing I expect to be "open" about
Delphi: ALLBERY	       MCI Mail: BALLBERY | [the Open Software Foundation] is
comp.sources.misc: ncoast!sources-misc    | its mouth."  --John Gilmore

gwyn@brl-smoke.ARPA (Doug Gwyn ) (06/13/88)

In article <3350@phri.UUCP> roy@phri.UUCP (Roy Smith) writes:
>Unless I'm misunderstanding jad, he's talking about fixed length records.

She was.

The important point is that UNIX text (line-oriented) tools typically
break miserably when lines containing more than 256 or 512 (sometimes
more) characters are encountered.  In many cases this restriction is
not necessary but is due to quick-and-dirty implementation.
It IS more work to read in an arbitrarily long line, but once you
write your getline() function you could add it to the local library
and then it would be easy to do in the future.

I seem to recall that Lindemann fixed this problem in "sort" for SVR2.

>Also, I agree with whoever said that taking context diffs out of
>diff is a bad idea.

Removing the ability to get context diffs when they are wanted WOULD
be a bad idea.  Removing this feature from "diff" itself is not a
bad idea; I hate for "diff" to do extra work every time I run it when
I virtually never use the context feature.  Consider
	diff a b | diffc a b
where "diffc" reads the "diff" information in parallel with the two
files "a" and "b" to produce the context-diff output.  By separating
the two functions, it is not only likely to speed up non-context use
of "diff" but also it is more likely to get the answer right, and it
is easier to work on improving "diffc".  (Existing context diff output
is sometimes pretty horrible, for example larger than the inputs.)

rbj@cmr.icst.nbs.gov (Root Boy Jim) (06/13/88)

? From: J Greely <jgreely@dimetrodon.cis.ohio-state.edu>

? In article <1998@u1100a.UUCP> krohn@u1100a.UUCP (Eric Krohn) writes:
? >To put in a plug for Larry Wall's perl language (Release 2.0 due soon at a
? >comp.sources.unix near you):

? >[suggests the following script for grep-first-only]
? >perl -n -e 'if(/Subject/){print $ARGV,":",$_;close(ARGV);}' * >/dev/null
? 
? This works, and is indeed faster.  However, it shares one problem with
? all of the others: '*' expansion.  As an (uncomfortable) example,
? /usr/spool/news/talk/bizarre has over 2500 articles in it at our site,
? and the shell can't expand that properly (SunOS 3.4, if it matters).
? So, the following perl script accomplishes the same thing, no matter
? how many files need searched:

[replacement solution deleted]

Don't forget about xargs.

?        (jgreely@cis.ohio-state.edu; ...!att!cis.ohio-state.edu!jgreely)
? 		  Team Wheaties says: "Just say NO to rexd!"
? 	       /^Newsgroups: .*\,.*\,.*\,/h:j   /[Ww]ebber/h:j
? 	       /[Bb]irthright [Pp]arty/j        /[Pp]ortal/h:j
 
	(Root Boy) Jim Cottrell	<rbj@icst-cmr.arpa>
	National Bureau of Standards
	Flamer's Hotline: (301) 975-5688
	The opinions expressed are solely my own
	and do not reflect NBS policy or agreement
	My name is in /usr/dict/words. Is yours?

jgreely@tut.cis.ohio-state.edu (J Greely) (06/14/88)

In article <16148@brl-adm.ARPA> rbj@cmr.icst.nbs.gov (Root Boy Jim) writes:
>? From: J Greely <jgreely@dimetrodon.cis.ohio-state.edu>
>[replacement solution deleted]

>Don't forget about xargs.

Don't forget about non-SYSV sites!  (There is a simple replacement for
xargs in comp.sources.unix Volume 3, but not everyone has this).
-- 
       (jgreely@cis.ohio-state.edu; ...!att!cis.ohio-state.edu!jgreely)
		  Team Wheaties says: "Just say NO to rexd!"
	       /^Newsgroups: .*\,.*\,.*\,/h:j   /[Ww]ebber/h:j
	       /[Bb]irthright [Pp]arty/j        /[Pp]ortal/h:j

keith@seismo.CSS.GOV (Keith Bostic) (06/14/88)

In article <7962@alice.UUCP>, andrew@alice.UUCP writes:

> 22) support a filename of - to mean standard input.
> 	a unix without /dev/stdin is largely bogus but as a sop to the poor
> 	bastards having to work on BSD, gre will support -
> 	as stdin (at least for a while).
>
> Andrew Hume
> research!andrew

A few comments:

     -- As far as I'm aware, V9 is the only system that has "/dev/stdin" at the
	moment.  For those who haven't heard of it, V9 is a research version
	of UN*X developed and in use at the Computing Science Research Center,
	a part of AT&T Bell Laboratories, and available to a small number of
	universities.  It was preceded by V8, which, interestingly enough, was
	built on top of 4.1BSD.

     -- System V does not support "/dev/stdin".

     -- The next full release of BSD will contain "/dev/stdin" and friends.
	It is not part of the 4.3-tahoe release because it requires changes
	to stdio.  I do not expect, however, commands that currently support
	the "-" syntax to change, for compatibility reasons.  V9 itself
	continues to support such commands.

To sum up, let's try and keep this, if not actually constructive, at least
bearing some distant relationship to the facts.

Keith Bostic

guy@gorodish.Sun.COM (Guy Harris) (06/14/88)

> ? This works, and is indeed faster.  However, it shares one problem with
> ? all of the others: '*' expansion.  As an (uncomfortable) example,
> ? /usr/spool/news/talk/bizarre has over 2500 articles in it at our site,
> ? and the shell can't expand that properly (SunOS 3.4, if it matters).

"Fixed in 4.0", perhaps: from 4.0's "sys/param.h":

#define	NCARGS	0x100000	/* (absolute) max # characters in exec arglist */

(I don't know which versions of the shell can cope with 1MB argument lists, if
any.)

> Don't forget about xargs.

Assuming, of course, that your system has it; SunOS has it in releases 3.2 or
later (if you install the "System V Optional Software"; it's in "/usr/bin"),
but vanilla 4.xBSD doesn't, for example.  I would not be at all surprised to
hear that there is some public-domain reimplementation out there somewhere.

andrew@alice.UUCP (06/14/88)

actually our version of the linderman sort doesn't do arbitrary-sized lines;
it does the REASONABLE thing of complaining when the lines are too long
rather than silently progressing (or looping or truncating or ...).
that is why gre will either work properly (handle any line length)
or settle on a (sensible) max length and complain if it gets an overflow.

allbery@ncoast.UUCP (Brandon S. Allbery) (06/14/88)

As quoted from <5007@sdcsvax.UCSD.EDU> by hutch@net1.ucsd.edu (Jim Hutchison):
+---------------
| 4537@vdsvax.steinmetz.ge.com, barnett@vdsvax.steinmetz.ge.com (Bruce G. Barnett)
| >In <1036@cfa.cfa.harvard.EDU> wyatt@cfa.harvard.EDU (Bill Wyatt) writes:
| >|> There have been times when I wanted a grep that would print out the
| >|> first occurrence and then stop.
| >|
| >|grep '(your_pattern_here)' | head -1
| >
| >There are times when I want the first occurrence of a pattern without
| >reading the entire (i.e. HUGE) file.
| 
| I realize this is dependent on the way in which processes sharing a
| pipe act, but this is a point worth considering before we get yet
| another annoying burst of "cat -v" type programs.
| 
| grep pattern file1 ... fileN | head -1
| 
| This should send grep a SIGPIPE as soon as the first line of output
| trickles through the pipe.  This would result in relatively little
| of the file actually being read under most Unix implementations.
+---------------

Not true.  The SIGPIPE is sent when "grep" writes the second line, *not*
when "head" exits!  If there *is* only one line containing the pattern, grep
will happily read all of the (possibly large) files without getting SIGPIPE.
This is not pleasant, even if it's only one large file -- say a
comp.sources.unix posting which you're grepping for a Subject: line.
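
The timing is easy to see in a few lines of C.  This is a hypothetical
standalone sketch (write_after_reader_gone is a made-up name); closing the
pipe's read end stands in for "head" exiting:

```c
/* sketch: SIGPIPE is delivered to the writer at write() time,
   not at the moment the reader goes away */
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* returns the errno from writing one byte into a pipe
   whose read end has already been closed */
int write_after_reader_gone(void)
{
    int fd[2];
    char c = 'x';

    signal(SIGPIPE, SIG_IGN);   /* turn the signal into an EPIPE error */
    if (pipe(fd) < 0)
        return -1;
    close(fd[0]);               /* the reader ("head") has exited... */
    /* ...but the writer notices nothing until it actually writes */
    if (write(fd[1], &c, 1) < 0) {
        int e = errno;
        close(fd[1]);
        return e;               /* EPIPE */
    }
    close(fd[1]);
    return 0;
}
```

So a grep that has written its one matching line and then finds no further
matches never writes again, never gets the signal, and reads to end of file.
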
-- 
Brandon S. Allbery			  | "Given its constituency, the only
{uunet!marque,sun!mandrill}!ncoast!allbery | thing I expect to be "open" about
Delphi: ALLBERY	       MCI Mail: BALLBERY | [the Open Software Foundation] is
comp.sources.misc: ncoast!sources-misc    | its mouth."  --John Gilmore

andrew@frip.gwd.tek.com (Andrew Klossner) (06/14/88)

[]

	"so far, gre will have only one limit, a line length of 64K.
	(NO, i am not supporting arbitrary length lines (yet)!)"

Why not a flag to let the user specify the max line length?  Just the
thing for that database hacker, and diminishes the demand for arbitrary
length.

	"there will be a -G flag to take patterns a la old grep and a
	-F to take patterns a la fgrep"

I hope that -F is a permanent, not temporary, flag.  I don't see it in
the summary list of supported flags, shudder.

	"a unix without /dev/stdin is largely bogus but as a sop to the
	poor bastards having to work on BSD, gre will support - as
	stdin (at least for a while)."

It's not just BSD; I haven't seen /dev/stdin in any released edition.
I just looked over the sVr3.1 tape and didn't turn up anything.

  -=- Andrew Klossner   (decvax!tektronix!tekecs!andrew)       [UUCP]
                        (andrew%tekecs.tek.com@relay.cs.net)   [ARPA]

chris@mimsy.UUCP (Chris Torek) (06/14/88)

In article <44370@beno.seismo.CSS.GOV> keith@seismo.CSS.GOV
[at seismo?!?] (Keith Bostic) writes:
>    -- The next full release of BSD will contain "/dev/stdin" and friends.
>	It is not part of the 4.3-tahoe release because it requires changes
>	to stdio.

Well, only because

	freopen("/dev/stdin", "r", stdin)

unexpectedly fails: it closes fd 0 before attempting to open /dev/stdin,
which means that stdin is gone before it can grab it again.  When I
`fixed' this here it broke /usr/ucb/head and I had to fix the fix!

The sequence needed is messy:

	old = fileno(fp);
	new = open(...);
	if (new < 0) {
		close(old);	/* maybe it was EMFILE */
		new = open(...);/* (could test errno too) */
		if (new < 0)
			return error;
	}
	if (new != old) {
		if (dup2(new, old) >= 0)	/* move it back */
			close(new);
		else {
			close(old);
			fileno(fp) = new;
		}
	}

Not using dup2 means that freopen(stderr) might make fileno(stderr)
something other than 2, which breaks at least perror().
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

ok@quintus.uucp (Richard A. O'Keefe) (06/14/88)

In article <8080@brl-smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
>Removing the ability to get context diffs when they are wanted WOULD
>be a bad idea.  Removing this feature from "diff" itself is not a
>bad idea; I hate for "diff" to do extra work every time I run it when
>I virtually never use the context feature.  Consider
>	diff a b | diffc a b
>where "diffc" reads the "diff" information in parallel with the two
>files "a" and "b" to produce the context-diff output.

About half of my calls to "diff" feed it with a pipe, e.g.
	NewProgramVersion <Data | diff ..Options.. - ExpectedOutput
I don't know how diff handles this, and I don't care; that's diff's job.
If you split it into two programs, someone who wants a context difference
for regression testing has to figure out how to handle pipes (him|her)self.
There is not the least fragment of a shadow of a reason for the -c option
to slow "diff" down in the cases when it is not used.  As one method of
implementation, consider a stripped down diff which exec()s another program
(/lib/diffc, perhaps) when it sees the -c option.
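
The dispatch itself is only a few lines.  Here is a sketch (wants_context is
a made-up name, and /lib/diffc is O'Keefe's hypothetical path); only the
option scan is shown:

```c
/* sketch of a stripped-down diff that hands -c off to a separate program */
#include <string.h>
#include <unistd.h>

/* does the option list ask for context output? */
int wants_context(int argc, char **argv)
{
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "--") == 0)
            break;                      /* end of options */
        if (argv[i][0] == '-' && strchr(argv[i] + 1, 'c') != NULL)
            return 1;                   /* -c, possibly bundled as -bc etc. */
    }
    return 0;
}

/* in main():  if (wants_context(argc, argv)) execv("/lib/diffc", argv); */
```
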

Berkeley may have gone overboard in adding new flags, but at least adding
new flags doesn't break working code.  If I have to rewrite scripts which
use only features documented in the V.2 SVID because someone decided that
his ideas of elegance were more important than my labour, I will be very
upset.

tbray@watsol.waterloo.edu (Tim Bray) (06/14/88)

>In article <7207@watdragon.waterloo.edu> I wrote:
>}Grep should, where reasonable, not be bound by the notion of a 'line'.
...
>}The source is scattered inconveniently.  The obvious thing to do is: 
>}grep -l _memcpy *.o
>}That this often will not work is irritating.

At least a dozen people have sent me alternate ways of doing this, the 
most obvious using 'nm'.  Look, I KNOW ABOUT NM! But you're missing the 
point - suppose the item in the .o files was another type of string, e.g.
an error message.  

The point is:  There are some files.  One or more may contain a string in
which I am interested.  grep -l is a tool which is supposed to tell me whether
one or more files contain a string.  The fact that it refuses to do so for 
a class of magic files is a gratuitous violation of the unix paradigm.
Tim Bray, New Oxford English Dictionary Project, U of Waterloo

oz@yunexus.UUCP (Ozan Yigit) (06/15/88)

In article <7962@alice.UUCP> andrew@alice.UUCP writes:
>
>21) subroutine versions of the pattern matching stuff.
>	....
>	.... the other two are egrep and back-referencing egrep.
>	lastly, regexp will be reimplemented.
>
>Andrew Hume

Just how do you propose to implement the back-referencing trick in 
a properly constructed (nfa and/or nfa->dfa conversion static or
on-the-fly) egrep ?? I presume that after each match of the
\(reference\) portion, you would have to on-the-fly modify the \n
portion of the fsa. Gack! Do you have a theoretically solid algorithm
[say, within the context of Aho/Sethi/Ullman's Dragon Book chapter on
regular expressions] for this ??  I would be much interested.

oz
-- 
The DeathStar rotated slowly,	      |  Usenet: ...!utzoo!yunexus!oz
towards its target, and sparked       |  ....!uunet!mnetor!yunexus!oz
an intense SUNbeam. The green world   |  Bitnet: oz@[yulibra|yuyetti]
of unics evaporated instantly...      |  Phonet: +1 416 736-5257x3976

barnett@vdsvax.steinmetz.ge.com (Bruce G. Barnett) (06/15/88)

In article <7962@alice.UUCP> andrew@alice.UUCP writes:
|
|	The following is a summary of the somewhat plausible ideas
|suggested for the new grep. 

|4) print one(first matching) line and go onto the next file.
|	most of the justification for this seemed to be scanning
|	mail and/or netnews articles for the subject line; neither
|	of which gets any sympathy from me. but it is easy to do
|	and doesn't add an option; we add a new option (say -1)
|	and remove -s. -1 is just like -s except it prints the matching line.
|	then the old grep -s pattern is now grep -1 pattern > /dev/null
|	and within epsilon of being as efficient.
	                            -----------
Actually this is extremely wrong.

Given the command 
	grep -1 Subject /usr/spool/news/comp/sources/unix/* >/dev/null
and
	grep -s Subject /usr/spool/news/comp/sources/unix/* >/dev/null

I would expect the first one to read *every* file. 

The second case ( -s ) should terminate as soon as it finds the first
match in the first file.

Unless I misunderstand the functionality of the -s command.
-- 
	Bruce G. Barnett 	<barnett@ge-crd.ARPA> <barnett@steinmetz.UUCP>
				uunet!steinmetz!barnett

rbj@ICST-CMR.ARPA (Root Boy Jim) (06/15/88)

? From: Randy Orrison <randy@umn-cs.cs.umn.edu>

? In article <7962@alice.UUCP> andrew@alice.UUCP writes:
? |3) print lines with context.
? |	the second most requested feature but i'm not doing it. this is
? |	just the job for sed. to be consistent, we just took the context
? 							^^^^^^^^^^^^^^^^
? |	crap out of diff too. this is actually reasonable; showing context
? 	^^^^^^^^^^^^^^^^
? |	is the job for a separate tool (pipeline difficulties apart).
? 
? 
? What?!?!?   Ok, i would like context in grep, but i'll live without it.
? Context diffs, however are a different matter.  There isn't an easy way
? to generate them with diff/context (the first character of every line is
? produced as part of the diff).  Context diffs are useful for patches, and
? having a tool to generate them is necessary.  They're a logical improvement
? to diff that is more than just context around the changes.
? 
? If you're fixing grep fine, but don't break diff while you're at it.

Ditto. In this day and age, it is unthinkable to generate diffs by
hand.  It is equally unthinkable to apply diffs (patches) by hand.
With the inclusion of the fudge factor in patch, context diffs
take on new value. Distributing non-context diffs in a source group
should be considered a felony. Context diffs are a feature that has
been proven useful time and time again.

I find it unacceptable to read a file twice to do what I could do in
one pass. Thus, Doug Gwyn's suggestion of a separate diffc program is
unacceptable as well.

I too can live without context greps; perhaps sed is an answer, altho
it currently works only on one file (multiple files are catenated).
Perhaps awk could use a `nextfile' command and we'd all be happy?

You are carrying this `tools' approach too far. Gone are the days of
small sizes; few people run on a PDP-11 anymore. Memory and disk space
are cheap these days; the goal is no longer to reduce each program to
its minimalist set of options and execution size. Composing tools is
as conceptually intimidating to the user as choosing the right option
in the first place.  Often, the tools *don't* compose correctly, and
functions must be accreted into tools that `logically' could be
handled elsewhere, such as ls -C. Provide what the user needs in a
concise form, without having to compose an arcane list of pipelines.
Trade size of executables for execution speed where appropriate.
Unused code is never paged in anyway.
 
? 	-randy
? -- 
? Randy Orrison, Control Data, Arden Hills, MN		randy@ux.acss.umn.edu
? 8-(OSF/Mumblix: Just say NO!)-8	    {ihnp4, seismo!rutgers, sun}!umn-cs!randy
? 	"I consulted all the sages I could find in Yellow Pages,
? 	but there aren't many of them."			-APP

	(Root Boy) Jim Cottrell	<rbj@icst-cmr.arpa>
	National Bureau of Standards
	Flamer's Hotline: (301) 975-5688
	The opinions expressed are solely my own
	and do not reflect NBS policy or agreement
	My name is in /usr/dict/words. Is yours?

rbj@cmr.icst.nbs.gov (Root Boy Jim) (06/15/88)

? From: andrew@alice.uucp
? 
? 4) print one(first matching) line and go onto the next file.
? 	most of the justification for this seemed to be scanning
? 	mail and/or netnews articles for the subject line; neither
? 	of which gets any sympathy from me. but it is easy to do
? 	and doesn't add an option; we add a new option (say -1)
? 	and remove -s. -1 is just like -s except it prints the matching line.
? 	then the old grep -s pattern is now grep -1 pattern > /dev/null
? 	and within epsilon of being as efficient.

I often grep for a host name in /etc/hosts. This is a big file and
would benefit from the execution time saved. Yeah, I know, use sed,
it's only one file. OK, how about this: grep -1 '#include .thing.' *.c?
 
? 5) divert matching lines onto one fd, nonmatching onto another.
? 	sorry, run grep twice.

While I rarely want to do this, the times I have, I have been extremely
annoyed. Why should I have to suffer twice the execution time when it
is trivial to put this in?
 
? Thus, the current proposal is the following flags. it would take a GOOD
? argument to change my mind on this list (unless it is to get rid of a flag).

? -h	do not print filenames in front of matching lines
? -H	always print filenames in front of matching lines

It has already been shown how to do these: for the former, use
cat files | grep, for the latter, grep files /dev/null. Perhaps
you are being a tad inconsistent with the tools philosophy?

? -e expr	use expr as the pattern

What about the magic `--' getopt token? Do we need `-e'?

? Andrew Hume
? research!andrew
 
	(Root Boy) Jim Cottrell	<rbj@icst-cmr.arpa>
	National Bureau of Standards
	Flamer's Hotline: (301) 975-5688
	The opinions expressed are solely my own
	and do not reflect NBS policy or agreement
	My name is in /usr/dict/words. Is yours?

guy@gorodish.Sun.COM (Guy Harris) (06/16/88)

> grep -l is a tool which is supposed to tell me whether one or more files
> contain a string.

No, it isn't.  "grep -l" is a tool that is supposed to tell you whether one or
more *text* files contain a string; if your file doesn't happen to contain
newlines at least every N characters or so, too bad.  If you want to improve
this situation by writing a "grep" that doesn't have this restriction, feel
free.

> The fact that it refuses to do so for a class of magic files is a
> gratuitous violation of the unix paradigm.

"ed is a tool that is supposed to let me modify files.  The fact that it
refuses to do so for a class of magic files is a gratuitous violation of the
unix paradigm."  Sorry, but the fact that you can't normally use "ed" to patch
binaries doesn't bother me one bit.

ljz@fxgrp.UUCP (Lloyd Zusman) (06/16/88)

In article <7962@alice.UUCP> andrew@alice.UUCP writes:
  
  	The following is a summary of the somewhat plausible ideas
  suggested for the new grep.  ...

  ...

  2) matching multi-line patterns (\n as part of pattern)
  	this actually requires a lot of infrastructure support and thought.
  	i prefer to leave that to other more powerful programs such as sam.
                                                                       ^^^
  ...

Since I'm one of the people who suggested the ability to match multi-line
patterns, I'm a bit disappointed about this ... but such is life.  So
where can I find 'sam'?  Is it in the public domain?  Is source code
available?

You can try to reply via email ... it might actually work, but don't
be surprised if your mail bounces, in which case I'd appreciate
replies here.

Thanks in advance.

--
  Lloyd Zusman                          UUCP:   ...!ames!fxgrp!ljz
  Master Byte Software              Internet:   ljz%fx.com@ames.arc.nasa.gov
  Los Gatos, California               or try:   fxgrp!ljz@ames.arc.nasa.gov
  "We take things well in hand."

andrew@alice.UUCP (06/16/88)

In article <515@yunexus.UUCP>, oz@yunexus.UUCP writes:
> Just how do you propose to implement the back-referencing trick in 
> a properly constructed (nfa and/or nfa->dfa conversion static or
> on-the-fly) egrep ?? I presume that after each match of the
> \(reference\) portion, you would have to on-the-fly modify the \n
> portion of the fsa. Gack! Do you have a theoretically solid algorithm
> [say, within the context of Aho/Sethi/Ullman's Dragon Book chapter on
> regular expressions] for this ??  I would be much interested.

theoretically solid is not what i would call it but the algorithm is simple
enough once you have a subroutine for egrep that matches a pattern against
an input with a match of at least n input chars. you just do what you have to
do: an exponential back-tracking algorithm. thus, back-referencing is not done
inside the fsa, but as part of a (complicated) control function. I realise
this sounds vague but i can't give you the details until i do it. al aho has
done it and probably understands this stuff as well as anyone in the world.
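
The flavor of that control function can be shown with a toy (an
illustration only, not Aho's algorithm; matches_doubled is a made-up
name): decide whether some nonempty captured substring immediately
repeats -- the back-reference idea -- by trying every capture and
backtracking:

```c
/* toy back-reference matcher, brute force: does s contain ww
   for some nonempty w?  (i.e. a capture followed by \1) */
#include <string.h>

int matches_doubled(const char *s)
{
    size_t n = strlen(s);

    for (size_t i = 0; i < n; i++)                    /* capture start */
        for (size_t len = 1; i + 2 * len <= n; len++) /* capture length */
            if (memcmp(s + i, s + i + len, len) == 0)
                return 1;                             /* \1 matched */
    return 0;
}
```

With one reference this try-everything search is merely cubic in the worst
case; it is when several back-references nest that the candidate captures
multiply and the exponential behavior Andrew mentions appears.
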

andrew@alice.UUCP (06/16/88)

i am not proposing that the world uses a diff without context;
just our world. it is rarely used in our center and we don't use patch.
and despite large address spaces and huge machines, we still believe
in trying to eliminate crud that is essentially never used. crud that is
not paged in is still crud. just remember, i am not trying to make you use
our (contextless) diff.

the point about contexts is that it is something you can do in many different
places with the output from many commands. this is what suggests that it
be a separate tool. it doesn't have to subsume all context tasks;
perhaps diff output just doesn't fit the mold. and complaints about
greps of standard input indicate that you need to think about whether
the context tool can handle pipe input; if it can't, then
context greps of standard input don't fit the mold either.

gwyn@brl-smoke.ARPA (Doug Gwyn ) (06/16/88)

In article <698@fxgrp.UUCP> ljz%fx.com@ames.arc.nasa.gov (Lloyd Zusman) writes:
>where can I find 'sam'?  Is it in the public domain?  Is source code
>available?

So far as I know, if you aren't part of AT&T and don't have 9th Edition UNIX,
the only way to legally obtain "sam" is to acquire it from the AT&T UNIX
System ToolChest, where it is included in the "dmd-pgmg" package.  This is
definitely not public domain, but it's inexpensively priced and it does
include source code.

"sam" works either with dumb terminals or with a smart one like an AT&T
Teletype 5620 or 630.  I haven't tried installing it without DMD support
but obviously it can be done.

I use "sam" (DMD version) whenever I have serious editing to do.

gwyn@brl-smoke.ARPA (Doug Gwyn ) (06/16/88)

In article <16173@brl-adm.ARPA> rbj@ICST-CMR.ARPA (Root Boy Jim) writes:
>Distributing non-context diffs in a source group
>should be considered a felony. Context diffs are a feature that have
>been proven useful time and time again.

I have to disagree with the sentiment that "diff -c" is extremely
useful.  I find it only slightly useful.

You might have noticed that when I post bug fixes I never do it
via "diff -c".  I prefer to give enough information to RELIABLY
patch the code.  In any context where I would trust "patch", I
would also trust "ed" using the output of "diff -e", which is
generally much less output.  (By the way, this could also be
done with a separate filter applied to normal "diff" output.)

I recently generated a "diff -c -b" comparison between SVR2 sh
sources and the BRL version of sh.  The output was larger than
the concatenation of all the sources.  It was useful for the
intended purpose (browsing), but would be ludicrous for "patch"ing.

fmr@cwi.nl (Frank Rahmani) (06/16/88)

> Xref: mcvax comp.unix.wizards:8598 comp.unix.questions:6792
> Posted: Fri Jun 10 05:29:43 1988
> 
> In article <8012@brl-smoke.ARPA>, gwyn@brl-smoke.ARPA (Doug Gwyn ) writes:
> A real useful `tool', this, that works only on files.  And only when
> you grep more than one file, so you get filenames (or happen to be able
> to remember which flag it is to make grep print filenames always,
> assuming of course that your grep has it).
...
...
that's the smallest of all problems: just include /dev/null as the
first file to be searched, like
	grep [options] pattern /dev/null one_or_more_filenames
by the way, I like the sed one-liner that was posted as an answer
to the grep replacement question. Why couldn't I think of it? :-)
fmr@cwi.nl
-- 
It is better never to have been born. But who among us has such luck?
--------------------------------------------------------------------------
These opinions are solely mine and in no way reflect those of my employer.  

rbj@cmr.icst.nbs.gov (Root Boy Jim) (06/16/88)

? From: andrew@alice.uucp

? 5) divert matching lines onto one fd, nonmatching onto another.
? 	sorry, run grep twice.

I can imagine Dennis Ritchie, designing the C language, saying:

5) <lvalue> <op>= <expression>
        sorry, type lvalue twice.

? Andrew Hume

Sorry, I just couldn't resist taking another swipe.

	(Root Boy) Jim Cottrell	<rbj@icst-cmr.arpa>
	National Bureau of Standards
	Flamer's Hotline: (301) 975-5688
	The opinions expressed are solely my own
	and do not reflect NBS policy or agreement
	Careful with that VAX Eugene!

guy@gorodish.Sun.COM (Guy Harris) (06/17/88)

> In any context where I would trust "patch", I would also trust "ed" using
> the output of "diff -e", which is generally much less output.

In many contexts where I would trust "patch" with a context "diff", I would
*NOT* trust "ed" with a "diff -e" any further than I could throw it.

"diff -e" scripts contain line numbers that *must* match the lines in the file
being patched, at least if you're using "ed" - "patch" may be able to figure
out the right line numbers if you're not patching the exact same version of the
source, although I would not be surprised if it didn't, since "diff -e" scripts
don't have the context that makes this easier.

"diff -c" scripts contain the aforementioned context, so that they can be used
to apply patches to source that is *not* identical to the source from which the
"diff"s were made.  This is quite important in many cases.  (I use "diff -c"
and "patch" to merge different streams of changes to a source file, for
example.)

> I recently generated a "diff -c -b" comparison between SVR2 sh
> sources and the BRL version of sh.  The output was larger than
> the concatenation of all the sources.  It was useful for the
> intended purpose (browsing), but would be ludicrous for "patch"ing.

Yes, you can construct examples where "diff -c" output is too big to be
practical.  However, the vast majority of the "diff -c" patches I've seen
distributed are not that big; the context is a big win.

andrew@alice.UUCP (06/17/88)

sam is described in software practice and experience, nov 1987.
it is available from the at&t toolchest as part of the package
'dmdprograms' (or similar) for something like $125. this includes
source. to run on something other than a teletype 5620, say a sun,
you have to rewrite a little but it is worth it.

daveb@geac.UUCP (David Collier-Brown) (06/17/88)

In article <10078@tekecs.TEK.COM> andrew@frip.gwd.tek.com (Andrew
Klossner) quotes someone to say:
>[]
>
>	"so far, gre will have only one limit, a line length of 64K.
>	(NO, i am not supporting arbitrary length lines (yet)!)"

   Well, arbitrary line lengths are easy.

  Initially
	allocate a cache
  When reading
	fgets a cache-full
	if the last character is not a \n
		increase the cache with realloc
		read some more


  A function to do this, called getline, was published recently in
the source groups.
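
For concreteness, a minimal sketch of that loop in C (getline_any is a
made-up name; published versions differ in details):

```c
/* sketch: read a line of arbitrary length, growing the buffer
   with realloc when fgets stops short of a newline */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* returns a malloc'd line without its trailing newline, or NULL at EOF */
char *getline_any(FILE *fp)
{
    size_t cap = 16, len = 0;
    char *buf = malloc(cap);

    if (buf == NULL)
        return NULL;
    for (;;) {
        if (fgets(buf + len, (int)(cap - len), fp) == NULL) {
            if (len == 0) { free(buf); return NULL; }  /* clean EOF */
            return buf;             /* last line had no newline */
        }
        len += strlen(buf + len);
        if (len > 0 && buf[len - 1] == '\n') {
            buf[len - 1] = '\0';    /* strip the newline */
            return buf;
        }
        cap *= 2;                   /* no newline yet: grow, read more */
        buf = realloc(buf, cap);
        if (buf == NULL)
            return NULL;
    }
}
```
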

--dave (remember my old .signature?) c-b
-- 
 David Collier-Brown.  {mnetor yunexus utgpu}!geac!daveb
 Geac Computers Ltd.,  | "His Majesty made you a major 
 350 Steelcase Road,   |  because he believed you would 
 Markham, Ontario.     |  know when not to obey his orders"

wolfe@pdnbah.uucp (Mike Wolfe) (06/17/88)

In article <540@sering.cwi.nl> fmr@cwi.nl (Frank Rahmani) writes:
>> Xref: mcvax comp.unix.wizards:8598 comp.unix.questions:6792
>> Posted: Fri Jun 10 05:29:43 1988
>> 
>> In article <8012@brl-smoke.ARPA>, gwyn@brl-smoke.ARPA (Doug Gwyn ) writes:
>> A real useful `tool', this, that works only on files.  And only when
>> you grep more than one file, so you get filenames (or happen to be able
>> to remember which flag it is to make grep print filenames always,
>> assuming of course that your grep has it).
>...
>...
>that's the smallest of all problems, just include /dev/null as first
>file to be searched
>into your script like
>grep [options] pattern /dev/null one_or_more_filenames

Smallest of all problems? One of my pet peeves is the fact that certain
commands will only print filenames if you give them more than one file. While
the /dev/null ugliness is a suitable kludge for the grep case, what about
a case where you want to run something using xargs, something like sum? You
don't want /dev/null repeated for each call. I know I can sed it out, but
that's just a kludge for a kludge, and to me that's a red flag.

I think that all commands of that type should allow you to force the filenames
in output. I don't want to go back and change all the commands (UNIX++ a
modest proposal ;-). I just wish people would keep this in mind when writing
things in the future.

----
Mike Wolfe
Paradyne Corporation,  Mail stop LF-207   DOMAIN   wolfe@pdn.UUCP
PO Box 2826, 8550 Ulmerton Road           UUCP     ...!uunet!pdn!wolfe
Largo, FL  34649-2826                     PHONE    (813) 530-8361

karish@denali.stanford.edu (Chuck Karish) (06/18/88)

In article <7990@alice.UUCP> andrew@alice.UUCP writes:

>i am not proposing that the world uses a diff without context;
>just our world. it is rarely used in our center and we don't use patch.
>and despite large address spaces and huge machines, we still believe
>in trying to eliminate crud that is essentially never used. crud that is
>not paged in is still crud. just remember, i am not trying to make you use
>our (contextless) diff.

Oh.

From the tone of your previous postings, and because of some of the names
you dropped in them, I was under the impression that you were writing
a full-function replacement for grep.  That assumption seems to have carried
over to the discussion of diff.
If you want to hack at important utilities, go ahead.  Just keep them
at your site, or call them by a distinctive name.  You've seen a preview
of what the response will be if diff is shipped out in the form you describe.


Chuck Karish	ARPA:	karish@denali.stanford.edu
		BITNET:	karish%denali@forsythe.stanford.edu
		UUCP:	{decvax,hplabs!hpda}!mindcrf!karish
		USPS:	1825 California St. #5   Mountain View, CA 94041

davidsen@steinmetz.ge.com (William E. Davidsen Jr) (06/18/88)

If you are able to use "diff -e" you must have a different version than
I do... The one I have generates refs to absolute line numbers, and is
useless for applying any patches if the source has been modified, even
in other parts of the program.
-- 
	bill davidsen		(wedu@ge-crd.arpa)
  {uunet | philabs | seismo}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me

gwyn@brl-smoke.ARPA (Doug Gwyn ) (06/18/88)

In article <540@sering.cwi.nl> fmr@cwi.nl (Frank Rahmani) writes:
>> In article <8012@brl-smoke.ARPA>, gwyn@brl-smoke.ARPA (Doug Gwyn ) writes:

But I didn't.  (I think it was BZS.)  PLEASE, check your attributions!

maart@cs.vu.nl (Maarten Litmaath) (06/18/88)

In article <7962@alice.UUCP> andrew@alice.UUCP writes:
\...
\5) divert matching lines onto one fd, nonmatching onto another.
\	sorry, run grep twice.

Come on! The diversion is no problem at all to implement, and it can be very
useful (you cannot run grep twice on stdin without the use of temporary files).
Regards.
-- 
South-Africa:                         |Maarten Litmaath @ Free U Amsterdam:
           revival of the Third Reich |maart@cs.vu.nl, mcvax!botter!ark!maart

ado@elsie.UUCP (Arthur David Olson) (06/18/88)

The "new" grep's -b option provides everything necessary to do "efficient"
post-processor-based context display (file offsets rather than line numbers).

Since there's a proposed change in the semantics of "-b", I've suggested
changing its name (to "-B" or "-z" or whatever) to avoid quiet surprises from
existing scripts.
-- 
		Grocery swaps ends for Chinese native.  (5)
	ado@ncifcrf.gov			ADO is a trademark of Ampex.

henry@utzoo.uucp (Henry Spencer) (06/19/88)

> ... few people run on a PDP-11 anymore. Memory and disk space
> are cheap these days; the goal is no longer to reduce each program to
> its minimalist set of options and execution size...

You'd be surprised how much of a difference in performance you can get,
even on modern systems, by applying the minimalist philosophy vigorously.

> Trade size of executables for execution speed where appropriate.
> Unused code is never paged in anyway.

Spoken like a true disk salesman. :-) :-(
-- 
Man is the best computer we can      |  Henry Spencer @ U of Toronto Zoology
put aboard a spacecraft. --Von Braun | {ihnp4,decvax,uunet!mnetor}!utzoo!henry

allbery@ncoast.UUCP (Brandon S. Allbery) (06/19/88)

As quoted from <5826@umn-cs.cs.umn.edu> by randy@umn-cs.cs.umn.edu (Randy Orrison):
+---------------
| In article <7962@alice.UUCP> andrew@alice.UUCP writes:
| |3) print lines with context.
| |	the second most requested feature but i'm not doing it. this is
| |	just the job for sed. to be consistent, we just took the context
| 							^^^^^^^^^^^^^^^^
| |	crap out of diff too. this is actually reasonable; showing context
| 	^^^^^^^^^^^^^^^^
| |	is the job for a separate tool (pipeline difficulties apart).
| 
| 
| What?!?!?   Ok, i would like context in grep, but i'll live without it.
| Context diffs, however are a different matter.  There isn't an easy way
| to generate them with diff/context (the first character of every line is
| produced as part of the diff).  Context diffs are useful for patches, and
+---------------

Yes, there is; change diff's output format slightly and expand "context"
slightly, then other programs can also output in "extended context" format so
as to use "context"'s facilities.  I've already described part of this
change in another posting; the other part would be to recognize a special
indicator (on the line number, perhaps?) which would for generality be the
flag to use on the difference, defaulting to "*" which is what "context"
currently uses, or diff could specify "+", "-", or "!".  The only other
change would be to smarten "context" so that it "collapses" context
"windows" together much like the 4.3BSD diff -c does.

It appears that Bell Labs continues to use tools unrepentantly.  It should
be noted that they *are* into research, so I have no arguments against their
use of /dev/stdin (/dev/fd/0?), their assumption that there's plenty of
space so stash away a copy of a file with "tee" for later use in "context",
etc.  (My /dev/stdin complaint earlier was not aimed at the Bell Labs folks,
it was aimed at the person who informed the entire Usenet that "hey, I
posted a /dev/stdin driver source for 4.2BSD, so not a one of you has any
reason not to be running it".  In other words, the usual 4.xBSD-source
elitism.)
-- 
Brandon S. Allbery			  | "Given its constituency, the only
{uunet!marque,sun!mandrill}!ncoast!allbery | thing I expect to be "open" about
Delphi: ALLBERY	       MCI Mail: BALLBERY | [the Open Software Foundation] is
comp.sources.misc: ncoast!sources-misc    | its mouth."  --John Gilmore

allbery@ncoast.UUCP (Brandon S. Allbery) (06/19/88)

As quoted from <3350@phri.UUCP> by roy@phri.UUCP (Roy Smith):
+---------------
| jad@insyte.UUCP writes:
| > A missing feature in UNIX is the ability to deal with files with very
| > long lines.
| 
| 	Unless I'm misunderstanding jad, he's talking about fixed length
| records.  Can't you just do: "dd conv=unblock cbs=80 (or whatever)" to
| convert the file to standard Unix \n-terminated lines?  Hasn't this been
| part of Unix since at least v6?
+---------------

Apparently not:  neither System III nor System V r3.1 supports it.  (I used
"strings" on both systems, to make sure it wasn't merely undocumented).  I
certainly think it *should* be in "dd"... it's a rather obvious tape-
conversion operation.
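For readers on systems whose dd does support it (V7-derived and POSIX dd do), the idiom quoted above works as follows: conv=unblock treats the input as fixed-length records of cbs bytes each, strips trailing spaces, and appends a newline. A sketch using 8-byte records for brevity (the file name is made up):

```shell
# Convert fixed-length 8-byte records into newline-terminated lines.
# "alpha   " and "beta    " are each padded to exactly 8 bytes.
printf 'alpha   beta    ' > fixed.dat
dd if=fixed.dat conv=unblock cbs=8 2>/dev/null
# prints "alpha" and "beta" on separate lines
```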
-- 
Brandon S. Allbery			  | "Given its constituency, the only
{uunet!marque,sun!mandrill}!ncoast!allbery | thing I expect to be "open" about
Delphi: ALLBERY	       MCI Mail: BALLBERY | [the Open Software Foundation] is
comp.sources.misc: ncoast!sources-misc    | its mouth."  --John Gilmore

frei@rubmez.UUCP (Matthias Frei ) (06/20/88)

In article <7962@alice.UUCP>, andrew@alice.UUCP writes:
> 
> 	The following is a summary of the somewhat plausible ideas

You are disparaging nearly all of the good ideas posted by
many users on the net.
So why did you post your questionable request, if you only
want to make some minor changes to grep???
Please don't waste our time with things like that.

    Matthias Frei
--------------------------------------------------------------------
Snail-mail:                    |  E-Mail address:
Microelectronics Center        |                 UUCP  frei@rubmez.uucp        
University of Bochum           |                (...uunet!unido!rubmez!frei)
4630 Bochum 1, P.O.-Box 102143 |
West Germany                   |

leo@philmds.UUCP (Leo de Wit) (06/21/88)

In article <16174@brl-adm.ARPA> rbj@cmr.icst.nbs.gov (Root Boy Jim) writes:
== From: andrew@alice.uucp
== 
== 4) print one(first matching) line and go onto the next file.
== 	most of the justification for this seemed to be scanning
== 	mail and/or netnews articles for the subject line; neither
== 	of which gets any sympathy from me. but it is easy to do
== 	and doesn't add an option; we add a new option (say -1)
== 	and remove -s. -1 is just like -s except it prints the matching line.
== 	then the old grep -s pattern is now grep -1 pattern > /dev/null
== 	and within epsilon of being as efficent.
==
=I often grep for a host name in /etc/hosts. This is a big file and
=would benefit from the execution time saved. Yeah, I know, use sed,
=it's only one file. OK, how about this: grep -1 '#include .thing.' *.c?

I think sed could do the trick if we would allow a new command for it:
S: skip to the next file. It should be very easy to implement and
obviously satisfies a need (looking at the response of the net).
Somewhat for the next net pollution: sed replacement ;-)
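The "first matching line per file, then skip to the next file" behaviour discussed in this subthread (grep -1, or a hypothetical sed "S" command) can be sketched with awk's nextfile statement, assuming a gawk or POSIX.1-2008 awk that provides it (the .c file names are made up):

```shell
# Print only the first matching line of each file, then move on to
# the next file -- an approximation of the proposed "grep -1".
printf '#include <a.h>\nint x;\n' > one.c
printf 'int y;\n#include <b.h>\n' > two.c
awk '/#include/ { print FILENAME ": " $0; nextfile }' one.c two.c
```

Like the proposed flag, this avoids scanning the remainder of a large file once a hit is found.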

    Leo.


    Sed fugit interea, fugit     | But in the meantime flies, flies the
    irreparabile tempus.         | irreparable time.
                                 |               VERGILIUS, Georgica 3. 284

greywolf@unicom.UUCP (greywolf) (06/25/88)

In article <1304@ark.cs.vu.nl> maart@cs.vu.nl (Maarten Litmaath) writes:
# In article <7962@alice.UUCP> andrew@alice.UUCP writes:
# \...
# \5) divert matching lines onto one fd, nonmatching onto another.
# \	sorry, run grep twice.
# 
# Come on! The diversion is no problem at all to implement, and it can be very
# useful (you cannot run grep twice on stdin, without use of temporary files).
# Regards.

Essentially, I think that with respect to the tool/flag concept, their
attitude there is "See figure 1."  This is ESPECIALLY true when they have
the opportunity to say "NIH"! (Sounds like the knights who say "Ni!" in
Monty Python and the Holy Grail).
	For those of you who do not understand "See figure 1.", I am sure
that there are some people inside AT&T who would be happy to tell you.
They tell me every month on my phone bill.

# -- 
# South-Africa:                         |Maarten Litmaath @ Free U Amsterdam:
#          revival of the Third Reich |maart@cs.vu.nl, mcvax!botter!ark!maart
--

mouse@mcgill-vision.UUCP (der Mouse) (06/26/88)

In article <8167@ncoast.UUCP>, allbery@ncoast.UUCP (Brandon S. Allbery) writes:
> As quoted from <5826@umn-cs.cs.umn.edu> by randy@umn-cs.cs.umn.edu (Randy Orrison):
>> In article <7962@alice.UUCP> andrew@alice.UUCP writes:
[ various things, originally about context grep.  But when replying,
  ++Brandon (do you still use that name, Brandon?) says.... ]
> (My /dev/stdin complaint earlier was [...] aimed at the person who
> informed the entire Usenet that "hey, I posted a /dev/stdin driver
> source for 4.2BSD, so not a one of you has any reason not to be
> running it".  In other words, the usual 4.xBSD-source elitism.)

That person was Chris Torek, but if he'd been a bit slower it could
well have been me.  At least he didn't say it as abrasively as your
summary does.

However, if you check back and look at the original postings on this
issue, the /dev/stdin point was first brought up as follows:

> 22) support a filename of - to mean standard input.
> 	a unix without /dev/stdin is largely bogus but as a sop to the poor
> 	barstards having to work on BSD, gre will support -
> 	as stdin (at least for a while).

(This over Andrew Hume's signature.  He was explaining what features
gre would support.)

With phrasing like that, I can hardly blame Chris for rising to the
defense of his driver.  And note the "BSD" phrase: that's the context
in which Chris said there was no excuse for complaining about not
having /dev/stdin.  And in that context, I agree with him.

And there's no call for complaints about source elitism.  My /dev/stdin
driver can be added to a binary distribution; surely Chris' can too.

(Andrew, there's no call to be so insulting.  Lots (most?) of us who
use BSD don't think of ourselves as "poor barstards[sic]" who "have" to
work on BSD.  There are two SysV-based machines here I can use whenever
I feel like it; I find it extremely painful to try to do anything on
them.  But you generally don't find me talking about "poor bastards who
have to work on SysV", and you most certainly won't find me saying so
in my postings to the entire net.)

					der Mouse

			uucp: mouse@mcgill-vision.uucp
			arpa: mouse@larry.mcrcim.mcgill.edu

allbery@ncoast.UUCP (Brandon S. Allbery) (07/04/88)

As quoted from <1186@mcgill-vision.UUCP> by mouse@mcgill-vision.UUCP (der Mouse):
+---------------
| In article <8167@ncoast.UUCP>, allbery@ncoast.UUCP (Brandon S. Allbery) writes:
| > As quoted from <5826@umn-cs.cs.umn.edu> by randy@umn-cs.cs.umn.edu (Randy Orrison):
| >> In article <7962@alice.UUCP> andrew@alice.UUCP writes:
| [ various things, originally about context grep.  But when replying,
|   ++Brandon (do you still use that name, Brandon?) says.... ]
| > (My /dev/stdin complaint earlier was [...] aimed at the person who
| > informed the entire Usenet that "hey, I posted a /dev/stdin driver
| > source for 4.2BSD, so not a one of you has any reason not to be
| > running it".  In other words, the usual 4.xBSD-source elitism.)
| 
| However, if you check back and look at the original postings on this
| issue, the /dev/stdin point was first brought up as follows:
| 
| > 22) support a filename of - to mean standard input.
| > 	a unix without /dev/stdin is largely bogus but as a sop to the poor
| > 	barstards having to work on BSD, gre will support -
| > 	as stdin (at least for a while).
| 
| (This over Andrew Hume's signature.  He was explaining what features
| gre would support.)
| 
| With phrasing like that, I can hardly blame Chris for rising to the
| defense of his driver.  And note the "BSD" phrase: that's the context
| in which Chris said there was no excuse for complaining about not
| having /dev/stdin.  And in that context, I agree with him.
| 
| And there's no call for complaints about source elitism.  My /dev/stdin
| driver can be added to a binary distribution; surely Chris' can too.
+---------------

At least part of the confusion here comes from the fact that Andrew's nasty
comment (above) got here after Chris's comment; I thought he was just
exhibiting the rather degrading attitude toward binary System V sites that
I've had to put up with ever since I started reading this net.  (C'mon,
guys, if it didn't work I wouldn't be here!)

+---------------
| work on BSD.  There are two SysV-based machines here I can use whenever
| I feel like it; I find it extremely painful to try to do anything on
| them.  But you generally don't find me talking about "poor bastards who
| have to work on SysV", and you most certainly won't find me saying so
| in my postings to the entire net.)
+---------------

You would appear to be in the minority.
-- 
Brandon S. Allbery, uunet!marque!ncoast!allbery			DELPHI: ALLBERY
	    For comp.sources.misc send mail to ncoast!sources-misc