andrew@alice.UUCP (05/23/88)
Al Aho and I are designing a replacement for grep, egrep and fgrep. The question is what flags should it support and what kind of patterns should it handle? (Assume the existence of flags to make it compatible with grep, egrep and fgrep.) The proposed flags are the V9 flags:
    -f file   pattern is (`cat file`)
    -v        print nonmatching
    -i        ignore alphabetic case
    -n        print line number
    -x        the pattern used is ^pattern$
    -c        print count only
    -l        print filenames only
    -b        print block numbers
    -h        do not print filenames in front of matching lines
    -H        always print filenames in front of matching lines
    -s        no output; just status
    -e expr   use expr as the pattern
The patterns are as for egrep, supplemented by back-referencing as in \{pattern\}\1. please send your comments about flags or patterns to research!andrew
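[For readers who have not met back-referencing: with the ed-style \(...\)\1 notation (ed, sed and most greps built on the editors' regexp code accept it), a piece of the pattern is captured and required to occur again. A small illustration; the doubled-word pattern is only an example:]

    # find lines in which a word is immediately repeated, e.g. "the the"
    grep '\([a-zA-Z][a-zA-Z]*\) \1' file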
papowell@attila.uucp (Patrick Powell) (05/25/88)
In article <7882@alice.UUCP> andrew@alice.UUCP writes: > > Al Aho and I are designing a replacement for grep, egrep and fgrep. >The question is what flags should it support and what kind of patterns >should it handle? (Assume the existence of flags to make it compatible >with grep, egrep and fgrep.) > >please send your comments about flags or patterns to research!andrew

The one thing I miss in the grep family is the ability to have a named search pattern. For example:
    DIGIT=\{[0-9]\}
    ALPHA=\{[a-zA-Z]\}
    \${ALPHA}\${PATTERN}
This would sort of make sense. The other facility is to find multiple-line patterns, as in: find the pair of lines that have pattern1 in the first line, pattern2 in the second, etc. This I have needed sooo many times; I have ended up using AWK and a clumsy set of searches. For example:
    \#{1 p}Pattern
    \#{2}Pattern
This could print out lines that match, or only the first line (1p->print this one only).

Patrick Powell Prof. Patrick Powell, Dept. Computer Science, 136 Lind Hall, 207 Church St. SE, University of Minnesota, Minneapolis, MN 55455 (612)625-3543/625-4002
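[A sketch of how far today's tools go toward named sub-patterns: build the pieces as shell variables and hand the composed result to egrep. DIGIT, ALPHA and IDENT below are illustrations only, not proposed syntax:]

    #!/bin/sh
    # compose named pieces into one extended regular expression for egrep
    DIGIT='[0-9]'
    ALPHA='[a-zA-Z]'
    IDENT="${ALPHA}(${ALPHA}|${DIGIT})*"    # a letter, then letters or digits
    egrep "$IDENT" ${1+"$@"}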
ljz@fxgrp.UUCP (Lloyd Zusman) (05/26/88)
In article <5630@umn-cs.cs.umn.edu> papowell@attila.UUCP (Patrick Powell) writes: In article <7882@alice.UUCP> andrew@alice.UUCP writes: > > Al Aho and I are designing a replacement for grep, egrep and fgrep. >The question is what flags should it support and what kind of patterns >should it handle? ... ... The other facility is to find multiple line patterns, as in: find the pair of lines that have pattern1 in the first line pattern2 in the second, etc. This I have needed sooo many times; I have ended up using AWK and a clumsy set of searches. For example: \#{1 p}Pattern \#{2}Pattern This could print out lines that match, or only the first line (1p->print this one only). ...

Or another way to get this functionality would be for this new greplike thing to allow matches on the newline character. For example:
    ^.*foo\nbar.*$
          ^^
          newline
-- Lloyd Zusman UUCP: ...!ames!fxgrp!ljz Master Byte Software Internet: ljz%fx.com@ames.arc.nasa.gov Los Gatos, California or try: fxgrp!ljz@ames.arc.nasa.gov "We take things well in hand."
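[For what it's worth, that particular two-line match can already be had from sed's sliding window; a sketch, with foo and bar standing in for real patterns:]

    # slide a two-line window through the input and print every window
    # in which "foo" ends one line and "bar" begins the next
    sed -n -e 'N' -e '/foo\nbar/p' -e 'D' file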
alan@cogswell.Jpl.Nasa.Gov (Alan S. Mazer) (05/26/88)
One thing I would _love_ is to be able to find the context of what I've found, for example, to find the two (n?) surrounding lines. I have wanted to do this many times and there is no good way. -- Alan ..!cit-vax!elroy!alan * "But seriously, what elroy!alan@csvax.caltech.edu could go wrong?"
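[Until a context flag exists, a throwaway script can fake it. A rough sketch only: it buffers the whole file in memory, so it is no good for anything huge, and the two lines of context are hard-wired:]

    #!/bin/sh
    # context: print every line of file $2 (or stdin) matching pattern $1,
    # together with the 2 lines before and after it
    awk '
            { line[NR] = $0 }
    /'"$1"'/ {
            for (i = NR - 2; i <= NR + 2; i++)
                    want[i] = 1
    }
    END     {
            for (i = 1; i <= NR; i++)
                    if (want[i])
                            print line[i]
    }' $2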
ben@idsnh.UUCP (Ben Smith) (05/26/88)
I also would like to see more of the lex capabilities in grep. -- Integrated Decision Systems, Inc. | Benjamin Smith - East Coast Tech. Office The fitting solution in professional | Peterborough, NH portfolio management software. | UUCP: uunet!idsnh!ben
kutz@bgsuvax.UUCP (Kenneth Kutz) (05/26/88)
In article <6866@elroy.Jpl.Nasa.Gov>, alan@cogswell.Jpl.Nasa.Gov (Alan S. Mazer) writes: > One thing I would _love_ is to be able to find the context of what I've > found, for example, to find the two (n?) surrounding lines. I have wanted > to do this many times and there is no good way. There is a program on the Usenix tape under .../Utilities/Telephone called 'tele'. If you call the program using the name 'g', it supports displaying of context. E-mail me if you want more info. -- -------------------------------------------------------------------- Kenneth J. Kutz CSNET kutz@bgsu.edu UUCP ...!osu-cis!bgsuvax!kutz Disclaimer: Opinions expressed are my own and not of my employer's --------------------------------------------------------------------
dcon@ihlpe.ATT.COM (452is-Connet) (05/26/88)
In article <6866@elroy.Jpl.Nasa.Gov> alan@cogswell.Jpl.Nasa.Gov (Alan S. Mazer) writes: > >One thing I would _love_ is to be able to find the context of what I've >found, for example, to find the two (n?) surrounding lines. I have wanted >to do this many times and there is no good way. Also, what line number it was found on. David Connet ihnp4!ihlpe!dcon
david@elroy.Jpl.Nasa.Gov (David Robinson) (05/27/88)
In article <2978@ihlpe.ATT.COM>, dcon@ihlpe.ATT.COM (452is-Connet) writes: > In article <6866@elroy.Jpl.Nasa.Gov> alan@cogswell.Jpl.Nasa.Gov (Alan S. Mazer) writes: > >One thing I would _love_ is to be able to find the context of what I've > >found, for example, to find the two (n?) surrounding lines. I have wanted > >to do this many times and there is no good way. > Also, what line number it was found on. How about "grep -n"? -- David Robinson elroy!david@csvax.caltech.edu ARPA david@elroy.jpl.nasa.gov ARPA {cit-vax,ames}!elroy!david UUCP Disclaimer: No one listens to me anyway!
daveb@laidbak.UUCP (Dave Burton) (05/27/88)
In article <2978@ihlpe.ATT.COM> dcon@ihlpe.UUCP (David Connet) writes: |Also, what line number it was found on. Already there: grep -n. In article <6866@elroy.Jpl.Nasa.Gov> alan@cogswell.Jpl.Nasa.Gov (Alan S. Mazer) writes: |One thing I would _love_ is to be able to find the context of what I've |found, for example, to find the two (n?) surrounding lines. I have wanted |to do this many times and there is no good way. Please. Maybe "grep -k" where k is any integer giving the number of lines of context on each side of the match, default is 0. Oh, but hey, _you're_ designing it! :-) -- --------------------"Well, it looked good when I wrote it"--------------------- Verbal: Dave Burton Net: ...!ihnp4!laidbak!daveb V-MAIL: (312) 505-9100 x325 USSnail: 1901 N. Naper Blvd. #include <disclaimer.h> Naperville, IL 60540
dcon@ihlpe.ATT.COM (452is-Connet) (05/27/88)
In article <6877@elroy.Jpl.Nasa.Gov> david@elroy.Jpl.Nasa.Gov (David Robinson) writes: >In article <2978@ihlpe.ATT.COM>, dcon@ihlpe.ATT.COM (452is-Connet) writes: >> Also, what line number it was found on. >How about "grep -n"? > Embarrassed and red-faced he goes away to read the man-page...
stan@sdba.UUCP (Stan Brown) (05/27/88)
> > One thing I would _love_ is to be able to find the context of what I've > found, for example, to find the two (n?) surrounding lines. I have wanted > to do this many times and there is no good way. > > -- Alan ..!cit-vax!elroy!alan * "But seriously, what > elroy!alan@csvax.caltech.edu could go wrong?" Along this same general line it would be nice to be able to look for patterns that span lines. But perhaps this would be too complete a change in the philosophy of grep? stan -- Stan Brown S. D. Brown & Associates 404-292-9497 (uunet gatech)!sdba!stan "vi forever"
jas@rain.rtech.UUCP (Jim Shankland) (05/27/88)
In article <2978@ihlpe.ATT.COM> dcon@ihlpe.UUCP (David Connet) writes: >In article <6866@elroy.Jpl.Nasa.Gov> alan@cogswell.Jpl.Nasa.Gov (Alan S. Mazer) writes: >>One thing I would _love_ is to be able to find the context of what I've >>found, for example, to find the two (n?) surrounding lines.... > >Also, what line number it was found on. You've already got the line number with the "-n" option. Note that that makes it easy to write a little wrapper script that gives you context grep. Whether that's preferable to adding the context option to grep is, I suppose, debatable; but I can already see the USENIX paper: "newgrep -[whatever] Considered Harmful" Jim Shankland ..!ihnp4!cpsc6a!\ sun!rtech!jas ..!ucbvax!mtxinu!/
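[One shape such a wrapper might take: let grep -n find the line numbers, turn each into a sed print range, and run sed over the same file. A sketch only; it wants a real, seekable file rather than a pipe, and overlapping windows come out twice:]

    #!/bin/sh
    # cgrep pattern file: show each match with 3 lines of context
    grep -n "$1" "$2" | sed 's/:.*//' |
    awk '{ lo = $1 - 3; if (lo < 1) lo = 1; print lo "," ($1 + 3) "p" }' > /tmp/cg$$
    sed -n -f /tmp/cg$$ "$2"
    rm -f /tmp/cg$$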
aperez@cvbnet2.UUCP (Arturo Perez Ext.) (05/27/88)
From article <662@fxgrp.UUCP>, by ljz@fxgrp.UUCP (Lloyd Zusman): > In article <5630@umn-cs.cs.umn.edu> papowell@attila.UUCP (Patrick Powell) writes: > In article <7882@alice.UUCP> andrew@alice.UUCP writes: > > > > Al Aho and I are designing a replacement for grep, egrep and fgrep. > >The question is what flags should it support and what kind of patterns > >should it handle? ...

Actually, I agree with the guy who posted a request shortly before this came out. The most useful feature that is currently lacking is the ability to do context greps, i.e. greps with a window. There are two ways this could be handled. One is to allow awk-like constructs specifying beginning and ending points for a window, sort of like
    grep -w '/:/,/^$/' file
which would find the lines between each pair of a ':'-containing line and the next following blank line. The other way would be to have a simple "number of lines around match" parameter, possibly with collapse of overlapping windows. Then you could say
    grep -w 5 foo file
which would print 2 lines above and below the matching line. Either way it's done, it would be nice. I have made one attempt to implement this with a script and it wasn't too much fun...

Arturo Perez ComputerVision, a division of Prime
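[The first form is essentially what sed and awk range addresses already do; for the ':'-to-blank-line example:]

    sed -n '/:/,/^$/p' file
    awk '/:/,/^$/' file       # same thing; awk's default action is to print the line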
morrell@hpsal2.HP.COM (Michael Morrell) (05/28/88)
/ hpsal2:comp.unix.questions / dcon@ihlpe.ATT.COM (452is-Connet) / 6:36 am May 26, 1988 / Also, what line number it was found on. David Connet ihnp4!ihlpe!dcon ---------- grep -n does this, but I'd like to see an option which ONLY prints the line numbers where the pattern was found. Michael Morrell hpda!morrell
bzs@bu-cs.BU.EDU (Barry Shein) (05/28/88)
Re: grep with N context lines shown... Interesting, that's very close to a concept of a multi-line record grep where I treat N lines as one and any occurrence results in a listing. The difference is the line chosen to count from (in a context the match would probably be middle and +-N, in a record you'd just list the record.) Just wondering if a generalization is being missed here somewhere, also consider grepping something like a termcap file, maybe what I really want is a generalized method to supply pattern matchers for what to list on a hit:
    grep -P .+3,.-3 pattern          # print +-3 lines centered on match
    grep -P ?^[^ \t]?,.+1 pattern    # print from previous line not
                                     # beginning with white space to
                                     # one past current line
Of course, that destroys the stream nature of grep, it has to be able to arbitrarily back up, ugh, although "last candidate for a start" could be saved on the fly. The nice thing is that it can use (essentially) the same pattern machinery for choosing printing (I know, have to add in the notion of dot etc.) I dunno, food for thought, like I said, maybe there's a generalization here somewhere. Or maybe grep should just emit line numbers in a form which could be post-processed by sed for fancier output (grep in backquotes on sed line.) Therefore none of this is necessary :-) -Barry Shein, Boston University
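[A crude cut at the termcap case with today's awk: remember the most recent line that does not start with white space and print it as a header for each hit. Only a sketch -- it shows the entry header plus the matching line, not the whole region described above, and /am/ is just an example pattern:]

    awk '
    { c = substr($0, 1, 1) }
    c != " " && c != "\t" { header = $0 }
    /am/ {
            if (header != last) { print header; last = header }
            if ($0 != header)   print "    " $0
    }' /etc/termcap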
barnett@vdsvax.steinmetz.ge.com (Bruce G. Barnett) (05/28/88)
[mail bounced] There have been times when I wanted a grep that would print out the first occurrence and then stop. -- Bruce G. Barnett <barnett@ge-crd.ARPA> <barnett@steinmetz.UUCP> uunet!steinmetz!barnett
wyatt@cfa.harvard.EDU (Bill Wyatt) (05/28/88)
> There have been times when I wanted a grep that would print out the > first occurrence and then stop. grep '(your_pattern_here)' | head -1 -- Bill UUCP: {husc6,ihnp4,cmcl2,mit-eddie}!harvard!cfa!wyatt Wyatt ARPA: wyatt@cfa.harvard.edu (or) wyatt%cfa@harvard.harvard.edu BITNET: wyatt@cfa2 SPAN: cfairt::wyatt
ado@elsie.UUCP (Arthur David Olson) (05/28/88)
> > There have been times when I wanted a grep that would print out the > > first occurrence and then stop. > > grep '(your_pattern_here)' | head -1 Doesn't cut it for grep '(your_pattern_here)' firstfile secondfile thirdfile ... -- ado@ncifcrf.gov ADO is a trademark of Ampex.
roy@phri.UUCP (Roy Smith) (05/28/88)
wyatt@cfa.harvard.EDU (Bill Wyatt) writes: [as a way to get just the first occurrence of pattern] > grep '(your_pattern_here)' | head -1 Yes, it'll certainly work, but I think it bypasses the original intention: to save CPU time. If I had a 1000 line file with pattern on line 7, I want grep to read the first 7 lines, print out line 7, and exit. grep|head, on the other hand, will read and search all 1000 lines of the file; it won't exit (with an EPIPE) until it writes another line to stdout and finds that head has already exited. In fact, if grep block-buffers its output, it may never do more than a single write(2) and never notice that head has exited. Anyway, I agree with the "find first match" flag being a good idea. It would certainly speed up things like grep "^Subject: " /usr/spool/news/comp/sources/unix/* where I know that the pattern is going to be matched in the first few lines and don't want to bother searching the rest of the multi-kiloline file. -- Roy Smith, System Administrator Public Health Research Institute 455 First Avenue, New York, NY 10016 {allegra,philabs,cmcl2,rutgers}!phri!roy -or- phri!roy@uunet.uu.net
chip@vector.UUCP (Chip Rosenthal) (05/29/88)
In article <8077@elsie.UUCP> ado@elsie.UUCP (Arthur David Olson) writes: >> grep '(your_pattern_here)' | head -1 >Doesn't cut it for > grep '(your_pattern_here)' firstfile secondfile thirdfile ... nor if you want to see if a match was found by testing the exit status -- Chip Rosenthal /// chip@vector.UUCP /// Dallas Semiconductor /// 214-450-0400 {uunet!warble,sun!texsun!rpp386,killer}!vector!chip I won't sing for politicians. Ain't singing for Spuds. This note's for you.
guy@gorodish.Sun.COM (Guy Harris) (05/29/88)
> grep -n does this, but I'd like to see an option which ONLY prints the line > numbers where the pattern was found. I wouldn't - if you're only grepping one file, you can do it without such an option: grep -n <pattern> <file> | sed -n 's/\([0-9]*\):.*/\1/p' If you're grepping more than one file, you obviously have to decide what you want to do with the file name and the line number; once you do, just change the "sed" pattern appropriately (and note that if the list of files is variable, you either have to stick "/dev/null" in there to make sure the names are generated even if there's only one file or have the script distinguish between the one-file and >1-file cases; I seem to remember some indication that the new BTL research "grep" would have a flag to tell it always to give the file name).
russ@groucho.ucar.edu (Russ Rew) (05/30/88)
I also recently had a need for printing multi-line "records" in which a specified pattern appeared somewhere in the record. The following short csh script uses the awk capability to treat whole lines as fields and empty lines as record separators to print all the records from standard input that contain a line matching a regular expression specified as an argument:

    #!/bin/csh -f
    awk 'BEGIN {RS = ""; FS = "\n"; OFS = "\n"; ORS = "\n\n"} /'"$1"'/ {print} '

Russ Rew * UCAR (University Corp. for Atmospheric Research) PO Box 3000 * Boulder, CO 80307-3000 * 303-497-8845 russ@unidata.ucar.edu * ...!hao!unidata!russ
joey@tessi.UUCP (Joe Pruett) (05/31/88)
> >> There have been times when I wanted a grep that would print out the >> first occurrence and then stop. > >grep '(your_pattern_here)' | head -1 This works, but is quite slow if the input to grep is large. A hack I've made to egrep is a switch of the form -<number>. This causes only the first <number> matches to be printed, and then the next file is searched. This is great for: egrep -1 ^Subject * in a news directory to get a list of Subject lines.
jqj@uoregon.uoregon.edu (JQ Johnson) (05/31/88)
In article <1036@cfa.cfa.harvard.EDU> wyatt@cfa.harvard.EDU (Bill Wyatt) writes: > >> There have been times when I wanted a grep that would print out the >> first occurrence and then stop. >grep '(your_pattern_here)' | head -1 This is, of course, unacceptable if you are searching a very long file (say, a census database) and have LOTS of pipe buffering. Too bad it isn't feasible to have a shell that can optimize pipelines.
dan@maccs.UUCP (Dan Trottier) (05/31/88)
In article <8077@elsie.UUCP> ado@elsie.UUCP (Arthur David Olson) writes: >> > There have been times when I wanted a grep that would print out the >> > first occurrence and then stop. >> >> grep '(your_pattern_here)' | head -1 > >Doesn't cut it for > > grep '(your_pattern_here)' firstfile secondfile thirdfile ... This is getting ridiculous and can be taken to just about any level...
    foreach i (file1 file2 ...)
        grep 'pattern' $i | head -1
    end
-- A.I. - is a three toed sloth! | ...!uunet!mnetor!maccs!dan -- Official scrabble players dictionary -- | dan@mcmaster.BITNET
leo@philmds.UUCP (Leo de Wit) (05/31/88)
In article <292@ncar.ucar.edu> russ@groucho.UCAR.EDU (Russ Rew) writes: >I also recently had a need for printing multi-line "records" in which a >specified pattern appeared somewhere in the record. The following >short csh script uses the awk capability to treat whole lines as fields >and empty lines as record separators to print all the records from >standard input that contain a line matching a regular expression specified as an >argument: > >#!/bin/csh -f >awk 'BEGIN {RS = ""; FS = "\n"; OFS = "\n"; ORS = "\n\n"} /'"$1"'/ {print} ' > >

Awk is a nice solution, but sed is a much faster one. I've been following the 'grep' discussion for some time now, and have seen much demand for features that are simply within sed. Here are some; I have left the discussion about the function of this or that sed-command out: there is a sed article and a man page...

Patrick Powell writes: >The other facility is to find multiple line patterns, as in: >find the pair of lines that have pattern1 in the first line >pattern2 in the second, etc.

Try this one:
    sed -n -e '/PATTERN1/,/PATTERN2/p' file
It prints all lines between PATTERN1 and PATTERN2 matches. Of course you can have subcommands to do special things (with '{' I mean).

Alan (..!cit-vax!elroy!alan) writes: >One thing I would _love_ is to be able to find the context of what I've >found, for example, to find the two (n?) surrounding lines. I have wanted >to do this many times and there is no good way.

There is. Try this one:
    sed -n -e '
    /PATTERN/{
    x
    p
    x
    p
    n
    p
    }
    h' file
It prints the line before, the line containing the PATTERN, and the line after. Of course you can make the output fancier and the number of lines printed larger.

David Connet writes: >> >>One thing I would _love_ is to be able to find the context of what I've >>found, for example, to find the two (n?) surrounding lines. I have wanted >>to do this many times and there is no good way. >Also, what line number it was found on.

Sed can also handle this one:
    sed -n -e '/PATTERN/=' file

Lloyd Zusman writes: >Or another way to get this functionality would be for this new greplike >thing to allow matches on the newline character. For example: > ^.*foo\nbar.*$ > ^^ > newline

Sed can match on embedded newline characters in the substitute command (it is indeed \n here!). The trailing newline is matched by $.

Barry Shein writes [story about relative addressing]: >I dunno, food for thought, like I said, maybe there's a generalization >here somewhere. Or maybe grep should just emit line numbers in a form >which could be post-processed by sed for fancier output (grep in >backquotes on sed line.) Therefore none of this is necessary :-)

Quite right. I think most times you want to see the context it is in interactive use. In that case you can write a simple sed-script that does just what is needed, i.e. display the [/pattern/-N] through [/pattern/+N] lines, where N is a constant. The example I gave for N == 1 can be extended for larger N, with fancy output etc.

Bill Wyatt writes: >> There have been times when I wanted a grep that would print out the >> first occurrence and then stop. > >grep '(your_pattern_here)' | head -1

Much simpler, and faster:
    sed -n -e '/PATTERN/{
    p
    q
    }' file
Sed quits immediately after finding the first match. You could even create an alias for something like that.

Michael Morrell writes: >>Also, what line number it was found on. >grep -n does this, but I'd like to see an option which ONLY prints the line >numbers where the pattern was found.

The sed trick does this:
    sed -n -e '/PATTERN/=' file
Or you could even:
    sed -n -e '/PATTERN/{
    =
    q
    }' file
which prints the first matched line number and exits.

Roy Smith writes: >wyatt@cfa.harvard.EDU (Bill Wyatt) writes: >[as a way to get just the first occurrence of pattern] >> grep '(your_pattern_here)' | head -1 > Yes, it'll certainly work, but I think it bypasses the original >intention: to save CPU time. If I had a 1000 line file with pattern on >line 7, I want grep to read the first 7 lines, print out line 7, and exit. >grep|head, on the other hand, will read and search all 1000 lines of the >file; it won't exit (with an EPIPE) until it writes another line to stdout >and finds that head has already exited. In fact, if grep block-buffers its >output, it may never do more than a single write(2) and never notice that >head has exited.

Quite right. The sed-solution I mentioned before is fast and neat. In fact, who needs head:
    sed 10q
does the job, as you can find in a book by Kernighan and Pike; I thought the title was 'The Unix Programming Environment'.

Stan Brown writes: > Along this same general line it would be nice to be able to > look for patterns that span lines. But perhaps this would be > too complete a change in the philosophy of grep?

As I mentioned before, embedded newlines can be matched by sed in the substitute command. What I also see often is things like
    grep 'pattern' file | sed 'expression'
A pity a lot of people don't know that sed can do the pattern matching itself. S. E. D. (Sic Erat Demonstrandum)

As far as options for a new grep are concerned, I suggest using the options proposed (and no more). Let other tools handle other problems - that's in the Un*x spirit. What I would appreciate most in a new grep is: no more grep, egrep, fgrep, just one tool that can be both fast (for fixed strings) and elaborate (for pattern matching like egrep). The 'bm' tool that was on the net (author Peter Bain) is very fast for fixed strings, using the Boyer-Moore algorithm. Maybe this knowledge could be 'joined in'...?

Leo.
barnett@vdsvax.steinmetz.ge.com (Bruce G. Barnett) (05/31/88)
In article <1036@cfa.cfa.harvard.EDU> wyatt@cfa.harvard.EDU (Bill Wyatt) writes: | |> There have been times when I wanted a grep that would print out the |> first occurrence and then stop. | |grep '(your_pattern_here)' | head -1 Yes I have tried that. You are missing the point. Have you ever waited for a computer? There are times when I want the first occurrence of a pattern without reading the entire (i.e. HUGE) file. Or there are times when I want the first occurrence of a pattern from hundreds of files, but I don't want to see the pattern more than once. And yes I know how to write a shell script that does this. IMHO (sarcasm mode on), it is more efficient to call grep once for one hundred files, than to call (grep $* /dev/null|head -1) one hundred times. -- Bruce G. Barnett <barnett@ge-crd.ARPA> <barnett@steinmetz.UUCP> uunet!steinmetz!barnett
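[For the many-files case, one awk pass over all of them avoids starting a grep|head pair per file, on one reading of the request; it still reads every line of every file, since awk has no way to skip ahead to the next file:]

    awk '{
            if ($0 ~ /^Subject: / && FILENAME != done) {
                    print FILENAME ": " $0
                    done = FILENAME
            }
    }' *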
gwc@root.co.uk (Geoff Clare) (05/31/88)
Most of the useful things people have been saying they would like to be able to do with 'grep' can already be done very simply with 'sed'. For example: Stop after first match: sed -n '/pattern/{p;q;}' Match over two lines: sed -n 'N;/pat1\npat2/p;D' It should also be possible to get a small number of context lines by judicious use of the 'hold space' commands (g, G, h, H, x), but I haven't tried it. Anyway, this can be done with a normal line editor (if the data to be searched aren't coming from a pipe) with 'g/pattern/-,+p'. I was rather alarmed to see the proposal for 'pattern repeat' in the original article was '\{pattern\}\1' rather than '\(pattern\)\1', as the latter is already used for this purpose in the standard editors (ed, ex/vi, sed). Or was it a typo? By the way, does anyone know why the ';' command terminator in 'sed' is not documented? It works on all the systems I've tried it on, but I have never found it in any manuals. It's so much nicer than putting the commands on separate lines, or using multiple '-e' options. -- Geoff Clare UniSoft Limited, Saunderson House, Hayne Street, London EC1A 9HH gwc@root.co.uk ...!mcvax!ukc!root44!gwc +44-1-606-7799 FAX: +44-1-726-2750
andyc@omepd (T. Andrew Crump) (05/31/88)
In article <1036@cfa.cfa.harvard.EDU> wyatt@cfa.harvard.EDU (Bill Wyatt) writes: >> There have been times when I wanted a grep that would print out the >> first occurrence and then stop. > >grep '(your_pattern_here)' | head -1 Yes, but it forces grep to search a whole file, when what you may have wanted was at the beginning. This is inefficient if the "file" is large. A more general version of this request would be a parameter that would restrict grep to n or fewer occurrences, maybe 'grep -N #'. -- Andy Crump
trb@ima.ISC.COM (Andrew Tannenbaum) (05/31/88)
> I seem to remember some indication that the new BTL research "grep" > would have a flag to tell it always to give the file name). I have always wanted to be able to tell grep to NOT print the file names on a multi-file grep. Let's say I want a phone number script - usually a simple grep - but if I want to store the numbers in multiple files (e.g. mine and my department's), then the output contains unsightly filenames. This has always struck me as opposite to the UNIX philosophy of having a filter provide output that is useful as data. I would like the option to go to the next file after the first match (regardless of which other options are present). Also, I would like to print a region other than the matching line on a match. It would be nice to delimit the patterns using regexps, as "-n,+n" and "?^$?,/^$/" (among others) would be useful. Andrew Tannenbaum Interactive Boston, MA +1 617 247 1155
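[A sketch of the phone-list case with the pieces at hand today; the path is made up, /dev/null forces the filename prefix on even when only one file matches, and sed then strips the prefix off uniformly:]

    #!/bin/sh
    # phone name: look a name up across several phone lists,
    # without the "file:" prefixes in the output
    grep "$1" $HOME/lib/phones.* /dev/null | sed 's/^[^:]*://'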
avr@mtgzz.UUCP (XMRP50000[jcm]-a.v.reed) (05/31/88)
In article <2450011@hpsal2.HP.COM>, morrell@hpsal2.HP.COM (Michael Morrell) writes: > Also, what line number it was found on. > grep -n does this, but I'd like to see an option which ONLY prints the line > numbers where the pattern was found.

    # In ksh's $ENV - otherwise use a shell script:
    function lngrep
    {
        if [ "$#" = '1' ]
        then
            grep -n $@ | cut -f1 -d:
        else
            grep -n $@ | cut -f1,2 -d:
        fi
    }

Adam Reed (mtgzz!avr)
glennr@holin.ATT.COM (Glenn Robitaille) (06/01/88)
> > > There have been times when I wanted a grep that would print out the > > > first occurrence and then stop. > > > > grep '(your_pattern_here)' | head -1 > > Doesn't cut it for > > grep '(your_pattern_here)' firstfile secondfile thirdfile ... Well, if you have a shell command like

    #
    # save the search pattern
    #
    pattern=$1
    #
    # remove search pattern from $*
    #
    shift
    for i in $*
    do
        #
        # grep for search pattern
        #
        line=`grep ${pattern} ${i}|head -1`
        #
        # if found, print file name and string
        #
        test -n "$line" && echo "${i}:\t${line}"
    done

It'll work fine. If you want to use other options, have them in quotes as part of the first argument. Glenn Robitaille AT&T, HO 2J-207 ihnp4!holin!glennr Phone (201) 949-7811
aeb@cwi.nl (Andries Brouwer) (06/01/88)
In article <1036@cfa.cfa.harvard.EDU> wyatt@cfa.harvard.EDU (Bill Wyatt) writes: > >> There have been times when I wanted a grep that would print out the >> first occurrence and then stop. > >grep '(your_pattern_here)' | head -1 A fast way of searching for the first occurrence is really useful. I have a version of grep called `contains', and a shell script for formatting that says: if the input contains .[ then use refer; if it contains .IS then ideal; if .PS then pic; if .TS then tbl, etc. -- Andries Brouwer -- CWI, Amsterdam -- uunet!mcvax!aeb -- aeb@cwi.nl
jin@hplabsz.HPL.HP.COM (Tai Jin) (06/01/88)
In article <1039@ima.ISC.COM> trb@ima.UUCP (Andrew Tannenbaum) writes: >I have always wanted to be able to tell grep to NOT print the file >names on a multi-file grep. Let's say I want a phone number script - >usually a simple grep - but if I want to store the numbers in multiple >files (e.g. mine and my departments), then the output contains >unsightly filenames. This has always struck me as opposite to the UNIX >philosophy of having a filter provide output that is useful as data. Actually, I think the Unix philosophy is to have simple filters and use pipes to construct more complex filters. Unfortunately, you can't do everything with pipes. >I would like the option to go to the next file after first match >(regardless of which other options are present). Also, I would like to >print a region other than line on a match. It would be nice to delimit >the patterns using regexps, as "-n,+n" and "?^$?,/^$/" (among others) >would be useful. I also would like a context grep that greps for records with arbitrary delimiters. I started working on this, but I've had no time to finish it. ...tai
hasch@gypsy.siemens-rtl (Harald Schaefer) (06/01/88)
If you are only interested in the first occurrence of a pattern, you can use
something like
sed -n '/<pattern>/ {
p
q
}' file
Harald Schaefer
Siemens Corp. - RTL
Bus. Phone (609) 734 3389
Home Phone (609) 275 1356
uucp: ...!princeton!gypsy!hasch
hasch@gypsy.uucp
ARPA: hasch@siemems.com
hasch%siemens@princeton.EDU
aburt@isis.UUCP (Andrew Burt) (06/01/88)
I'd like to see the following enhancements in a grepper:
- \< and \> to match word start/end as in vi, with -w option as in BSD grep to match pattern as a word.
- \w in pattern to match whitespace (generalization: define \unused-letter as a pattern; or allow full lex capability).
- way to invert piece of pattern such as: grep foo.*\^bar\^xyzzy with meaning as in: grep foo | grep -v bar | grep -v xyzzy (or could be written grep foo.*\^(bar|xyzzy) of course).
- Select Nth occurrence of match (generalization: list of matches to show: grep -N -2,5-7,10- ... to grab up to the 2nd, 5th through 7th, and from the 10th onward).
- option to show lines between matches (not just matching lines) as in: grep -from foo -to bar ... meaning akin to sed/ed's /foo/,/bar/p. (But much more useful with other extensions).
- Allow matching newlines in a "binary" (or non-text) sort of mode: grep -B 'foo.*bar' finds foo...bar even if they are not on the same line. (But printing the "line" that matches wouldn't be useful anymore, so just printing the matched text would be better. Someone wanting lines could look for \n[^\n]*foo.*bar[^\n]*\n, though a syntax to make this easier might be in order. Perhaps this wouldn't be an example of a binary case -- but a new character with meaning like '.' but matching ANY character would work: if @ is such a character then "grep foo@*bar". Perhaps a better example, assuming the \^ for inversion syntax above, would be "grep foo@*(\^bar)bar" -- otherwise it would match from first foo to last bar, while I might want from first foo to first bar.)
- provide byte offset of start of match (like block number or line number) useful for searching non-text files.
- Provide a lib func that has the RE code in it.
- Install RE code in other programs: awk/sed/ed/vi etc.
Oh for a standardized RE algorithm! -- Andrew Burt isis!aburt Fight Denver's pollution: Don't Breathe and Drive.
jjg@linus.UUCP (Jeff Glass) (06/01/88)
In article <470@q7.tessi.UUCP> joey@tessi.UUCP (Joe Pruett) writes: > >grep '(your_pattern_here)' | head -1 > > This works, but is quite slow if the input to grep is large. A hack > I've made to egrep is a switch of the form -<number>. This causes only > the first <number> matches to be printed, and then the next file is > searched. This is great for: > > egrep -1 ^Subject * > > in a news directory to get a list of Subject lines. Try: sed -n -e '/pattern/{' -e p -e q -e '}' filename This prints the first occurrence of the pattern and then stops searching the file. The generalizations for printing the first <n> matches and searching <m> files (where n,m > 1) are more awkward (no pun intended) but are possible. /jeff
brianm@sco.COM (Brian Moffet) (06/01/88)
In article <4537@vdsvax.steinmetz.ge.com> barnett@vdsvax.steinmetz.ge.com (Bruce G. Barnett) writes: >In article <1036@cfa.cfa.harvard.EDU> wyatt@cfa.harvard.EDU (Bill Wyatt) writes: >|grep '(your_pattern_here)' | head -1 > >Or there are times when I want the first occurrence of a pattern from >hundreds of files, but I don't want to see the pattern more than once. > Have you tried sed? How about $ sed -n '/pattern/p;/pattern/q' file ??? -- Brian Moffet brianm@sco.com {uunet,decvax!microsof}!sco!brianm The opinions expressed are not quite clear and have no relation to my employer. 'Evil Geniuses for a Better Tommorrow!'
anw@nott-cs.UUCP (06/01/88)
In article <6866@elroy.Jpl.Nasa.Gov> alan@cogswell.Jpl.Nasa.Gov (Alan S. Mazer) writes: > One thing I would _love_ is to be able to find the context of what I've > found, for example, to find the two (n?) surrounding lines. I have wanted > to do this many times and there is no good way. See below. Does n == 4, but easily changed. In article <590@root44.co.uk> gwc@root.co.uk (Geoff Clare) writes: > > Most of the useful things people have been saying they would like to be > able to do with 'grep' can already be done very simply with 'sed'. Which is not to say that they shouldn't also be in "*grep"! > [ good examples omitted ] > > It should also be possible to get a small number of context lines by > judicious use of the 'hold space' commands (g, G, h, H, x), but I haven't > tried it. [ ... ] The following is "/usr/bin/kwic" on this machine (PDP 11/44 running V7). I wrote it about three years ago in response to a challenge from some AWK zealots; it runs *much* faster than the equivalent AWK script. That is, it is sloooww rather than ssllloooooowwww. I have a manual entry for it which is too trivial to send. Bourne shell, of course. Use at whim and discretion. Several minor bugs, mainly (I hope!) limitations of or between "sh" and "sed". (Note that the various occurrences of multiple spaces in "s..." commands are all TABs, in case mailers/editors/typists have mangled things.) > By the way, does anyone know why the ';' command terminator in 'sed' is > not documented? It works on all the systems I've tried it on, but I > have never found it in any manuals. It's so much nicer than putting > the commands on separate lines, or using multiple '-e' options. No, I don't know why, but it isn't the only example in Unix of a facility most easily discovered by looking in the source. I've occasionally used it, but I tried re-writing the following that way, and it *didn't* look so much nicer; in fact it looked 'orrible.

--------------------------------- [cut here] -----------------------------
[ $# -eq 0 ] && { echo "Usage: $0 pattern [file] ..." 1>&2; exit 1; }
l='[^\n]*\n'
pat="$1"
shift
exec sed -n "/$pat"'/ b found
s/^/ /
H
g
/^'"$l$l$l$l$l"'/ s/\n[^\n]*//
h
b
: found
s/^/++ /
H
g
s/.//p
s/.*//
h
: loop
$ b out
n
/'"$pat"'/ b found
s/^/ /
H
g
/^'"$l$l$l$l"'/ !b loop
: out
s/.//p
s/.*/-----------------/
h
' ${1+"$@"}
-- Andy Walker, Maths Dept., Nott'm Univ., UK. anw@maths.nott.ac.uk
andrew@alice.UUCP (06/01/88)
in my naivete, i had not been following netnews closely after i posted the original ``grep replacement'' article. I assumed that people would reply to me, not the net. That is the reason i have not been participating in the discussion. i will be posting my resolution of the suggestions shortly. many people have written about patterns matching multiple lines. grep will not do this. if you really need this, use sam by rob pike as described in the nov 1987 software practice and experience. the code is available for a plausible fee from the at&t toolchest.
sef@csun.UUCP (Sean Fagan) (06/02/88)
Something I'd like to see is this: grep '^<somepattern>$^<morepatterns>$...'. While this would, of course, not be trivial, I think it would probably be more general (and therefore more in the "spirit" of Unix(tm)) than showing n lines around a matched pattern. But that's just my opinion. -- Sean Fagan (818) 885-2790 uucp: {ihnp4,hplabs,psivax}!csun!sef CSUN Computer Center BITNET: 1GTLSEF@CALSTATE Northridge, CA 91330 DOMAIN: sef@CSUN.EDU "I just build fast machines." -- S. Cray
jfh@rpp386.UUCP (John F. Haugh II) (06/02/88)
In article <2117@uoregon.uoregon.edu> jqj@drizzle.UUCP (JQ Johnson) writes: >In article <1036@cfa.cfa.harvard.EDU> wyatt@cfa.harvard.EDU (Bill Wyatt) writes: >>> There have been times when I wanted a grep that would print out the >>> first occurrence and then stop. >>grep '(your_pattern_here)' | head -1 >This is, of course, unacceptable if you are searching a very long file >(say, a census database) and have LOTS of pipe buffering. > >Too bad it isn't feasible to have a shell that can optimize pipelines. there is a boyer/moore based fast grep in the archives. adding an additional option (say '-f' for first in each file?) should be quite simple. perhaps i'll post the diff's if i remember to go hack on the sucker any time soon. - joh. -- John F. Haugh II | "If you aren't part of the solution, River Parishes Programming | you are part of the precipitate." UUCP: ihnp4!killer!rpp386!jfh | -- long since forgot who DOMAIN: jfh@rpp386.uucp |
john@frog.UUCP (John Woods) (06/02/88)
In article <590@root44.co.uk>, gwc@root.co.uk (Geoff Clare) writes: > Most of the useful things people have been saying they would like to be > able to do with 'grep' can already be done very simply with 'sed'. > For example: > Stop after first match: sed -n '/pattern/{p;q;}' Close, but no cigar. It does not work for multiple input files. (And, of course, spawning off a new sed for each file defeats the basic desire of most of the people who've asked for it: speed) However, awk '/^Subject: / { print FILENAME ":" $0; next }' * does (just about) work. And it's probably not _obscenely_ slow. (it doesn't behave for no input files, and you might prefer no FILENAME: for just a single input file) -- John Woods, Charles River Data Systems, Framingham MA, (617) 626-1101 ...!decvax!frog!john, john@frog.UUCP, ...!mit-eddie!jfw, jfw@eddie.mit.edu No amount of "Scotch-Guard" can repel the ugly stains left by REALITY... - Griffy
les@chinet.UUCP (Leslie Mikesell) (06/02/88)
In article <2018@hplabsz.HPL.HP.COM> jin@hplabsz.UUCP (Tai Jin) writes: >>I have always wanted to be able to tell grep to NOT print the file >>names on a multi-file grep. > >Actually, I think the Unix philosophy is to have simple filters and use >pipes to construct more complex filters. Unfortunately, you can't do >everything with pipes. In this case it can be done with pipes: cat file.. |grep pattern Les Mikesell
mdorion@cmtl01.UUCP (Mario Dorion) (06/03/88)
In article <2978@ihlpe.ATT.COM>, dcon@ihlpe.ATT.COM (452is-Connet) writes: > In article <6866@elroy.Jpl.Nasa.Gov> alan@cogswell.Jpl.Nasa.Gov (Alan S. Mazer) writes: > > > >One thing I would _love_ is to be able to find the context of what I've > >found, for example, to find the two (n?) surrounding lines. I have wanted > >to do this many times and there is no good way. > > Also, what line number it was found on. > > David Connet > ihnp4!ihlpe!dcon Ever tried grep -n ?????

There are three features I would like to see in a grep-like program:
1- Be able to use a newline character in the regular expression:
    grep 'this\nthat' file
2- Be able to grep more than one regular expression with one call. This would be faster than issuing many calls since the file would be read only once.
3- To have an option to search only for the first occurrence of the pattern. Sometimes you KNOW that the pattern is there only once (for example if you grep '^Subject:' on news files) and there's just no need to scan the rest of the file. When 'grepping' into many files it would return the first occurrence for each file.
-- Mario Dorion | ...!{rutgers,uunet,ihnp4}! Frisco Bay Industries | philabs!micomvax!cmtl01!mdorion Montreal, Canada | 1 (514) 738-7300 | I thought this planet was in public domain!
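[Wish 2 is largely available already: hand egrep one alternation, or hand it a file of patterns with -f, and the input is read only once. For example:]

    egrep 'pattern1|pattern2|pattern3' file
    egrep -f patfile file      # patfile holds one pattern per line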
andrew@alice.UUCP (06/03/88)
In article <449@happym.UUCP>, kent@happym.UUCP writes: > From alice.UUCP?? Ha ha! That's Bell Labs! It will be in V10 > Unix, and none of us humans will see it until sysVr6, and only then > if we are lucky!! Context: the right thing to do is to write a context program that takes input looking like "filename:linenumber:goo" and prints whatever context you like. we can then take this crap out of grep and diff and make it generally available for use with programs like the C compiler and eqn and so on. It can also do the right thing with folding together nearby lines. At least one good first cut has been put on the net but a C program sounds easy enough to do. Source: the software i write is publicly available because it matters to me. it was a hassle but mk and fio are available to everybody for reasonable cost (< $125 commercial, nearly free educational). i am trying hard to do the same for the new grep. it will be in V10, it will be in plan9, and should be in SVR4 (the joint sun-at&t release).
lyndon@ncc.Nexus.CA (Lyndon Nerenberg) (06/04/88)
In article <1039@ima.ISC.COM> trb@ima.UUCP (Andrew Tannenbaum) writes: >I have always wanted to be able to tell grep to NOT print the file >names on a multi-file grep. That's easy :-) Just pipe it through cut(1). Works great unless you have a ':' as part of the file name... -- {alberta,utzoo,uunet}!ncc!lyndon lyndon@Nexus.CA
mdorion@cmtl01.UUCP (Mario Dorion) (06/04/88)
In article <2450011@hpsal2.HP.COM>, morrell@hpsal2.HP.COM (Michael Morrell) writes: > > (...) I'd like to see an option which ONLY prints the line > numbers where the pattern was found. > > Michael Morrell > hpda!morrell You could use the following: grep -n 'foo' bar | cut -d: -f1 -- Mario Dorion | ...!{rutgers,uunet,ihnp4}! Frisco Bay Industries | philabs!micomvax!cmtl01!mdorion Montreal, Canada | 1 (514) 738-7300 | I thought this planet was in public domain!
allbery@ncoast.UUCP (Brandon S. Allbery) (06/05/88)
As quoted from <2312@bgsuvax.UUCP> by kutz@bgsuvax.UUCP (Kenneth Kutz): +--------------- | In article <6866@elroy.Jpl.Nasa.Gov>, alan@cogswell.Jpl.Nasa.Gov (Alan S. Mazer) writes: | > One thing I would _love_ is to be able to find the context of what I've | > found, for example, to find the two (n?) surrounding lines. I have wanted | > to do this many times and there is no good way. +--------------- grep -n foo ./bar | context 2 I posted context to net.sources back when it existed; someone may still have archives from that time, if not I'll retrieve my sources and repost it. It takes lines of the basic form filename ... linenumber : ... and displays context around the specified lines. I use this with grep quite often; it also works with cc (pcc, not Xenix cc) error messages. -- Brandon S. Allbery | "Given its constituency, the only uunet!marque,sun!mandrill}!ncoast!allbery | thing I expect to be "open" about Delphi: ALLBERY MCI Mail: BALLBERY | [the Open Software Foundation] is comp.sources.misc: ncoast!sources-misc | its mouth." --John Gilmore
gwyn@brl-smoke.UUCP (06/05/88)
In article <7944@alice.UUCP> andrew@alice.UUCP writes: > the right thing to do is to write a context program that takes >input looking like "filename:linenumber:goo" and prints whatever context ... Heavens -- a tool user. I thought that only Neanderthals were still alive. I guess Bell Labs escaped the plague.
hutch@net1.ucsd.edu (Jim Hutchison) (06/05/88)
4537@vdsvax.steinmetz.ge.com, barnett@vdsvax.steinmetz.ge.com (Bruce G. Barnett) >In <1036@cfa.cfa.harvard.EDU> wyatt@cfa.harvard.EDU (Bill Wyatt) writes: >| >|> There have been times when I wanted a grep that would print out the >|> first occurrence and then stop. >| >|grep '(your_pattern_here)' | head -1 > [...] > >Have you ever waited for a computer? No, never. :-) >There are times when I want the first occurrence of a pattern without >reading the entire (i.e. HUGE) file. I realize this is dependent on the way in which processes sharing a pipe act, but this is a point worth considering before we get yet another annoying burst of "cat -v" type programs. grep pattern file1 ... fileN | head -1 This should send grep a SIGPIPE as soon as the first line of output trickles through the pipe. This would result in relatively little of the file actually being read under most Unix implementations. I would agree that it is a bad thing to rely on the granularity of a pipe. Here is a sample program which can be used to show you what I mean. Name it grep, and use it thus wise: % ./grep pattern * | head -1

/* ------------- Cut here --------------- */
#include <stdio.h>
#include <signal.h>

sighandler(sig)
int sig;
{
        if (sig == SIGPIPE)
                fprintf(stderr,"Died from a SIGPIPE\n");
        else
                fprintf(stderr,"Died from signal #%d\n", sig);
        exit(0);
}

main()
{
        signal(SIGPIPE,sighandler);
        for (;;)
                printf("pattern\n");
}

/* Jim Hutchison UUCP: {dcdwest,ucbvax}!cs!net1!hutch ARPA: Hutch@net1.ucsd.edu Disclaimer: The cat agreed that it would be o.k. to say these things. */
hutch@net1.ucsd.edu (Jim Hutchison) (06/05/88)
I can think of a few nasty ways to do this one; I am hoping to get a better answer. A grep with a window of context around it: a few lines preceding and following the pattern I am looking for. The VMS search command sported this as an option/qualifier. I miss it sometimes (not VMS, just a few of the more wacky utilities, like the editor option for creation of multi-key data base files :-). /* Jim Hutchison UUCP: {dcdwest,ucbvax}!cs!net1!hutch ARPA: Hutch@net1.ucsd.edu Disclaimer: The cat agreed that it would be o.k. to say these things. */
tbray@watsol.waterloo.edu (Tim Bray) (06/05/88)
Grep should, where reasonable, not be bound by the notion of a 'line'. As a concrete expression of this, the useful grep -l (prints the names of the files that contain the string) should work on any kind of file. More than one existing 'grep -l' will fail, for example, to tell you which of a bunch of .o files contain a given string. Scenario - you're trying to link 55 .o's together to build a program you don't know that well. You're on berklix. ld sez: "undefined: _memcpy". You say: "who's doing that?". The source is scattered inconveniently. The obvious thing to do is: grep -l _memcpy *.o That this often will not work is irritating. Tim Bray, New Oxford English Dictionary Project, U of Waterloo
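[There is at least a workaround for the .o case with today's tools: ask nm rather than grep, since nm lists the symbols (including undefined ones) each object mentions. A sketch only; nm output formats differ between systems:]

    for f in *.o
    do
            if nm $f | grep memcpy > /dev/null
            then
                    echo $f
            fi
    done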
bzs@bu-cs.BU.EDU (Barry Shein) (06/05/88)
From: gwyn@brl-smoke.ARPA (Doug Gwyn ) >In article <7944@alice.UUCP> andrew@alice.UUCP writes: >> the right thing to do is to write a context program that takes >>input looking like "filename:linenumber:goo" and prints whatever context ... > >Heavens -- a tool user. I thought that only Neanderthals were still alive. >I guess Bell Labs escaped the plague. Almost, unless the original input was produced by a pipeline, in which case this (putative) post-processor can't help unless you tee the mess to a temp file, yup, mess is the right word. Or maybe only us Neanderthals are interested in tools which work on pipes? Have they gone out of style? -Barry "Ulak of Org" Shein, Boston University
gwyn@brl-smoke.ARPA (Doug Gwyn ) (06/05/88)
In article <23133@bu-cs.BU.EDU> bzs@bu-cs.BU.EDU (Barry Shein) writes: >Almost, unless the original input was produced by a pipeline, in which >case this (putative) post-processor can't help unless you tee the mess >to a temp file, yup, mess is the right word. The proposed tool would be very handy on ordinary text files, but it is hard to see a use for it on pipes. Or, getting back to context-grep, what good would it do to show context from a pipe? To do anything with the information (other than stare at it), you'd need to produce it again. There might be some use for context-{grep,diff,...} on a stream, but if a separate context tool will satisfy 99% of the need, as I think it would, as well as provide this capability for other commands "for free", it would be a better approach than hacking context into other commands. By the way, I hope the new grep when asked to always produce the filename will use "-" for stdin's name, and the context tool would also follow the same convention. Even though the Research systems have /dev/stdin, other sites may not, and anyway (as we've just seen) stdin isn't really a definite object.
nelson@sun.soe.clarkson.edu (Russ Nelson) (06/05/88)
In article <23133@bu-cs.BU.EDU> bzs@bu-cs.BU.EDU (Barry Shein) writes: >In article <7944@alice.UUCP> andrew@alice.UUCP writes: >> the right thing to do is to write a context program that takes >>input looking like "filename:linenumber:goo" and prints whatever context ... > >Almost, unless the original input was produced by a pipeline, in which >case this (putative) post-processor can't help unless you tee the mess >to a temp file, yup, mess is the right word. How about: alias with_context tee >/tmp/$$ | $* | context -f/tmp/$$ or something like that? Does that offend tool-users sensibilities? *Do* Neanderthals have any sensibilities? -- signed char *reply-to-russ(int network) { /* Why can't BITNET go */ if(network == BITNET) return "NELSON@CLUTX"; /* domainish? */ else return "nelson@clutx.clarkson.edu"; }
bzs@bu-cs.BU.EDU (Barry Shein) (06/05/88)
From: gwyn@brl-smoke.ARPA (Doug Gwyn ) >In article <23133@bu-cs.BU.EDU> bzs@bu-cs.BU.EDU (Barry Shein) writes: >>Almost, unless the original input was produced by a pipeline, in which >>case this (putative) post-processor can't help unless you tee the mess >>to a temp file, yup, mess is the right word. > >The proposed tool would be very handy on ordinary text files, >but it is hard to see a use for it on pipes. Or, getting back >to context-grep, what good would it do to show context from a >pipe? To do anything with the information (other than stare >at it), you'd need to produce it again. What else are context displays for except to stare at (or save in a file for later staring)? Are the resultant contexts often the input to other programs? (I know that 'patch' can take a context input but that's irrelevant, it neither needs nor prefers a context diff to my knowledge, it's just being accommodating so humans can look at the context diff if something botches.) Actually, I can answer that in the context of the original suggestion. The motivation for a context comes in two major flavors: A) To stare at (the surrounding context gives a human some hint of the context in which the text appeared) B) Because the context really represents a multi-line (eg) record, such as pulling out every termcap or terminfo entry which contains some property but desiring the result to contain the entire multiline entry so it could be re-used to create a new file. In either case it's independent of whether the data is coming from a pipe (as it should be.) Its pipeness may be caused by something as simple as the data being grabbed across the network (rsh HOST cat foo | ...). Anyhow, I think it's bad in general to demand the reasoning of why a selection operator should work in a pipe, it just should (although I have presented a reasonable argument.) That's what tools are all about. >There might be some >use for context-{grep,diff,...} on a stream, but if a separate >context tool will satisfy 99% of the need, as I think it would, >as well as provide this capability for other commands "for free", >it would be a better approach than hacking context into other >commands. I think claiming that 99% of the use won't need pipes is unsound, it should just work with a pipe and any tool which requires passing the file name and then re-positioning the file just won't, it's violating a fundamental design concept by doing this (not that in rare cases this might not be necessary, but I don't see where this is one of them unless you use the circular argument of it "must be a separate program".) The reasoning for adding it to grep would be: a) Grep already has its finger on the context, it's right there (or could be), why re-process the entire stream/file just to get it printed? Grep found the context, why find it again? b) The context suggestions are merely logical generalizations of what grep already does, print the context of a match (it just happens to now limit that to exactly one line.) Nothing new conceptually is being added, only generalized. In fact, if I were to write this context-display tool my first thought would be to just use grep and try to emit unique patterns (a la TAGS files) which grep can then re-scan. But grep doesn't quite cut it w/o this little generalization. I think we're going in circles and this post-processor is nothing more than a special case of grep or perhaps cat or sed the way it was proposed (why not just generate sed commands to list the lines if that's all you want?)
Anyhow, at least we're back to the technical issues and away from calling anyone who disagrees Neanderthals... -Barry Shein, Boston University
bzs@bu-cs.BU.EDU (Barry Shein) (06/05/88)
From: nelson@sun.soe.clarkson.edu (Russ Nelson) [responding to me] >>Almost, unless the original input was produced by a pipeline, in which >>case this (putative) post-processor can't help unless you tee the mess >>to a temp file, yup, mess is the right word. > >How about: > >alias with_context tee >/tmp/$$ | $* | context -f/tmp/$$ > >or something like that? Does that offend tool-users sensibilities? >*Do* Neanderthals have any sensibilities? I don't understand, the way to avoid having to tee it into temp files is to tee it into temp files? Given that sort of solution we can eliminate pipes entirely from unix, was that your point? That pipes are fundamentally useless and can always be eliminated via use of intermediate temp files? It begs the question, burying it in a little syntactic sugar with an alias command doesn't solve the problem. -Barry Shein, Boston University
gwyn@brl-smoke.ARPA (Doug Gwyn ) (06/06/88)
In article <23142@bu-cs.BU.EDU> bzs@bu-cs.BU.EDU (Barry Shein) writes: >Anyhow, at least we're back to the technical issues and away from >calling anyone who disagrees Neanderthals... Oh, but the latter is much more fun! Anyway, the fundamental issue seems to be that there are (at least) two types of external data objects: streams -- transient data, takes special effort to capture files -- permanent data with an attached name UNIX nicely makes these appear much the same, but they do have some inherent differences, and this one-pass versus multi-pass context discussion has brought out one of them. There is nothing particularly wrong with the "tee" approach to turn a stream into a file long enough for whatever work is being done. The converse is often done; for example many of my shell scripts, after parsing arguments, exec a pipeline that starts cat $* | ... in order to ensure a stream input to the rest of the pipeline.
garyo@masscomp.UUCP (Gary Oberbrunner) (06/06/88)
The only change I've ever had to make to the source for grep to make it do what I want was to make it work with arbitrary-length lines. I consider not handling long lines (and not complaining about them either) to be extremely antisocial. All this other stuff is just window-dressing. Not that it's bad; one integrated grep with B-M strings, alternation and inversion operators, and nifty feeping creaturism is great by me. I usually handle the multi-line-record case by tr'ing all the intermediate line ends into some unused character, doing my database hackery (grep, awk, sed, what have you) and then tr'ing back at the end. This is one reason for having grep support very long lines. As always, Gary ---------------------------------------------------------------------------- Remember, Truth is not beauty; (617)692-6200x2445 Information is not knowledge; Beauty is not love; Gary Oberbrunner Knowledge is not wisdom; Love is not music; ...!masscomp!garyo Wisdom is not truth; Music is the best. - FZ ....garyo@masscomp
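[For records that continue across lines with a trailing backslash (termcap style), the same join-then-search idea can be done in one awk pass without the unused-character round trip; a sketch, with the :am: pattern purely illustrative:]

    awk '{
            if (substr($0, length($0), 1) == "\\")
                    rec = rec substr($0, 1, length($0) - 1)
            else {
                    rec = rec $0
                    if (rec ~ /:am:/)
                            print rec
                    rec = ""
            }
    }' /etc/termcap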
bzs@bu-cs.BU.EDU (Barry Shein) (06/06/88)
From: gwyn@brl-smoke.ARPA (Doug Gwyn ) >There is nothing particularly wrong with the "tee" approach to >turn a stream into a file long enough for whatever work is being >done. The converse is often done; for example many of my shell >scripts, after parsing arguments, exec a pipeline that starts > cat $* | ... >in order to ensure a stream input to the rest of the pipeline. Nothing wrong with it unless you happen to be on a parallel machine as I am a lot of the time and pipes can run in parallel nicely. Nyah Nyah, got ya there! PHFZZZZT! I win! I win! You're right, this is getting ridiculous, we made our points... Ok everyone, back to arguing which flags should be maintained in cat and Unix Standardization AKA "West Coast Story" (snap fingers.) -Barry Shein, Boston University
nelson@sun.soe.clarkson.edu (Russ Nelson) (06/06/88)
In article <23143@bu-cs.BU.EDU> bzs@bu-cs.BU.EDU (Barry Shein) writes: >From: nelson@sun.soe.clarkson.edu (Russ Nelson) [responding to me] >>alias with_context tee >/tmp/$$ | $* | context -f/tmp/$$ >I don't understand, the way to avoid having to tee it into temp >files is to tee it into temp files? No. There is no way to avoid teeing it into a temp file. Such is life with pipes. If you want context then you need to save it. My alias is perfectly consistent with the tool-using philosophy. Yes, it's a kludge, but that's the only way to save context in a single-stream pipe philosophy. I remember reading a paper in which multiple streams going hither and yon were proposed, but the syntax was gothic at best. I like being able to say this: bsd: sort | with_context grep rfoo | more sysv: sort | with_context grep foo | more Because sysv doesn't have the r* utilities, of course :-) -- signed char *reply-to-russ(int network) { /* Why can't BITNET go */ if(network == BITNET) return "NELSON@CLUTX"; /* domainish? */ else return "nelson@clutx.clarkson.edu"; }
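For what it's worth, a Bourne-shell rendering of the same kludge as a function (as typed above, the alias would probably need quoting and \!* to behave in csh). This is only a sketch: 'context' is the hypothetical post-processor under discussion, and it inherits the obvious caveats: it needs /tmp space, and 'context' may go back to the temp file before tee has finished writing it.

    # keep a copy of the stream so 'context' can pull lines back out of it
    with_context () {
        tmp=/tmp/wc$$
        tee $tmp | "$@" | context -f$tmp
        rm -f $tmp
    }
    # usage: sort | with_context grep foo | more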
rick@seismo.CSS.GOV (Rick Adams) (06/07/88)
7th Edition grep had a -h flag to not print the filenames on a grep. 4BSD still has a -h flag. System 5 doesn't have a -h flag. (Another example of how System 5 is superior to BSD... and V7...) ---rick
tower@bu-cs.BU.EDU (Leonard H. Tower Jr.) (06/07/88)
In article <6866@elroy.Jpl.Nasa.Gov> alan@cogswell.Jpl.Nasa.Gov (Alan S. Mazer) writes: | |One thing I would _love_ is to be able to find the context of what I've |found, for example, to find the two (n?) surrounding lines. I have wanted |to do this many times and there is no good way. GNU Emacs has a command that will walk you through each match of a grep run and show you the context around it: grep: Run grep, with user-specified args, and collect output in a buffer. While grep runs asynchronously, you can use the C-x ` command to find the text that grep hits refer to. M-x grep RET to invoke it. I suspect other Unix Emacs have a similar feature. Information on how to obtain GNU Emacs, other GNU software, or the GNU project itself is available from: gnu@prep.ai.mit.edu enjoy -len
gwyn@brl-smoke.ARPA (Doug Gwyn ) (06/07/88)
In article <44366@beno.seismo.CSS.GOV> rick@seismo.CSS.GOV (Rick Adams) writes: >7th Edition grep had a -h flag to not print the filenames on a grep. >4BSD still has a -h flag. >System 5 doesn't have a -h flag. >(Another example of how System 5 is superior to BSD... and V7...) Maybe the AT&T folks figured that their customers were smart enough to type "cat files ... | grep". I've never had the need for a -h flag, but I sure would like for the -H (ALWAYS print filename) option to be the default instead of the current variable algorithm.
brianc@cognos.uucp (Brian Campbell) (06/07/88)
In article <4524@vdsvax.steinmetz.ge.com> Bruce G. Barnett writes: > There have been times when I wanted a grep that would print out the > first occurrence and then stop. In article <1036@cfa.cfa.harvard.EDU> Bill Wyatt suggests: > grep '(your_pattern_here)' | head -1 In article <4537@vdsvax.steinmetz.ge.com> Bruce G. Barnett replies: > There are times when I want the first occurrence of a pattern without > reading the entire (i.e. HUGE) file. If we're talking about finding subject lines in news articles: head -20 file1 file2 ... | grep ^Subject: > Or there are times when I want the first occurrence of a pattern from > hundreds of files, but I don't want to see the pattern more than once. In this case, the original suggestion seems appropriate: grep pattern file1 file2 ... | head -1 -- Brian Campbell uucp: decvax!utzoo!dciem!nrcaer!cognos!brianc Cognos Incorporated mail: POB 9707, 3755 Riverside Drive, Ottawa, K1G 3Z4 (613) 738-1440 fido: (613) 731-2945 300/1200/2400, sysop@1:163/8
guy@gorodish.Sun.COM (Guy Harris) (06/08/88)
> >7th Edition grep had a -h flag to not print the filenames on a grep. > >4BSD still has a -h flag. > >System 5 doesn't have a -h flag. > >(Another example of how System 5 is superior to BSD... and V7...) > > Maybe the AT&T folks figured that their customers were smart enough > to type "cat files ... | grep". *Which* "AT&T folks"? The folks at AT&T Bell Labs Research were the ones who put the "-h" flag into "grep" in the first place, *not* the ones at Berkeley. > I've never had the need for a -h flag, but I sure would like for the -H > (ALWAYS print filename) option to be the default instead of the current > variable algorithm. Maybe the AT&T folks figured that their customers were smart enough to type "grep ... /dev/null"?
oz@yunexus.UUCP (Ozan Yigit) (06/08/88)
In article <7939@alice.UUCP> andrew@alice.UUCP writes: > >many people have written about patterns matching multiple lines. >grep will not do this. if you really need this, use sam by rob pike >as described in the nov 1987 software practice and experience. > Why should this not be done by grep ??? I think Rob Pike's "Structural Regular Expressions" is the way to go for a modern grep, where newline spanning is supported, and the program does not die unexpectedly just because a file contains a line too long for a stupid internal "line size". (For an insightful discussion of this, interested readers could check out Rob's paper in the EUUG proceedings.) oz -- The deathstar rotated slowly, | Usenet: ...!utzoo!yunexus!oz towards its target, and sparked | ....!uunet!mnetor!yunexus!oz an intense sunbeam. The green world | Bitnet: oz@[yulibra|yuyetti] of unics evaporated instantly... | Phonet: +1 416 736-5257x3976
guy@gorodish.Sun.COM (Guy Harris) (06/09/88)
> No, the obvious thing to do is: > > nm -o _memcpy *.o "Obvious" under which version of UNIX? From the 4.3BSD manual: -o Prepend file or archive element name to each output line rather than only once. The SunOS manual page says the same thing. From the S5R3 manual: -o Print the value and size of a symbol in octal instead of decimal. With the 4.3BSD version you can do nm -o *.o | egrep _memcpy and get the result you want. For any version of "nm" that I know of, you can do the "egrep" trick mentioned in another posting; you may have to use a flag such as "-p" with the S5 version to get "easily parsable, terse output."
john@frog.UUCP (John Woods) (06/09/88)
Hypothesize for the moment that I would like to have the Subject: lines for each article in /usr/spool/news/comp/sources/unix. Many people have proposed a new flag for the "new grep" (one that functions just like the -one flag does on "match", the matching program I use (a flag I implemented long ago)). In article<5007@sdcsvax.UCSD.EDU>,hutch@net1.ucsd.edu(Jim Hutchison) suggests: > grep pattern file1 ... fileN | head -1 > This should send grep a SIGPIPE as soon as the first line of output > trickles through the pipe. This would result in relatively little > of the file actually being read under most Unix implementations. Yes, it would result in relatively little of the file being read. It would also result in relatively little of the desired output. Check the problem space before posting solutions, folks. As I pointed out in another message, you can get awk to solve the problem almost exactly, with some irregularity in the NFILES={0,1} cases. However, the "tool-using" approach is a two-edged sword, it seems to me: a matching problem should be solvable by using the matching tool, not by a special case of an editor tool (the purported "sed" solution) or by having to reach for a full-blown programming language (awk); just as one should not paginate a text file by using the /PAGINATE /NOPRINT features of a line-printer program... Sometimes you need to EN-feature a program in order to avoid having to turn to (other) inappropriate tools. "Oh, you can't ADD text with this editor, only change existing text. You add text by using 'cat >> filename' ..." I like the "context" tool suggested elsewhere, but it has one problem (as stated) for replacing context diffs: context diffs are both context and _differences_, and are generally clearly marked as such (i.e., the !+- convention); while I guess you could turn an ed-script style diff listing into a context diff (given both input files and the diff marks), that is a radically different input language than that proposed for eliminating context grep. This just means, however, that two context tools are needed, not just one. To paraphrase Einstein, "Programs should be as simple as possible, and no simpler." -- John Woods, Charles River Data Systems, Framingham MA, (617) 626-1101 ...!decvax!frog!john, john@frog.UUCP, ...!mit-eddie!jfw, jfw@eddie.mit.edu No amount of "Scotch-Guard" can repel the ugly stains left by REALITY... - Griffy
john@frog.UUCP (John Woods) (06/09/88)
In article <1998@u1100a.UUCP>, krohn@u1100a.UUCP (Eric Krohn) writes: > In article <1112@X.UUCP> john@frog.UUCP (some clown :-) writes: > ] awk '/^Subject: / { print FILENAME ":" $0; next }' * > > This will print Subject: lines more than once per file if a file happens to > have more than one Subject: line. `Next' goes to the next input line, not > the next input file, so you are still left with an exhaustive search of all > the files. > Oops. I blew it. Working on GNU awk seems to have permanently damaged my brain (there are a couple of differences between "real" awk and GNU awk which I couldn't convince the author were worth changing, specifically in 'exit' (not next); GNU exit actually does what I thought next would do, instead of exiting entirely). -- John Woods, Charles River Data Systems, Framingham MA, (617) 626-1101 ...!decvax!frog!john, john@frog.UUCP, ...!mit-eddie!jfw, jfw@eddie.mit.edu No amount of "Scotch-Guard" can repel the ugly stains left by REALITY... - Griffy
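For the record, here is an awk rendering that does print just one Subject: line per file (the pattern and output format are taken from the example above). As Eric points out, it still reads every file to the end; avoiding that is precisely what a flag in grep itself would buy.

    awk 'FILENAME != last { last = FILENAME; done = 0 }
         /^Subject: / && done == 0 { print FILENAME ":" $0; done = 1 }' *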
maujd@warwick.UUCP (Geoff Rimmer) (06/10/88)
In article <2450011@hpsal2.HP.COM> morrell@hpsal2.HP.COM (Michael Morrell) writes: >grep -n does this, but I'd like to see an option which ONLY prints the line >numbers where the pattern was found. How about grep -n pattern file | sed "s/:.*//" ? ------------------------------------------------------------ Geoff Rimmer, Computer Science, Warwick University, UK. maujd@uk.ac.warwick.opal ------------------------------------------------------------
vanam@pttesac.UUCP (Marnix van Ammers) (06/10/88)
In article <4524@vdsvax.steinmetz.ge.com> barnett@steinmetz.ge.com (Bruce G. Barnett) writes: >There have been times when I wanted a grep that would print out the >first occurrence and then stop. sed -n -e "/<pattern>/ { p" -e q -e "}"
mouse@mcgill-vision.UUCP (der Mouse) (06/10/88)
In article <8012@brl-smoke.ARPA>, gwyn@brl-smoke.ARPA (Doug Gwyn ) writes: > In article <7944@alice.UUCP> andrew@alice.UUCP writes: >> the right thing to do is to write a context program that takes input >> looking like "filename:linenumber:goo" and prints whatever context ... > Heavens -- a tool user. I thought that only Neanderthals were still > alive. I guess Bell Labs escaped the plague. A real useful `tool', this, that works only on files. And only when you grep more than one file, so you get filenames (or happen to be able to remember which flag it is to make grep print filenames always, assuming of course that your grep has it). Besides, grep has the context, or could have if it wanted to bother saving it. Why read all two hundred thousand lines of the file *again*? Wasn't it bad enough the first time? der Mouse uucp: mouse@mcgill-vision.uucp arpa: mouse@larry.mcrcim.mcgill.edu
mouse@mcgill-vision.UUCP (der Mouse) (06/10/88)
In article <1030@sun.soe.clarkson.edu>, nelson@sun.soe.clarkson.edu (Russ Nelson) writes: > In article <23133@bu-cs.BU.EDU> bzs@bu-cs.BU.EDU (Barry Shein) writes: >> In article <7944@alice.UUCP> andrew@alice.UUCP writes: >>> the right thing to do is to write a context program that takes >>> input looking like "filename:linenumber:goo" and prints whatever >>> context ... >> Almost, unless the original input was produced by a pipeline, [...] >> unless you tee the mess to a temp file, yup, mess is the right word. > How about: > alias with_context tee >/tmp/$$ | $* | context -f/tmp/$$ This assumes that (a) there's room on /tmp to save the whole thing and (b) that you don't mind rereading it all to find the appropriate line. Both assumptions are commonly violated, in my experience. der Mouse uucp: mouse@mcgill-vision.uucp arpa: mouse@larry.mcrcim.mcgill.edu
mouse@mcgill-vision.UUCP (der Mouse) (06/10/88)
In article <8022@brl-smoke.ARPA>, gwyn@brl-smoke.ARPA (Doug Gwyn ) writes: > Or, getting back to context-grep, what good would it do to show > context from a pipe? To do anything with the information (other than > stare at it), you'd need to produce it again. Why do we have diff -c? Generally, to stare at. (The only other use I know of is producing diffs for Larry Wall's patch program.) der Mouse uucp: mouse@mcgill-vision.uucp arpa: mouse@larry.mcrcim.mcgill.edu
mouse@mcgill-vision.UUCP (der Mouse) (06/10/88)
In article <5007@sdcsvax.UCSD.EDU>, hutch@net1.ucsd.edu (Jim Hutchison) writes: > 4537@vdsvax.steinmetz.ge.com, barnett@vdsvax.steinmetz.ge.com (Bruce G. Barnett) >> In <1036@cfa.cfa.harvard.EDU> wyatt@cfa.harvard.EDU (Bill Wyatt) writes: [attribution(s) lost] >>>> There have been times when I wanted a grep that would print out >>>> the first occurrence and then stop. >>> grep '(your_pattern_here)' | head -1 >> Have you ever waited for a computer? There are times when I want >> the first occurrence of a pattern without reading the [whole file]. > grep pattern file1 ... fileN | head -1 > This should send grep a SIGPIPE as soon as the first line of output > trickles through the pipe. No. It should not send the SIGPIPE until grep writes the second line. And because grep is likely to use stdio for its output, nothing at all may be written to the pipe until grep has 1K or 2K or whatever size its stdio uses for the output buffer. This may be an enormous waste of time, both cpu and real. Besides which, it's wrong. It prints just the first match, whereas what's wanted is the first match *from each file*. der Mouse uucp: mouse@mcgill-vision.uucp arpa: mouse@larry.mcrcim.mcgill.edu
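If what is really wanted is the first match from each file without reading past it, a shell loop around the sed idiom posted earlier in this thread will do that today. A rough sketch (the "file: line" output format is arbitrary, and the pattern is assumed to contain no slashes):

    #!/bin/sh
    # usage: firstmatch pattern file ...
    pat=$1; shift
    for f
    do
        hit=`sed -n -e "/$pat/ { p" -e q -e "}" "$f"`
        test -n "$hit" && echo "$f: $hit"
    done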
mouse@mcgill-vision.UUCP (der Mouse) (06/10/88)
In article <7207@watdragon.waterloo.edu>, tbray@watsol.waterloo.edu (Tim Bray) writes: > Scenario - you're trying to link 55 .o's together to build a program > you don't know that well. You're on berklix. ld sez: "undefined: > _memcpy". You say: "who's doing that?". The source is scattered > inconveniently. The obvious thing to do is: grep -l _memcpy *.o Doesn't anybody read the man pages any more? The obvious thing is to use the supplied facility: the -y option to ld. % cc -o program *.o -y_memcpy Undefined: _memcpy buildstruct.o: reference to external undefined _memcpy copytree.o: reference to external undefined _memcpy % (I don't know how generally available this is. You did say "berklix", and I know this is in 4.3, but I don't know about other Berklices.) der Mouse uucp: mouse@mcgill-vision.uucp arpa: mouse@larry.mcrcim.mcgill.edu
mouse@mcgill-vision.UUCP (der Mouse) (06/10/88)
In article <1037@sun.soe.clarkson.edu>, nelson@sun.soe.clarkson.edu (Russ Nelson) writes: > In article <23143@bu-cs.BU.EDU> bzs@bu-cs.BU.EDU (Barry Shein) writes: >> From: nelson@sun.soe.clarkson.edu (Russ Nelson) [responding to me] >>> alias with_context tee >/tmp/$$ | $* | context -f/tmp/$$ >> I don't understand, the way to avoid having to tee it into temp >> files is to tee it into temp files? > No. There is no way to avoid teeing it into a temp file. Sure there is. > If you want context then you need to save it. True. But you don't necessarily need to save it in a file. > [the alias above is] the only way to save context in a single-stream > pipe philosophy. Grep can save it in memory. Unless you want so much context that it overflows the available memory, which I find difficult to see happening, this is a perfectly good place to put it. In fact, I wrote a grep variant which starts by snarfing the whole file into (virtual) memory. Makes for extreme speed when it's usable, which is often enough to make it worthwhile (for me, at least). And of course it means that I could get as much context as I cared to. (I've never had it fail because it couldn't get enough memory to hold the whole file.) der Mouse uucp: mouse@mcgill-vision.uucp arpa: mouse@larry.mcrcim.mcgill.edu
andrew@alice.UUCP (06/11/88)
The following is a summary of the somewhat plausible ideas
suggested for the new grep. I thank leo de witt particularly and others for clearing up misconceptions and pointing out (correctly) that existing tools like sed already do (or at least nearly do) what some people asked for. The following points are in no particular order and no slight is intended by my presentation. After that, I summarise the current flags.
1) named character classes, e.g. \alpha, \digit.
    i think this is a hokey idea and dismissed it as unnecessary crud
    but then found out it is part of the proposed regular expression
    stuff for posix. it may creep in but i hope not.
2) matching multi-line patterns (\n as part of pattern)
    this actually requires a lot of infrastructure support and thought.
    i prefer to leave that to other more powerful programs such as sam.
3) print lines with context.
    the second most requested feature but i'm not doing it. this is
    just the job for sed. to be consistent, we just took the context
    crap out of diff too. this is actually reasonable; showing context
    is the job for a separate tool (pipeline difficulties apart).
4) print one(first matching) line and go onto the next file.
    most of the justification for this seemed to be scanning
    mail and/or netnews articles for the subject line; neither
    of which gets any sympathy from me. but it is easy to do
    and doesn't add an option; we add a new option (say -1)
    and remove -s. -1 is just like -s except it prints the matching line.
    then the old grep -s pattern is now grep -1 pattern > /dev/null
    and within epsilon of being as efficent.
5) divert matching lines onto one fd, nonmatching onto another.
    sorry, run grep twice.
6) print the Nth occurrence of the pattern (N is number or list).
    it may be possible to think of a real reason for this (i couldn't)
    but the answer is no.
7) -w (pattern matches only words)
    the most requested feature. well, it turns out that -x (exact) is
    there because doug mcilroy wanted to match words against a dictionary.
    it seems to have no other use. Therefore, -x is being dropped (after
    all, it only costs a quick edit to do it yourself) and is replaced by
    -w == (^|[^_a-zA-Z0-9])pattern($|[^_a-zA-Z0-9]).
    (see the sketch after the flag list below.)
8) grep should work on binary files and kanji.
    that it should work on kanji or any character set is a given (at
    least, any character set supported by the system V international
    character set stuff). binary files will work too modulo the following
    constraint: lines (between \n's) have to fit in a buffer (current
    size 64K). violations are an error (exit 2).
9) -b has bogus units.
    agreed. -b now is in bytes.
10) -B (add an ^ to the front of the given pattern, analogous to -x and -w)
    -x (and -w) is enough. sorry.
11) recursively descend through argument lists
    no. find | xargs is going to have to do.
12) read filenames on standard input
    no. xargs will have to do.
13) should be as fast as bm.
    no worries. in fact, our egrep is 3x faster than bm. i intend to be
    competitive with woods' egrep. it should also be as fast as fgrep for
    multiple keywords. the new grep incorporates boyer-moore as a
    degenerate case of Commentz-Walter, a faster replacement for the
    fgrep algorithm.
14) -lv (files that don't have any matching lines)
    -lv means print names of files that have any nonmatching lines
    (useful, say, for checking input syntax). -L will mean print names
    of files without selected lines.
15) print the part of the line that matched.
    no. that is available at the subroutine level.
16) compatibility with old grep/fgrep/egrep.
    the current name for the new command is gre (aho chose it). after a
    while, it will become our grep. there will be a -G flag to take
    patterns a la old grep and a -F to take patterns a la fgrep (that is,
    no metacharacters except \n == |). gre is close enough to egrep to
    not matter.
17) fewer limits.
    so far, gre will have only one limit, a line length of 64K.
    (NO, i am not supporting arbitrary length lines (yet)!) we foresee no
    need for any other limit. for example, the current gre acts like
    fgrep. it is 4 times faster than fgrep and has no limits; we can
    gre -f /usr/dict/words (72K words, 600KB).
18) recognise file types (ignore binaries, unpack packed files etc).
    get real. go back to your macintosh or pyramid. gre will just grep
    files, not understand them.
19) handle patterns occurring multiple times per line
    this is ill-defined (how many times does aaaa occur in a line of 20
    'a's? in order of decreasing correctness, the answers are >=1, 17, 5).
    For the cases people mentioned (words), pipe it thru tr to put the
    words one per line. (see the sketch after the flag list below.)
20) why use \{\} instead of \(\)?
    this is not yet resolved (mcilroy&ritchie vs aho&pike&me). grouping
    is an orthogonal issue to subexpressions so why use the same
    parentheses? the latest suggestion (by ritchie) is to allow both \(\)
    and \{\} as grouping operators but the \3 would only count one type
    (say \(\)). this would be much better for complicated patterns with
    much grouping.
21) subroutine versions of the pattern matching stuff.
    in a deep sense, the new grep will have no pattern matching code in
    it. all the pattern matching code will be in libc with a uniform
    interface. the boyer-moore and commentz-walter routines have been
    done. the other two are egrep and back-referencing egrep. lastly,
    regexp will be reimplemented.
22) support a filename of - to mean standard input.
    a unix without /dev/stdin is largely bogus but as a sop to the poor
    barstards having to work on BSD, gre will support -
    as stdin (at least for a while).

Thus, the current proposal is the following flags. it would take a GOOD argument to change my mind on this list (unless it is to get rid of a flag).

-f file   pattern is (`cat file`)
-v        nonmatching lines are 'selected'
-i        ignore alphabetic case
-n        print line number
-c        print count of selected lines only
-l        print filenames which have a selected line
-L        print filenames which do not have a selected line
-b        print byte offset of line begin
-h        do not print filenames in front of matching lines
-H        always print filenames in front of matching lines
-w        pattern is (^|[^_a-zA-Z0-9])pattern($|[^_a-zA-Z0-9])
-1        print only first selected line per file
-e expr   use expr as the pattern

Andrew Hume
research!andrew
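Until gre exists, items 7 and 19 above can be approximated with today's tools. These are sketches only: 'pattern', 'word' and the file names are placeholders, the -w wrapper assumes the pattern is a single plain word (nothing groups it), and the tr line uses BSD-style tr syntax.

    # item 7: match 'pattern' only as a whole word, using the expansion above
    egrep '(^|[^_a-zA-Z0-9])pattern($|[^_a-zA-Z0-9])' file1 file2

    # item 19: put words one per line, then count occurrences of one word
    tr -cs '_A-Za-z0-9' '\012' < file | grep -c '^word$'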
wswietse@eutrc3.UUCP (Wietse Venema) (06/11/88)
In article <7207@watdragon.waterloo.edu> tbray@watsol.waterloo.edu (Tim Bray) writes: }Grep should, where reasonable, not be bound by the notion of a 'line'. }As a concrete expression of this, the useful grep -l (prints the names of }the files that contain the string) should work on any kind of file. More }than one existing 'grep -l' will fail, for example, to tell you which of a }bunch of .o files contain a given string. Scenario - you're trying to }link 55 .o's together to build a program you don't know that well. You're }on berklix. ld sez: "undefined: _memcpy". You say: "who's doing that?". }The source is scattered inconveniently. The obvious thing to do is: }grep -l _memcpy *.o }That this often will not work is irritating. }Tim Bray, New Oxford English Dictionary Project, U of Waterloo nm -op *.o | grep memcpy will work just fine, both with bsd and att unix. Wietse -- uucp: mcvax!eutrc3!wswietse | Eindhoven University of Technology bitnet: wswietse@heithe5 | Dept. of Mathematics and Computer Science surf: tuerc5::wswietse | Eindhoven, The Netherlands.
randy@umn-cs.cs.umn.edu (Randy Orrison) (06/12/88)
In article <7962@alice.UUCP> andrew@alice.UUCP writes: |3) print lines with context. | the second most requested feature but i'm not doing it. this is | just the job for sed. to be consistent, we just took the context ^^^^^^^^^^^^^^^^ | crap out of diff too. this is actually reasonable; showing context ^^^^^^^^^^^^^^^^ | is the job for a separate tool (pipeline difficulties apart). What?!?!? Ok, i would like context in grep, but i'll live without it. Context diffs, however are a different matter. There isn't an easy way to generate them with diff/context (the first character of every line is produced as part of the diff). Context diffs are useful for patches, and having a tool to generate them is necessary. They're a logical improvement to diff that is more than just context around the changes. If you're fixing grep fine, but don't break diff while you're at it. -randy -- Randy Orrison, Control Data, Arden Hills, MN randy@ux.acss.umn.edu 8-(OSF/Mumblix: Just say NO!)-8 {ihnp4, seismo!rutgers, sun}!umn-cs!randy "I consulted all the sages I could find in Yellow Pages, but there aren't many of them." -APP
allbery@ncoast.UUCP (Brandon S. Allbery) (06/13/88)
As quoted from <7944@alice.UUCP> by andrew@alice.UUCP: +--------------- | the right thing to do is to write a context program that takes | input looking like "filename:linenumber:goo" and prints whatever context you like. | we can then take this crap out of grep and diff and make it generally available | for use with programs like the C compiler and eqn and so on. It can also do | the right thing with folding together nearby lines. At least one good first | cut has been put on the net but a C program sounds easy enough to do. +--------------- A C version has been done; it handles pcc, grep -n, and cpp messages. I posted it 2 1/2 years ago. It does *not* handle diff, since diff's messages are slightly different and lack filename information; also, since it passes lines it doesn't understand you'd end up with both regular and context diffs in the same output. Now if diff had an option to output in the format <filename>:<lineno>[-<lineno>]:<action> we'd be all set -- I could modify it to handle ranges easily. (Changes would be output as "file1:n-m:file was\nfile2:n-m:now is", or something similar.) Note that it'd be nice if lint output messages this way as well. I have a postprocessor for lint which does this -- even with System V's lint that can have lint1 and lint2 run separately via .ln files. -- Brandon S. Allbery | "Given its constituency, the only uunet!marque,sun!mandrill}!ncoast!allbery | thing I expect to be "open" about Delphi: ALLBERY MCI Mail: BALLBERY | [the Open Software Foundation] is comp.sources.misc: ncoast!sources-misc | its mouth." --John Gilmore
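Not Brandon's C program, but the flavor of such a 'context' filter can be sketched in a few lines of shell. It assumes plain "file:line:text" input (as from grep -n with filenames), a fixed two lines of context, and that the named files are still around to be reread; that last assumption is the pipeline problem discussed above.

    #!/bin/sh
    # read "file:line:text" on stdin and show each hit with 2 lines of context
    while read hit
    do
        file=`echo "$hit" | sed 's/:.*//'`
        line=`echo "$hit" | sed 's/^[^:]*:\([0-9]*\).*/\1/'`
        start=`expr $line - 2`
        test "$start" -lt 1 && start=1
        end=`expr $line + 2`
        echo "==== $file, line $line ===="
        sed -n "${start},${end}p" "$file"
    done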
keith@seismo.CSS.GOV (Keith Bostic) (06/14/88)
In article <7962@alice.UUCP>, andrew@alice.UUCP writes:
> 22) support a filename of - to mean standard input.
> a unix without /dev/stdin is largely bogus but as a sop to the poor
> barstards having to work on BSD, gre will support -
> as stdin (at least for a while).
>
> Andrew Hume
> research!andrew

A few comments:

-- As far as I'm aware, V9 is the only system that has "/dev/stdin" at the
   moment. For those who haven't heard of it, V9 is a research version of
   UN*X developed and in use at the Computing Science Research Center, a
   part of AT&T Bell Laboratories, and available to a small number of
   universities. It was preceded by V8, which, interestingly enough, was
   built on top of 4.1BSD.
-- System V does not support "/dev/stdin".
-- The next full release of BSD will contain "/dev/stdin" and friends.
   It is not part of the 4.3-tahoe release because it requires changes to
   stdio. I do not expect, however, commands that currently support the
   "-" syntax to change, for compatibility reasons. V9 itself continues
   to support such commands.

To sum up, let's try and keep this, if not actually constructive, at least bearing some distant relationship to the facts.

Keith Bostic
allbery@ncoast.UUCP (Brandon S. Allbery) (06/14/88)
As quoted from <5007@sdcsvax.UCSD.EDU> by hutch@net1.ucsd.edu (Jim Hutchison): +--------------- | 4537@vdsvax.steinmetz.ge.com, barnett@vdsvax.steinmetz.ge.com (Bruce G. Barnett) | >In <1036@cfa.cfa.harvard.EDU> wyatt@cfa.harvard.EDU (Bill Wyatt) writes: | >|> There have been times when I wanted a grep that would print out the | >|> first occurrence and then stop. | >| | >|grep '(your_pattern_here)' | head -1 | > | >There are times when I want the first occurrence of a pattern without | >reading the entire (i.e. HUGE) file. | | I realize this is dependent on the way in which processes sharing a | pipe act, but this is a point worth considering before we get yet | another annoying burst of "cat -v" type programs. | | grep pattern file1 ... fileN | head -1 | | This should send grep a SIGPIPE as soon as the first line of output | trickles through the pipe. This would result in relatively little | of the file actually being read under most Unix implementations. +--------------- Not true. The SIGPIPE is sent when "grep" writes the second line, *not* when "head" exits! If there *is* only one line containing the pattern, grep will happily read all of the (possibly large) files without getting SIGPIPE. This is not pleasant, even if it's only one large file -- say a comp.sources.unix posting which you're grepping for a Subject: line. -- Brandon S. Allbery | "Given its constituency, the only uunet!marque,sun!mandrill}!ncoast!allbery | thing I expect to be "open" about Delphi: ALLBERY MCI Mail: BALLBERY | [the Open Software Foundation] is comp.sources.misc: ncoast!sources-misc | its mouth." --John Gilmore
andrew@frip.gwd.tek.com (Andrew Klossner) (06/14/88)
[] "so far, gre will have only one limit, a line length of 64K. (NO, i am not supporting arbitrary length lines (yet)!)" Why not a flag to let the user specify the max line length? Just the thing for that database hacker, and diminishes the demand for arbitrary length. "there will be a -G flag to take patterns a la old grep and a -F to take patterns a la fgrep" I hope that -F is a permanent, not temporary, flag. I don't see it in the summary list of supported flags, shudder. "a unix without /dev/stdin is largely bogus but as a sop to the poor barstards having to work on BSD, gre will support - as stdin (at least for a while)." It's not just BSD; I haven't seen /dev/stdin in any released edition. I just looked over the sVr3.1 tape and didn't turn up anything. -=- Andrew Klossner (decvax!tektronix!tekecs!andrew) [UUCP] (andrew%tekecs.tek.com@relay.cs.net) [ARPA]
chris@mimsy.UUCP (Chris Torek) (06/14/88)
In article <44370@beno.seismo.CSS.GOV> keith@seismo.CSS.GOV [at seismo?!?] (Keith Bostic) writes:
> -- The next full release of BSD will contain "/dev/stdin" and friends.
>    It is not part of the 4.3-tahoe release because it requires changes
>    to stdio.

Well, only because freopen("/dev/stdin", "r", stdin) unexpectedly fails: it closes fd 0 before attempting to open /dev/stdin, which means that stdin is gone before it can grab it again. When I `fixed' this here it broke /usr/ucb/head and I had to fix the fix! The sequence needed is messy:

    old = fileno(fp);
    new = open(...);
    if (new < 0) {
        close(old);     /* maybe it was EMFILE */
        new = open(...);/* (could test errno too) */
        if (new < 0)
            return error;
    }
    if (new != old) {
        if (dup2(new, old) >= 0)    /* move it back */
            close(new);
        else {
            close(old);
            fileno(fp) = new;
        }
    }

Not using dup2 means that freopen(stderr) might make fileno(stderr) something other than 2, which breaks at least perror().
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris@mimsy.umd.edu Path: uunet!mimsy!chris
tbray@watsol.waterloo.edu (Tim Bray) (06/14/88)
>In article <7207@watdragon.waterloo.edu> I wrote: >}Grep should, where reasonable, not be bound by the notion of a 'line'. ... >}The source is scattered inconveniently. The obvious thing to do is: >}grep -l _memcpy *.o >}That this often will not work is irritating. At least a dozen people have sent me alternate ways of doing this, the most obvious using 'nm'. Look, I KNOW ABOUT NM! But you're missing the point - suppose the item in the .o files was another type of string, e.g. an error message. The point is: There are some files. One or more may contain a string in which I am interested. grep -l is a tool which is supposed to tell me whether one or more files contain a string. The fact that it refuses to do so for a class of magic files is a gratuitous violation of the unix paradigm. Tim Bray, New Oxford English Dictionary Project, U of Waterloo
oz@yunexus.UUCP (Ozan Yigit) (06/15/88)
In article <7962@alice.UUCP> andrew@alice.UUCP writes: > >21) subroutine versions of the pattern matching stuff. > .... > .... the other two are egrep and back-referencing egrep. > lastly, regexp will be reimplemented. > >Andrew Hume Just how do you propose to implement the back-referencing trick in a properly constructed (nfa and/or nfa->dfa conversion static or on-the-fly) egrep ?? I presume that after each match of the \(reference\) portion, you would have to on-the-fly modify the \n portion of the fsa. Gack! Do you have a theoretically solid algorithm [say, within the context of Aho/Sethi/Ullman's Dragon Book chapter on regular expressions] for this ?? I would be much interested. oz -- The DeathStar rotated slowly, | Usenet: ...!utzoo!yunexus!oz towards its target, and sparked | ....!uunet!mnetor!yunexus!oz an intense SUNbeam. The green world | Bitnet: oz@[yulibra|yuyetti] of unics evaporated instantly... | Phonet: +1 416 736-5257x3976
tj@mks.UUCP (T. J. Thompson) (06/15/88)
In article <8032@brl-smoke.ARPA>, gwyn@brl-smoke.ARPA (Doug Gwyn ) writes: > ... but I sure would like for the -H (ALWAYS print filename) > option to be the default instead of the current variable algorithm. This option is exactly what you need when exec'ing grep from find. It is implemented as grep pattern file /dev/null -- ll // // ,'/~~\' T. J. Thompson uunet!watmath!mks!tj /ll/// //l' `\\\ Mortice Kern Systems Inc. (519) 884-2251 / l //_// ll\___/ 35 King St. N., Waterloo, Ont., Can. N2J 2W9 O_/ long time(); /* know C */
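Spelled out with find, the idiom looks like the line below; the pattern and the name glob are placeholders. The extra /dev/null argument guarantees grep sees more than one file name and therefore prefixes each matching line with the file it came from.

    find . -name '*.c' -exec grep pattern /dev/null {} \;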
barnett@vdsvax.steinmetz.ge.com (Bruce G. Barnett) (06/15/88)
In article <7962@alice.UUCP> andrew@alice.UUCP writes: | | The following is a summary of the somewhat plausible ideas |suggested for the new grep. |4) print one(first matching) line and go onto the next file. | most of the justification for this seemed to be scanning | mail and/or netnews articles for the subject line; neither | of which gets any sympathy from me. but it is easy to do | and doesn't add an option; we add a new option (say -1) | and remove -s. -1 is just like -s except it prints the matching line. | then the old grep -s pattern is now grep -1 pattern > /dev/null | and within epsilon of being as efficent. ----------- Actually this is extremely wrong. Given the command grep -1 Subject /usr/spool/news/comp/sources/unix/* >/dev/null and grep -s Subject /usr/spool/news/comp/sources/unix/* >/dev/null I would expect the first one to read *every* file. The second case ( -s ) should terminate as soon as it finds the first match in the first file. Unless I misunderstand the functionality of the -s command. -- Bruce G. Barnett <barnett@ge-crd.ARPA> <barnett@steinmetz.UUCP> uunet!steinmetz!barnett
guy@gorodish.Sun.COM (Guy Harris) (06/16/88)
> grep -l is a tool which is supposed to tell me whether one or more files > contain a string. No, it isn't. "grep -l" is a tool that is supposed to tell you whether one or more *text* files contain a string; if your file doesn't happen to contain newlines at least every N characters or so, too bad. If you want to improve this situation by writing a "grep" that doesn't have this restriction, feel free. > The fact that it refuses to do so for a class of magic files is a > gratuitous violation of the unix paradigm. "ed is a tool that is supposed to let me modify files. The fact that it refuses to do so for a class of magic files is a gratuitous violation of the unix paradigm." Sorry, but the fact that you can't normally use "ed" to patch binaries doesn't bother me one bit.
ljz@fxgrp.UUCP (Lloyd Zusman) (06/16/88)
In article <7962@alice.UUCP> andrew@alice.UUCP writes:
The following is a summary of the somewhat plausible ideas
suggested for the new grep. ...
...
2) matching multi-line patterns (\n as part of pattern)
this actually requires a lot of infrastructure support and thought.
i prefer to leave that to other more powerful programs such as sam.
^^^
...
Since I'm one of the people who suggested the ability to match multi-line
patterns, I'm a bit disappointed about this ... but such is life. So
where can I find 'sam'? Is it in the public domain? Is source code
available?
You can try to reply via email ... it might actually work, but don't
be surprised if your mail bounces, in which case I'd appreciate
replies here.
Thanks in advance.
--
Lloyd Zusman UUCP: ...!ames!fxgrp!ljz
Master Byte Software Internet: ljz%fx.com@ames.arc.nasa.gov
Los Gatos, California or try: fxgrp!ljz@ames.arc.nasa.gov
"We take things well in hand."
gwyn@brl-smoke.ARPA (Doug Gwyn ) (06/16/88)
In article <698@fxgrp.UUCP> ljz%fx.com@ames.arc.nasa.gov (Lloyd Zusman) writes: >where can I find 'sam'? Is it in the public domain? Is source code >available? So far as I know, if you aren't part of AT&T and don't have 9th Edition UNIX, the only way to legally obtain "sam" is to acquire it from the AT&T UNIX System ToolChest, where it is included in the "dmd-pgmg" package. This is definitely not public domain, but it's inexpensively priced and it does include source code. "sam" works either with dumb terminals or with a smart one like an AT&T Teletype 5620 or 630. I haven't tried installing it without DMD support but obviously it can be done. I use "sam" (DMD version) whenever I have serious editing to do.
fmr@cwi.nl (Frank Rahmani) (06/16/88)
> Xref: mcvax comp.unix.wizards:8598 comp.unix.questions:6792 > Posted: Fri Jun 10 05:29:43 1988 > > In article <8012@brl-smoke.ARPA>, gwyn@brl-smoke.ARPA (Doug Gwyn ) writes: > A real useful `tool', this, that works only on files. And only when > you grep more than one file, so you get filenames (or happen to be able > to remember which flag it is to make grep print filenames always, > assuming of course that your grep has it). ... ... that's the smallest of all problems, just include /dev/null as first file to be searched into your script like grep [options] pattern /dev/null one_or_more_filenames by the way I like the sed one-liner that was posted as answer to the grep replacement question. Why couldn't I think of it?:-) fmr@cwi.nl -- It is better never to have been born. But who among us has such luck? -------------------------------------------------------------------------- These opinions are solely mine and in no way reflect those of my employer.
daveb@geac.UUCP (David Collier-Brown) (06/17/88)
In article <10078@tekecs.TEK.COM> andrew@frip.gwd.tek.com (Andrew Klossner) quotes someone to say:
>[]
>
> "so far, gre will have only one limit, a line length of 64K.
> (NO, i am not supporting arbitrary length lines (yet)!)"

Well, arbitrary line lengths are easy.

    Initially allocate a cache.
    When reading:
        fgets a cache-full
        if the last character is not a \n
            increase the cache with realloc
            read some more

A function to do this, called getline, was published recently in the source groups.

--dave (remember my old .signature?) c-b
--
David Collier-Brown.            {mnetor yunexus utgpu}!geac!daveb
Geac Computers Ltd.,   | "His Majesty made you a major
350 Steelcase Road,    | because he believed you would
Markham, Ontario.      | know when not to obey his orders"
wolfe@pdnbah.uucp (Mike Wolfe) (06/17/88)
In article <540@sering.cwi.nl> fmr@cwi.nl (Frank Rahmani) writes: >> Xref: mcvax comp.unix.wizards:8598 comp.unix.questions:6792 >> Posted: Fri Jun 10 05:29:43 1988 >> >> In article <8012@brl-smoke.ARPA>, gwyn@brl-smoke.ARPA (Doug Gwyn ) writes: >> A real useful `tool', this, that works only on files. And only when >> you grep more than one file, so you get filenames (or happen to be able >> to remember which flag it is to make grep print filenames always, >> assuming of course that your grep has it). >... >... >that's the smallest of all problems, just include /dev/null as first >file to be searched >into your script like >grep [options] pattern /dev/null one_or_more_filenames Smallest of all problems? One of my pet peeves is the fact that certain commands will only print filenames if you give them more than one file. While the /dev/null ugliness is a suitable kludge for the grep case, what about a case where you want to run something using xargs, something like sum? You don't want /dev/null repeated for each call. I know I can sed it out but that's just a kludge for a kludge and to me that's a red flag. I think that all commands of that type should allow you to force the filenames in output. I don't want to go back and change all the commands (UNIX++ a modest proposal ;-). I just wish people would keep this in mind when writing things in the future. ---- Mike Wolfe Paradyne Corporation, Mail stop LF-207 DOMAIN wolfe@pdn.UUCP PO Box 2826, 8550 Ulmerton Road UUCP ...!uunet!pdn!wolfe Largo, FL 34649-2826 PHONE (813) 530-8361
gwyn@brl-smoke.ARPA (Doug Gwyn ) (06/18/88)
In article <540@sering.cwi.nl> fmr@cwi.nl (Frank Rahmani) writes: >> In article <8012@brl-smoke.ARPA>, gwyn@brl-smoke.ARPA (Doug Gwyn ) writes: But I didn't. (I think it was BZS.) PLEASE, check your attributions!
maart@cs.vu.nl (Maarten Litmaath) (06/18/88)
In article <7962@alice.UUCP> andrew@alice.UUCP writes:
\...
\5) divert matching lines onto one fd, nonmatching onto another.
\ sorry, run grep twice.
Come on! The diversion is no problem at all to implement, and it can be very
useful (you cannot run grep twice on stdin, without use of temporary files).
Regards.
--
South-Africa: |Maarten Litmaath @ Free U Amsterdam:
revival of the Third Reich |maart@cs.vu.nl, mcvax!botter!ark!maart
allbery@ncoast.UUCP (Brandon S. Allbery) (06/19/88)
As quoted from <5826@umn-cs.cs.umn.edu> by randy@umn-cs.cs.umn.edu (Randy Orrison): +--------------- | In article <7962@alice.UUCP> andrew@alice.UUCP writes: | |3) print lines with context. | | the second most requested feature but i'm not doing it. this is | | just the job for sed. to be consistent, we just took the context | ^^^^^^^^^^^^^^^^ | | crap out of diff too. this is actually reasonable; showing context | ^^^^^^^^^^^^^^^^ | | is the job for a separate tool (pipeline difficulties apart). | | | What?!?!? Ok, i would like context in grep, but i'll live without it. | Context diffs, however are a different matter. There isn't an easy way | to generate them with diff/context (the first character of every line is | produced as part of the diff). Context diffs are useful for patches, and +--------------- Yes, there is; change diff's output format slightly and expand "context" slightly, then other programs can also output in "extended context" format so as to use "context"'s facilities. I've already described part of this change in another posting; the other part would be to recognize a special indicator (on the line number, perhaps?) which would for generality be the flag to use on the difference, defaulting to "*" which is what "context" currently uses, or diff could specify "+", "-", or "!". The only other change would be to smarten "context" so that it "collapses" context "windows" together much like the 4.3BSD diff -c does. It appears that Bell Labs continues to use tools unrepentantly. It should be noted that they *are* into research, so I have no arguments against their use of /dev/stdin (/dev/fd/0?), their assumption that there's plenty of space so stash away a copy of a file with "tee" for later use in "context", etc. (My /dev/stdin complaint earlier was not aimed at the Bell Labs folks, it was aimed at the person who informed the entire Usenet that "hey, I posted a /dev/stdin driver source for 4.2BSD, so not a one of you has any reason not to be running it". In other words, the usual 4.xBSD-source elitism.) -- Brandon S. Allbery | "Given its constituency, the only uunet!marque,sun!mandrill}!ncoast!allbery | thing I expect to be "open" about Delphi: ALLBERY MCI Mail: BALLBERY | [the Open Software Foundation] is comp.sources.misc: ncoast!sources-misc | its mouth." --John Gilmore
frei@rubmez.UUCP (Matthias Frei ) (06/20/88)
In article <7962@alice.UUCP>, andrew@alice.UUCP writes: > > The following is a summary of the somewhat plausible ideas You are shooting down nearly all of the good ideas posted by many users on the Net. So why did you post your request at all, if you only want to make some minor changes to grep??? Please don't waste our time with things like that. Matthias Frei -------------------------------------------------------------------- Snail-mail: | E-Mail address: Microelectronics Center | UUCP frei@rubmez.uucp University of Bochum | (...uunet!unido!rubmez!frei) 4630 Bochum 1, P.O.-Box 102143 | West Germany |
greywolf@unicom.UUCP (greywolf) (06/25/88)
In article <1304@ark.cs.vu.nl> maart@cs.vu.nl (Maarten Litmaath) writes: # In article <7962@alice.UUCP> andrew@alice.UUCP writes: # \... # \5) divert matching lines onto one fd, nonmatching onto another. # \ sorry, run grep twice. # # Come on! The diversion is no problem at all to implement, and it can be very # useful (you cannot run grep twice on stdin, without use of temporary files). # Regards. Essentially, I think with respect to the tool -flag concept, their attitude there is "See figure 1." This is ESPECIALLY true when they have the opportunity to say "NIH"! (Sounds like the knights from Monty Python: The Search for the Holy Grail). For those of you who do not understand "See figure 1.", I am sure that there are some people inside AT&T who would be happy to tell you. They tell me every month on my phone bill. # -- # South-Africa: |Maarten Litmaath @ Free U Amsterdam: # revival of the Third Reich |maart@cs.vu.nl, mcvax!botter!ark!maart --