[comp.unix.questions] grep replacement

andrew@alice.UUCP (05/23/88)

	Al Aho and I are designing a replacement for grep, egrep and fgrep.
The question is what flags should it support and what kind of patterns
should it handle? (Assume the existence of flags to make it compatible
with grep, egrep and fgrep.)
	The proposed flags are the V9 flags:
-f file	pattern is (`cat file`)
-v	print nonmatching
-i	ignore alphabetic case
-n	print line number
-x	the pattern used is ^pattern$
-c	print count only
-l	print filenames only
-b	print block numbers
-h	do not print filenames in front of matching lines
-H	always print filenames in front of matching lines
-s	no output; just status
-e expr	use expr as the pattern

The patterns are as for egrep, supplemented by back-referencing
as in \{pattern\}\1.
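For example, under this notation a pattern like

	\{[a-z][a-z]*\}-\1

would match a lowercase word followed by a hyphen and a repeat of the same
word, as in "hush-hush".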

please send your comments about flags or patterns to research!andrew

papowell@attila.uucp (Patrick Powell) (05/25/88)

In article <7882@alice.UUCP> andrew@alice.UUCP writes:
>
>	Al Aho and I are designing a replacement for grep, egrep and fgrep.
>The question is what flags should it support and what kind of patterns
>should it handle? (Assume the existence of flags to make it compatible
>with grep, egrep and fgrep.)
>
>please send your comments about flags or patterns to research!andrew

The one thing I miss about grep families is the ability to have
a named search pattern. For example:

DIGIT= \{[0-9]\}
ALPHA=\{[a-zA-Z]\}
\${ALPHA}\${PATTERN}

This would sort of make sense.
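(You can fake part of this today with shell variables, but only at the
command line, not inside the pattern language itself; something like

	DIGIT='[0-9]'
	ALPHA='[a-zA-Z]'
	grep "${ALPHA}${DIGIT}" file

where the names live in the shell, not in grep.)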

The other facility is to find multiple line patterns, as in:
find the pair of lines that have pattern1 in the first line
pattern2 in the second, etc.

This I have needed sooo many times;  I have ended up using AWK
and a clumsy set of searches.

For example:
\#{1 p}Pattern
\#{2}Pattern
This could print out lines that match,  or only the first line
(1p->print this one only).
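
(The clumsy AWK route, for the two-line case, looks something like this --
remember the previous line and test both:

	awk 'prev ~ /pattern1/ && $0 ~ /pattern2/ { print prev; print }
	     { prev = $0 }' file

which is exactly the sort of thing I would like grep to do for me.)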

Patrick Powell
Prof. Patrick Powell, Dept. Computer Science, 136 Lind Hall, 207 Church St. SE,
University of Minnesota,  Minneapolis, MN 55455 (612)625-3543/625-4002

ljz@fxgrp.UUCP (Lloyd Zusman) (05/26/88)

In article <5630@umn-cs.cs.umn.edu> papowell@attila.UUCP (Patrick Powell) writes:
  In article <7882@alice.UUCP> andrew@alice.UUCP writes:
  >
  >	Al Aho and I are designing a replacement for grep, egrep and fgrep.
  >The question is what flags should it support and what kind of patterns
  >should it handle? ...

  ...

  The other facility is to find multiple line patterns, as in:
  find the pair of lines that have pattern1 in the first line
  pattern2 in the second, etc.
  
  This I have needed sooo many times;  I have ended up using AWK
  and a clumsy set of searches.
  
  For example:
  \#{1 p}Pattern
  \#{2}Pattern
  This could print out lines that match,  or only the first line
  (1p->print this one only).

  ...

Or another way to get this functionality would be for this new greplike
thing to allow matches on the newline character.  For example:

    ^.*foo\nbar.*$
          ^^
    	newline

--
  Lloyd Zusman                          UUCP:   ...!ames!fxgrp!ljz
  Master Byte Software              Internet:   ljz%fx.com@ames.arc.nasa.gov
  Los Gatos, California               or try:   fxgrp!ljz@ames.arc.nasa.gov
  "We take things well in hand."

alan@cogswell.Jpl.Nasa.Gov (Alan S. Mazer) (05/26/88)

One thing I would _love_ is to be able to find the context of what I've
found, for example, to find the two (n?) surrounding lines.  I have wanted
to do this many times and there is no good way.

	-- Alan		..!cit-vax!elroy!alan		* "But seriously, what
			elroy!alan@csvax.caltech.edu	   could go wrong?"

ben@idsnh.UUCP (Ben Smith) (05/26/88)

I also would like to see more of the lex capabilities in grep.
-- 
Integrated Decision Systems, Inc.    | Benjamin Smith - East Coast Tech. Office
The fitting solution in professional | Peterborough, NH
portfolio management software.       | UUCP: uunet!idsnh!ben

kutz@bgsuvax.UUCP (Kenneth Kutz) (05/26/88)

In article <6866@elroy.Jpl.Nasa.Gov>, alan@cogswell.Jpl.Nasa.Gov (Alan S. Mazer) writes:
  
> One thing I would _love_ is to be able to find the context of what I've
> found, for example, to find the two (n?) surrounding lines.  I have wanted
> to do this many times and there is no good way.
  
There is a program on the Usenix tape under .../Utilities/Telephone
called 'tele'.  If you call the program using the name 'g', it
supports displaying of context.  E-mail me if you want more info.



-- 
--------------------------------------------------------------------
      Kenneth J. Kutz         	CSNET kutz@bgsu.edu
				UUCP  ...!osu-cis!bgsuvax!kutz
 Disclaimer: Opinions expressed are my own and not of my employer's
--------------------------------------------------------------------

dcon@ihlpe.ATT.COM (452is-Connet) (05/26/88)

In article <6866@elroy.Jpl.Nasa.Gov> alan@cogswell.Jpl.Nasa.Gov (Alan S. Mazer) writes:
>
>One thing I would _love_ is to be able to find the context of what I've
>found, for example, to find the two (n?) surrounding lines.  I have wanted
>to do this many times and there is no good way.

Also, what line number it was found on.

David Connet
ihnp4!ihlpe!dcon

david@elroy.Jpl.Nasa.Gov (David Robinson) (05/27/88)

In article <2978@ihlpe.ATT.COM>, dcon@ihlpe.ATT.COM (452is-Connet) writes:
> In article <6866@elroy.Jpl.Nasa.Gov> alan@cogswell.Jpl.Nasa.Gov (Alan S. Mazer) writes:

> >One thing I would _love_ is to be able to find the context of what I've
> >found, for example, to find the two (n?) surrounding lines.  I have wanted
> >to do this many times and there is no good way.
 
> Also, what line number it was found on.
 


How about "grep -n"?



-- 
	David Robinson		elroy!david@csvax.caltech.edu     ARPA
				david@elroy.jpl.nasa.gov	  ARPA
				{cit-vax,ames}!elroy!david	  UUCP
Disclaimer: No one listens to me anyway!

daveb@laidbak.UUCP (Dave Burton) (05/27/88)

In article <2978@ihlpe.ATT.COM> dcon@ihlpe.UUCP (David Connet) writes:
|Also, what line number it was found on.

Already there: grep -n.

In article <6866@elroy.Jpl.Nasa.Gov> alan@cogswell.Jpl.Nasa.Gov (Alan S. Mazer) writes:
|One thing I would _love_ is to be able to find the context of what I've
|found, for example, to find the two (n?) surrounding lines.  I have wanted
|to do this many times and there is no good way.

Please. Maybe "grep -k" where k is any integer giving the number of lines
of context on each side of grep, default is 0. Oh, but hey, _you're_ designing
it! :-)
-- 
--------------------"Well, it looked good when I wrote it"---------------------
 Verbal: Dave Burton                        Net: ...!ihnp4!laidbak!daveb
 V-MAIL: (312) 505-9100 x325            USSnail: 1901 N. Naper Blvd.
#include <disclaimer.h>                          Naperville, IL  60540

dcon@ihlpe.ATT.COM (452is-Connet) (05/27/88)

In article <6877@elroy.Jpl.Nasa.Gov> david@elroy.Jpl.Nasa.Gov (David Robinson) writes:
>In article <2978@ihlpe.ATT.COM>, dcon@ihlpe.ATT.COM (452is-Connet) writes:
>> Also, what line number it was found on.
>How about "grep -n"?
>


Embarrassed and red-faced he goes away to read the man-page...

stan@sdba.UUCP (Stan Brown) (05/27/88)

> 
> One thing I would _love_ is to be able to find the context of what I've
> found, for example, to find the two (n?) surrounding lines.  I have wanted
> to do this many times and there is no good way.
> 
> 	-- Alan		..!cit-vax!elroy!alan		* "But seriously, what
> 			elroy!alan@csvax.caltech.edu	   could go wrong?"


	Along this same general line it would be nice to be able to
	look for patterns that span lines.  But perhaps this would be
	too complete a change in the philosophy of grep ?

	stan


-- 
Stan Brown	S. D. Brown & Associates	404-292-9497
(uunet gatech)!sdba!stan				"vi forever"

jas@rain.rtech.UUCP (Jim Shankland) (05/27/88)

In article <2978@ihlpe.ATT.COM> dcon@ihlpe.UUCP (David Connet) writes:
>In article <6866@elroy.Jpl.Nasa.Gov> alan@cogswell.Jpl.Nasa.Gov (Alan S. Mazer) writes:
>>One thing I would _love_ is to be able to find the context of what I've
>>found, for example, to find the two (n?) surrounding lines....
>
>Also, what line number it was found on.

You've already got the line number with the "-n" option.  Note that that makes
it easy to write a little wrapper script that gives you context grep.
Whether that's preferable to adding the context option to grep is, I suppose,
debatable; but I can already see the USENIX paper:

	"newgrep -[whatever] Considered Harmful"

Jim Shankland
  ..!ihnp4!cpsc6a!\
               sun!rtech!jas
 ..!ucbvax!mtxinu!/

aperez@cvbnet2.UUCP (Arturo Perez Ext.) (05/27/88)

From article <662@fxgrp.UUCP>, by ljz@fxgrp.UUCP (Lloyd Zusman):
> In article <5630@umn-cs.cs.umn.edu> papowell@attila.UUCP (Patrick Powell) writes:
>   In article <7882@alice.UUCP> andrew@alice.UUCP writes:
>   >
>   >	Al Aho and I are designing a replacement for grep, egrep and fgrep.
>   >The question is what flags should it support and what kind of patterns
>   >should it handle? ...

Actually, I agree with the guy who posted a request shortly before this
came out.

The most useful feature that is currently lacking is the ability to
do context greps, i.e. greps with a window.  There are two ways this could be
handled.   One is to allow awk-like constructs specifying beginning and 
ending points for a window.  Sort of like, e.g.

	grep -w '/:/,/^$/' file

which would find the lines between each pair of a ':' containing line and
the next following blank line.  The other way would be to have a simple
"number of lines around match" parameter, possibly with collapse of overlapping
windows.  Then you could say

	grep -w 5 foo file

which would print 2 lines above and below the matching line.  Either way
it's done would be nice.  I have made one attempt to implement this
with a script and it wasn't too much fun...
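
(For the record, the first form is more or less what sed's address ranges
already give you for plain printing:

	sed -n '/:/,/^$/p' file

but having it built into grep, combined with grep's other options, is the
attractive part.)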

Arturo Perez
ComputerVision, a division of Prime

morrell@hpsal2.HP.COM (Michael Morrell) (05/28/88)

/ hpsal2:comp.unix.questions / dcon@ihlpe.ATT.COM (452is-Connet) /  6:36 am  May 26, 1988 /

Also, what line number it was found on.

David Connet
ihnp4!ihlpe!dcon
----------

grep -n does this, but I'd like to see an option which ONLY prints the line
numbers where the pattern was found.

  Michael Morrell
  hpda!morrell

bzs@bu-cs.BU.EDU (Barry Shein) (05/28/88)

Re: grep with N context lines shown...

Interesting, that's very close to a concept of a multi-line record
grep where I treat N lines as one and any occurrence results in a
listing. The difference is the line chosen to count from (in a context
the match would probably be middle and +-N, in a record you'd just
list the record.)

Just wondering if a generalization is being missed here somewhere,
also consider grepping something like a termcap file, maybe what I
really want is a generalized method to supply pattern matchers for
what to list on a hit:

	grep -P .+3,.-3 pattern		# print +-3 lines centered on match
	grep -P ?^[^ \t]?,.+1 pattern	# print from previous line not
					# beginning with white space to
					# one past current line

Of course, that destroys the stream nature of grep, it has to be able
to arbitrarily back up, ugh, although "last candidate for a start"
could be saved on the fly. The nice thing is that it can use
(essentially) the same pattern machinery for choosing printing (I
know, have to add in the notion of dot etc.)

I dunno, food for thought, like I said, maybe there's a generalization
here somewhere. Or maybe grep should just emit line numbers in a form
which could be post-processed by sed for fancier output (grep in
backquotes on sed line.) Therefore none of this is necessary :-)

	-Barry Shein, Boston University

barnett@vdsvax.steinmetz.ge.com (Bruce G. Barnett) (05/28/88)

[mail bounced]

There have been times when I wanted a grep that would print out the
first occurrence and then stop.

-- 
	Bruce G. Barnett 	<barnett@ge-crd.ARPA> <barnett@steinmetz.UUCP>
				uunet!steinmetz!barnett

wyatt@cfa.harvard.EDU (Bill Wyatt) (05/28/88)

> There have been times when I wanted a grep that would print out the
> first occurrence and then stop.

grep '(your_pattern_here)' | head -1
-- 

Bill    UUCP:  {husc6,ihnp4,cmcl2,mit-eddie}!harvard!cfa!wyatt
Wyatt   ARPA:  wyatt@cfa.harvard.edu
         (or)  wyatt%cfa@harvard.harvard.edu
      BITNET:  wyatt@cfa2
        SPAN:  cfairt::wyatt 

ado@elsie.UUCP (Arthur David Olson) (05/28/88)

> > There have been times when I wanted a grep that would print out the
> > first occurrence and then stop.
> 
> grep '(your_pattern_here)' | head -1

Doesn't cut it for

	grep '(your_pattern_here)' firstfile secondfile thirdfile ...
-- 
	ado@ncifcrf.gov			ADO is a trademark of Ampex.

roy@phri.UUCP (Roy Smith) (05/28/88)

wyatt@cfa.harvard.EDU (Bill Wyatt) writes:
[as a way to get just the first occurance of pattern]
> grep '(your_pattern_here)' | head -1

	Yes, it'll certainly work, but I think it bypasses the original
intention; to save CPU time.  If I had a 1000 line file with pattern on
line 7, I want grep to read the first 7 lines, print out line 7, and exit.
grep|head, on the other hand, will read and search all 1000 lines of the
file; it won't exit (with a EPIPE) until it writes another line to stdout
and finds that head has already exited.  In fact, if grep block-buffers its
output, it may never do more than a single write(2) and never notice that
head has exited.

	Anyway, I agree with the "find first match" flag being a good idea.
It would certainly speed up things like

	grep "^Subject: " /usr/spool/news/comp/sources/unix/*

where I know that the pattern is going to be matched in the first few lines
and don't want to bother searching the rest of the multi-kiloline file.
-- 
Roy Smith, System Administrator
Public Health Research Institute
455 First Avenue, New York, NY 10016
{allegra,philabs,cmcl2,rutgers}!phri!roy -or- phri!roy@uunet.uu.net

chip@vector.UUCP (Chip Rosenthal) (05/29/88)

In article <8077@elsie.UUCP> ado@elsie.UUCP (Arthur David Olson) writes:
>> grep '(your_pattern_here)' | head -1
>Doesn't cut it for
>	grep '(your_pattern_here)' firstfile secondfile thirdfile ...

nor if you want to see if a match was found by testing the exit status
-- 
Chip Rosenthal /// chip@vector.UUCP /// Dallas Semiconductor /// 214-450-0400
{uunet!warble,sun!texsun!rpp386,killer}!vector!chip
I won't sing for politicians.  Ain't singing for Spuds.  This note's for you.

guy@gorodish.Sun.COM (Guy Harris) (05/29/88)

> grep -n does this, but I'd like to see an option which ONLY prints the line
> numbers where the pattern was found.

I wouldn't - if you're only grepping one file, you can do it without such an
option:

	grep -n <pattern> <file> | sed -n 's/\([0-9]*\):.*/\1/p'

If you're grepping more than one file, you obviously have to decide what you
want to do with the file name and the line number; once you do, just change the
"sed" pattern appropriately (and note that if the list of files is variable,
you either have to stick "/dev/null" in there to make sure the names are
generated even if there's only one file or have the script distinguish between
the one-file and >1-file cases; I seem to remember some indication that the new
BTL research "grep" would have a flag to tell it always to give the file name).
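
(Concretely, for the multi-file case, keeping "file:line" and using the
/dev/null trick to force the filename even when only one file is named:

	grep -n 'pattern' "$@" /dev/null | sed -n 's/^\([^:]*:[0-9]*\):.*/\1/p'

assuming, of course, that the file names themselves don't contain colons.)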

russ@groucho.ucar.edu (Russ Rew) (05/30/88)

I also recently had a need for printing multi-line "records" in which a
specified pattern appeared somewhere in the record.  The following
short csh script uses the awk capability to treat whole lines as fields
and empty lines as record separators to print all the records from
standard input that contain a line matching a regular expression specified as an
argument:

#!/bin/csh -f
awk 'BEGIN {RS = ""; FS = "\n"; OFS = "\n"; ORS = "\n\n"} /'"$1"'/ {print} '


     Russ Rew * UCAR (University Corp. for Atmospheric Research)
	 PO Box 3000 * Boulder, CO  80307-3000 * 303-497-8845
	     russ@unidata.ucar.edu * ...!hao!unidata!russ

joey@tessi.UUCP (Joe Pruett) (05/31/88)

>
>> There have been times when I wanted a grep that would print out the
>> first occurrence and then stop.
>
>grep '(your_pattern_here)' | head -1

This works, but is quite slow if the input to grep is large.  A hack
I've made to egrep is a switch of the form -<number>.  This causes only
the first <number> matches to be printed, and then the next file is
searched.  This is great for:

egrep -1 ^Subject *

in a news directory to get a list of Subject lines.

jqj@uoregon.uoregon.edu (JQ Johnson) (05/31/88)

In article <1036@cfa.cfa.harvard.EDU> wyatt@cfa.harvard.EDU (Bill Wyatt) writes:
>
>> There have been times when I wanted a grep that would print out the
>> first occurrence and then stop.
>grep '(your_pattern_here)' | head -1
This is, of course, unacceptable if you are searching a very long file
(say, a census database) and have LOTS of pipe buffering.

Too bad it isn't feasible to have a shell that can optimize pipelines.

dan@maccs.UUCP (Dan Trottier) (05/31/88)

In article <8077@elsie.UUCP> ado@elsie.UUCP (Arthur David Olson) writes:
>> > There have been times when I wanted a grep that would print out the
>> > first occurrence and then stop.
>> 
>> grep '(your_pattern_here)' | head -1
>
>Doesn't cut it for
>
>	grep '(your_pattern_here)' firstfile secondfile thirdfile ...

This is getting ridiculous and can be taken to just about any level...

	foreach i (file1 file2 ...)
	   grep 'pattern' $i | head -1
	end

-- 
       A.I. - is a three toed sloth!        | ...!uunet!mnetor!maccs!dan
-- Official scrabble players dictionary --  | dan@mcmaster.BITNET

leo@philmds.UUCP (Leo de Wit) (05/31/88)

In article <292@ncar.ucar.edu> russ@groucho.UCAR.EDU (Russ Rew) writes:
>I also recently had a need for printing multi-line "records" in which a
>specified pattern appeared somewhere in the record.  The following
>short csh script uses the awk capability to treat whole lines as fields
>and empty lines as record separators to print all the records from
>standard input that contain a line matching a regular expression specified as an
>argument:
>
>#!/bin/csh -f
>awk 'BEGIN {RS = ""; FS = "\n"; OFS = "\n"; ORS = "\n\n"} /'"$1"'/ {print} '
>
>

Awk is a nice solution, but sed is a much faster one. I've been following 
the 'grep' discussion for some time now, and have seen much demand for
features that are simply within sed. Here are some; I have left the discussion
about the function of this or that sed-command out: there is a sed article and
a man page...

Patrick Powell writes:
>The other facility is to find multiple line patterns, as in:
>find the pair of lines that have pattern1 in the first line
>pattern2 in the second, etc.

Try this one:

        sed -n -e '/PATTERN1/,/PATTERN2/p' file

It prints all lines between PATTERN1 and PATTERN2 matches. Of course you can
have subcommands to do special things (with '{' I mean).


Alan (..!cit-vax!elroy!alan) writes:
>One thing I would _love_ is to be able to find the context of what I've
>found, for example, to find the two (n?) surrounding lines.  I have wanted
>to do this many times and there is no good way.

There is. Try this one:

        sed -n -e '
/PATTERN/{
x
p
x
p
n
p
}
h' file

It prints the line before, the line containing the PATTERN, and the line after.
Of course you can make the output fancier and the number of lines printed
larger.


David Connet writes:
>>
>>One thing I would _love_ is to be able to find the context of what I've
>>found, for example, to find the two (n?) surrounding lines.  I have wanted
>>to do this many times and there is no good way.
>Also, what line number it was found on.

Sed can also handle this one:

        sed -n -e '/PATTERN/=' file


Lloyd Zusman writes:
>Or another way to get this functionality would be for this new greplike
>thing to allow matches on the newline character.  For example:
>    ^.*foo\nbar.*$
>          ^^
>    	newline

Sed can match on embedded newline characters in the substitute command 
(it is indeed \n here!). The trailing newline is matched by $.


Barry Shein writes [story about relative addressing]:
>I dunno, food for thought, like I said, maybe there's a generalization
>here somewhere. Or maybe grep should just emit line numbers in a form
>which could be post-processed by sed for fancier output (grep in
>backquotes on sed line.) Therefore none of this is necessary :-)

Quite right. I think most times when you want to see the context it is in
interactive use. In that case you can write a simple sed-script that does
just what is needed, i.e. display the [/pattern/-N] through [/pattern/+N]
lines, where N is a constant. The example I gave for N == 1 can be extended
for larger N, with fancy output etc.


Bill Wyatt writes: 
>> There have been times when I wanted a grep that would print out the
>> first occurrence and then stop.
>
>grep '(your_pattern_here)' | head -1

Much simpler, and faster:

        sed -n -e '/PATTERN/{
p
q
}' file

Sed quits immediately after finding the first match. You could even create an 
alias for something like that.
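
(For instance, something like this as a function in ksh or a newer sh --
call it 'first', the name and all purely illustrative; one sed per file so
each file stops at its own first match, provided the pattern contains no '/':

	first() {
		pat="$1"; shift
		for f in "$@"
		do
			sed -n -e "/$pat/{" -e p -e q -e '}' "$f"
		done
	}

)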


Michael Morrell writes:
>>Also, what line number it was found on.
>grep -n does this, but I'd like to see an option which ONLY prints the line
>numbers where the pattern was found.

The sed trick does this:

        sed -n -e '/PATTERN/=' file

Or you could even:

        sed -n -e '/PATTERN/{
=
q
}' file

which prints the first matched line number and exits.


Roy Smith writes:
>wyatt@cfa.harvard.EDU (Bill Wyatt) writes:
>[as a way to get just the first occurance of pattern]
>> grep '(your_pattern_here)' | head -1
>	Yes, it'll certainly work, but I think it bypasses the original
>intention; to save CPU time.  If I had a 1000 line file with pattern on
>line 7, I want grep to read the first 7 lines, print out line 7, and exit.
>grep|head, on the other hand, will read and search all 1000 lines of the
>file; it won't exit (with a EPIPE) until it writes another line to stdout
>and finds that head has already exited.  In fact, if grep block-buffers its
>output, it may never do more than a single write(2) and never notice that
>head has exited.

Quite right. The sed-solution I mentioned before is fast and neat. In fact, 
who needs head:

        sed 10q

does the job, as you can find in the book by Kernighan and Pike, 'The UNIX
Programming Environment'.


Stan Brown writes:
>	Along this same general line it would be nice to be abble to
>	look for paterns that span lines.  But perhaps this would be
>	tom complete a change in the philosophy of grep ?

As I mentioned before, embedded newlines can be matched by sed in the
substitute command.


What I also see often is things like

        grep 'pattern' file | sed 'expression'

A pity a lot of people don't know that sed can do the pattern matching itself.
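
For instance,

        grep 'pattern' file | sed 's/foo/bar/'

can become

        sed -n -e '/pattern/{' -e 's/foo/bar/' -e p -e '}' file

one process instead of two, and one pass over the data.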

S. E. D. (Sic Erat Demonstrandum)


As far as options for a new grep are concerned, I suggest using the options
proposed (and no more). Let other tools handle other problems - that's in the
Un*x spirit. What I would appreciate most in a new grep is:
no more grep, egrep, fgrep, just one tool that can be both fast (for fixed
strings) and elaborate (for pattern matching like egrep). The 'bm' tool that
was on the net (author Peter Bain) is very fast for fixed strings, using the
Boyer-Moore algorithm. Maybe this knowledge could be 'joined in'...?


        Leo.

barnett@vdsvax.steinmetz.ge.com (Bruce G. Barnett) (05/31/88)

In article <1036@cfa.cfa.harvard.EDU> wyatt@cfa.harvard.EDU (Bill Wyatt) writes:
|
|> There have been times when I wanted a grep that would print out the
|> first occurrence and then stop.
|
|grep '(your_pattern_here)' | head -1

Yes I have tried that. You are missing the point.

Have you ever waited for a computer?  

There are times when I want the first occurrence of a pattern without
reading the entire (i.e. HUGE) file.

Or there are times when I want the first occurrence of a pattern from
hundreds of files, but I don't want to see the pattern more than once.

And yes I know how to write a shell script that does this.

IMHO (sarcasm mode on), it is more efficient to call grep 
once for one hundred files, than to call (grep $* /dev/null|head -1) 
one hundred times. 
-- 
	Bruce G. Barnett 	<barnett@ge-crd.ARPA> <barnett@steinmetz.UUCP>
				uunet!steinmetz!barnett

gwc@root.co.uk (Geoff Clare) (05/31/88)

Most of the useful things people have been saying they would like to be
able to do with 'grep' can already be done very simply with 'sed'.
For example:

    Stop after first match:   sed -n '/pattern/{p;q;}'

    Match over two lines:     sed -n 'N;/pat1\npat2/p;D'

It should also be possible to get a small number of context lines by
judicious use of the 'hold space' commands (g, G, h, H, x), but I haven't
tried it.  Anyway, this can be done with a normal line editor (if the data
to be searched aren't coming from a pipe) with 'g/pattern/-,+p'.

I was rather alarmed to see the proposal for 'pattern repeat' in the original
article was '\{pattern\}\1' rather than '\(pattern\)\1', as the latter is
already used for this purpose in the standard editors (ed, ex/vi, sed).
Or was it a typo?

By the way, does anyone know why the ';' command terminator in 'sed' is
not documented?  It works on all the systems I've tried it on, but I
have never found it in any manuals.  It's so much nicer than putting
the commands on separate lines, or using multiple '-e' options.
-- 

Geoff Clare    UniSoft Limited, Saunderson House, Hayne Street, London EC1A 9HH
gwc@root.co.uk   ...!mcvax!ukc!root44!gwc   +44-1-606-7799  FAX: +44-1-726-2750

andyc@omepd (T. Andrew Crump) (05/31/88)

In article <1036@cfa.cfa.harvard.EDU> wyatt@cfa.harvard.EDU (Bill Wyatt) writes:

>> There have been times when I wanted a grep that would print out the
>> first occurrence and then stop.

   >
   >grep '(your_pattern_here)' | head -1

Yes, but it forces grep to search a whole file, when what you may have wanted
was at the beginning.  This is inefficient if the "file" is large.

A more general version of this request would be a parameter that would restrict
grep to n or fewer occurrences, maybe 'grep -N #'.

-- Andy Crump

trb@ima.ISC.COM (Andrew Tannenbaum) (05/31/88)

> I seem to remember some indication that the new BTL research "grep"
> would have a flag to tell it always to give the file name).

I have always wanted to be able to tell grep to NOT print the file
names on a multi-file grep.  Let's say I want a phone number script -
usually a simple grep - but if I want to store the numbers in multiple
files (e.g. mine and my department's), then the output contains
unsightly filenames.  This has always struck me as opposite to the UNIX
philosophy of having a filter provide output that is useful as data.
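
(The -h in the proposed flag list would cover this; the phone script then
becomes just something like

	grep -h "$1" $HOME/lib/phones $HOME/lib/dept-phones

with the file names standing in for wherever the numbers actually live.)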

I would like the option to go to the next file after first match
(regardless of which other options are present).  Also, I would like to
print a region other than line on a match.  It would be nice to delimit
the patterns using regexps, as "-n,+n" and "?^$?,/^$/" (among others)
would be useful.

	Andrew Tannenbaum   Interactive   Boston, MA   +1 617 247 1155

avr@mtgzz.UUCP (XMRP50000[jcm]-a.v.reed) (05/31/88)

In article <2450011@hpsal2.HP.COM>, morrell@hpsal2.HP.COM (Michael Morrell) writes:
> Also, what line number it was found on.
> grep -n does this, but I'd like to see an option which ONLY prints the line
> numbers where the pattern was found.

# In ksh's $ENV - otherwise use a shell script:
function lngrep {
		if [ "$#" = '1' ]
			then grep -n "$@" | cut -f1 -d:
			else grep -n "$@" | cut -f1,2 -d:
		fi
		}
					Adam Reed (mtgzz!avr)

glennr@holin.ATT.COM (Glenn Robitaille) (06/01/88)

> > > There have been times when I wanted a grep that would print out the
> > > first occurrence and then stop.
> > 
> > grep '(your_pattern_here)' | head -1
> 
> Doesn't cut it for
> 
> 	grep '(your_pattern_here)' firstfile secondfile thirdfile ...

Well, if you have a shell command like

	#
	# save the search pattern
	#
	pattern=$1
	#
	# remove search pattern from $*
	#
	shift
	for i in $*
	do
		#
		# grep for search pattern
		#
		line=`grep "${pattern}" ${i}|head -1`
		#
		# if found, print file name and string
		#
		test -n "$line" && echo "${i}:\t${line}"
	done

it'll work fine.  If you want to use other options, have them in
quotes as part of the first argument.


Glenn Robitaille
AT&T, HO 2J-207
ihnp4!holin!glennr
Phone (201) 949-7811

aeb@cwi.nl (Andries Brouwer) (06/01/88)

In article <1036@cfa.cfa.harvard.EDU> wyatt@cfa.harvard.EDU (Bill Wyatt) writes:
>
>> There have been times when I wanted a grep that would print out the
>> first occurrence and then stop.
>
>grep '(your_pattern_here)' | head -1

A fast way of searching for the first occurrence is really useful.
I have a version of grep called `contains', and a shell script
for formatting that says: if the input contains .[ then use refer;
if it contains .IS then ideal; if .PS then pic; if .TS then tbl, etc.

-- 
      Andries Brouwer -- CWI, Amsterdam -- uunet!mcvax!aeb -- aeb@cwi.nl

jin@hplabsz.HPL.HP.COM (Tai Jin) (06/01/88)

In article <1039@ima.ISC.COM> trb@ima.UUCP (Andrew Tannenbaum) writes:
>I have always wanted to be able to tell grep to NOT print the file
>names on a multi-file grep.  Let's say I want a phone number script -
>usually a simple grep - but if I want to store the numbers in multiple
>files (e.g. mine and my departments), then the output contains
>unsightly filenames.  This has always struck me as opposite to the UNIX
>philosophy of having a filter provide output that is useful as data.

Actually, I think the Unix philosophy is to have simple filters and use
pipes to construct more complex filters.  Unfortunately, you can't do
everything with pipes.

>I would like the option to go to the next file after first match
>(regardless of which other options are present).  Also, I would like to
>print a region other than line on a match.  It would be nice to delimit
>the patterns using regexps, as "-n,+n" and "?^$?,/^$/" (among others)
>would be useful.

I also would like a context grep that greps for records with arbitrary
delimiters.  I started working on this, but I've had no time to finish it.

...tai

hasch@gypsy.siemens-rtl (Harald Schaefer) (06/01/88)

If you are only interested in the first occurrence of a pattern, you can use
something like
	sed -n '/<pattern>/ {
		p
		q
		}' file
Harald Schaefer
Siemens Corp. - RTL
Bus. Phone (609) 734 3389
Home Phone (609) 275 1356

uucp:	...!princeton!gypsy!hasch
	hasch@gypsy.uucp
ARPA:	hasch@siemens.com
	hasch%siemens@princeton.EDU

aburt@isis.UUCP (Andrew Burt) (06/01/88)

I'd like to see the following enhancements in a grepper:

	-  \< and \> to match word start/end as in vi, with -w option
		as in BSD grep to match pattern as a word.

	- \w in pattern to match whitespace (generalization: define
		\unused-letter as a pattern; or allow full lex capability).

	- way to invert piece of pattern such as: grep foo.*\^bar\^xyzzy
		with meaning as in: grep foo | grep -v bar | grep -v xyzzy
		(or could be written grep foo.*\^(bar|xyzzy) of course).

	-  Select Nth occurrence of match (generalization: list of
		matches to show: grep -N -2,5-7,10- ... to grab up to the 2nd,
		5th through 7th, and from the 10th onward).

	- option to show lines between matches (not just matching lines)
		as in: grep -from foo -to bar ... meaning akin to
		sed/ed's /foo/,/bar/p.  (But much more useful with other
		extensions).

	- Allow matching newlines in a "binary" (or non-text) sort of mode:
		grep -B 'foo.*bar'  finds foo...bar even if they are
		not on the same line.  (But printing the "line" that
		matches wouldn't be useful anymore, so just printing the
		matched text would be better.  Someone wanting lines could
		look for \n[^\n]*foo.*bar[^\n]*\n, though a syntax to
		make this easier might be in order.  Perhaps this wouldn't
		be an example of a binary case -- but a new character
		with meaning like '.' but matching ANY character would work:
		if @ is such a character then "grep foo@*bar".   Perhaps
		a better example, assuming the \^ for inversion syntax
		above would be "grep foo@*(\^bar)bar" -- otherwise it would
		match from first foo to last bar, while I might want from
		first foo to first bar.)

	- provide byte offset of start of match (like block number or
		line number) useful for searching non-text files.

	- Provide a lib func that has the RE code in it.

	- Install RE code in other programs: awk/sed/ed/vi etc.
		Oh for a standardized RE algorithm!
-- 

Andrew Burt 				   			isis!aburt

              Fight Denver's pollution:  Don't Breathe and Drive.

jjg@linus.UUCP (Jeff Glass) (06/01/88)

In article <470@q7.tessi.UUCP> joey@tessi.UUCP (Joe Pruett) writes:
> >grep '(your_pattern_here)' | head -1
> 
> This works, but is quite slow if the input to grep is large.  A hack
> I've made to egrep is a switch of the form -<number>.  This causes only
> the first <number> matches to be printed, and then the next file is
> searched.  This is great for:
> 
> egrep -1 ^Subject *
> 
> in a news directory to get a list of Subject lines.

Try:

	sed -n -e '/pattern/{' -e p -e q -e '}' filename

This prints the first occurrence of the pattern and then stops searching
the file.  The generalizations for printing the first <n> matches and
searching <m> files (where n,m > 1) are more awkward (no pun intended)
but are possible.

/jeff

brianm@sco.COM (Brian Moffet) (06/01/88)

In article <4537@vdsvax.steinmetz.ge.com> barnett@vdsvax.steinmetz.ge.com (Bruce G. Barnett) writes:
>In article <1036@cfa.cfa.harvard.EDU> wyatt@cfa.harvard.EDU (Bill Wyatt) writes:
>|grep '(your_pattern_here)' | head -1
>
>Or there are times when I want the first occurrence of a pattern from
>hundreds of files, but I don't want to see the pattern more than once.
>

Have you tried sed?  How about 

$ sed -n '/pattern/p;/pattern/q' file

???



-- 
Brian Moffet		brianm@sco.com  {uunet,decvax!microsof}!sco!brianm
The opinions expressed are not quite clear and have no relation to my employer.
'Evil Geniuses for a Better Tommorrow!'

anw@nott-cs.UUCP (06/01/88)

In article <6866@elroy.Jpl.Nasa.Gov> alan@cogswell.Jpl.Nasa.Gov (Alan S. Mazer)
writes:
> One thing I would _love_ is to be able to find the context of what I've
> found, for example, to find the two (n?) surrounding lines.  I have wanted
> to do this many times and there is no good way.

	See below.  Does n == 4, but easily changed.

In article <590@root44.co.uk> gwc@root.co.uk (Geoff Clare) writes:
>
> Most of the useful things people have been saying they would like to be
> able to do with 'grep' can already be done very simply with 'sed'.

	Which is not to say that they shouldn't also be in "*grep"!

>	[ good examples omitted ]
>
> It should also be possible to get a small number of context lines by
> judicious use of the 'hold space' commands (g, G, h, H, x), but I haven't
> tried it.  [ ... ]

	The following is "/usr/bin/kwic" on this machine (PDP 11/44 running
V7).  I wrote it about three years ago in response to a challenge from some
AWK zealots;  it runs *much* faster than the equivalent AWK script.  That
is, it is sloooww rather than ssllloooooowwww.  I have a manual entry for
it which is too trivial to send.  Bourne shell, of course.  Use at whim
and discretion.  Several minor bugs, mainly (I hope!) limitations of or
between "sh" and "sed".  (Note that the various occurrences of multiple
spaces in "s..." commands are all TABs, in case mailers/editors/typists
have mangled things.)

> By the way, does anyone know why the ';' command terminator in 'sed' is
> not documented?  It works on all the systems I've tried it on, but I
> have never found it in any manuals.  It's so much nicer than putting
> the commands on separate lines, or using multiple '-e' options.

	No, I don't know why, but it isn't the only example in Unix of a
facility most easily discovered by looking in the source.  I've occasionally
used it, but I tried re-writing the following that way, and it *didn't* look
so much nicer;  in fact it looked 'orrible.

--------------------------------- [cut here] -----------------------------
[ $# -eq 0 ] && { echo "Usage: $0 pattern [file] ..." 1>&2; exit 1; }

l='[^\n]*\n' pat="$1" shift

exec sed -n   "/$pat"'/ b found
			s/^/	/
			H
			g
      /^'"$l$l$l$l$l"'/ s/\n[^\n]*//
			h
			b
	: found
			s/^/++	/
			H
			g
			s/.//p
			s/.*//
			h
	: loop
		      $ b out
			n
	     /'"$pat"'/ b found
			s/^/	/
			H
			g
	/^'"$l$l$l$l"'/ !b loop
	: out
			s/.//p
			s/.*/-----------------/
			h
	    ' ${1+"$@"}

-- 
Andy Walker, Maths Dept., Nott'm Univ., UK.
anw@maths.nott.ac.uk

andrew@alice.UUCP (06/01/88)

in my naivety, i had not been following netnews closely
after i posted the original ``grep replacement'' article.
I assumed that people would reply to me, not the net.
That is the reason i have not been participating in the discussion.
i will be posting my resolution of the suggestions shortly.

many people have written about patterns matching multiple lines.
grep will not do this. if you really need this, use sam by rob pike
as described in the nov 1987 software practice and experience.
the code is available for a plausible fee from the at&t toolchest.

sef@csun.UUCP (Sean Fagan) (06/02/88)

Something I'd like to see is this: grep '^<somepattern>$^<morepatterns>$...'.
While this would, of course, not be trivial, I think it would probably be
more general (and therefore more in the "spirit" of Unix(tm)) than showing n
lines around a matched pattern.
But that's just my opinion.

-- 
Sean Fagan  (818) 885-2790   uucp:   {ihnp4,hplabs,psivax}!csun!sef
CSUN Computer Center         BITNET: 1GTLSEF@CALSTATE
Northridge, CA 91330         DOMAIN: sef@CSUN.EDU
"I just build fast machines."  -- S. Cray

jfh@rpp386.UUCP (John F. Haugh II) (06/02/88)

In article <2117@uoregon.uoregon.edu> jqj@drizzle.UUCP (JQ Johnson) writes:
>In article <1036@cfa.cfa.harvard.EDU> wyatt@cfa.harvard.EDU (Bill Wyatt) writes:
>>> There have been times when I wanted a grep that would print out the
>>> first occurrence and then stop.
>>grep '(your_pattern_here)' | head -1
>This is, of course, unacceptable if you are searching a very long file
>(say, a census database) and have LOTS of pipe buffering.
>
>Too bad it isn't feasible to have a shell that can optimize pipelines.

there is a boyer/moore based fast grep in the archives.  adding an
additional option (say '-f' for first in each file?) should be quite
simple.

perhaps i'll post the diff's if i remember to go hack on the sucker
any time soon.

- joh.
-- 
John F. Haugh II                 | "If you aren't part of the solution,
River Parishes Programming       |  you are part of the precipitate."
UUCP:   ihnp4!killer!rpp386!jfh  | 		-- long since forgot who
DOMAIN: jfh@rpp386.uucp          | 

john@frog.UUCP (John Woods) (06/02/88)

In article <590@root44.co.uk>, gwc@root.co.uk (Geoff Clare) writes:
> Most of the useful things people have been saying they would like to be
> able to do with 'grep' can already be done very simply with 'sed'.
> For example:
>     Stop after first match:   sed -n '/pattern/{p;q;}'

Close, but no cigar.  It does not work for multiple input files.
(And, of course, spawning off a new sed for each file defeats the basic desire
of most of the people who've asked for it:  speed)

However,

	awk '/^Subject: /	{ print FILENAME ":" $0; next }' *

does (just about) work.  And it's probably not _obscenely_ slow.
(it doesn't behave for no input files, and you might prefer no FILENAME: for
just a single input file)
-- 
John Woods, Charles River Data Systems, Framingham MA, (617) 626-1101
...!decvax!frog!john, john@frog.UUCP, ...!mit-eddie!jfw, jfw@eddie.mit.edu

No amount of "Scotch-Guard" can repel the ugly stains left by REALITY...
		- Griffy

les@chinet.UUCP (Leslie Mikesell) (06/02/88)

In article <2018@hplabsz.HPL.HP.COM> jin@hplabsz.UUCP (Tai Jin) writes:
>>I have always wanted to be able to tell grep to NOT print the file
>>names on a multi-file grep. 
>
>Actually, I think the Unix philosophy is to have simple filters and use
>pipes to construct more complex filters.  Unfortunately, you can't do
>everything with pipes.



In this case it can be done with pipes:

  cat file.. |grep pattern


 Les Mikesell

mdorion@cmtl01.UUCP (Mario Dorion) (06/03/88)

In article <2978@ihlpe.ATT.COM>, dcon@ihlpe.ATT.COM (452is-Connet) writes:
> In article <6866@elroy.Jpl.Nasa.Gov> alan@cogswell.Jpl.Nasa.Gov (Alan S. Mazer) writes:
> >
> >One thing I would _love_ is to be able to find the context of what I've
> >found, for example, to find the two (n?) surrounding lines.  I have wanted
> >to do this many times and there is no good way.
> 
> Also, what line number it was found on.
> 
> David Connet
> ihnp4!ihlpe!dcon

Ever tried grep -n ?????

There are three features I would like to see in a grep-like program:

1- Be able to use a newline character in the regular expression
       grep 'this\nthat' file 

2- Be able to grep more than one regular expression with one call. This would
   be faster than issuing many calls since the file would be read only once.

3- To have an option to search only for the first occurrence of the pattern.
   Sometimes you KNOW that the pattern is there only once (for example if you
   grep '^Subject:' on news files) and there's just no need to scan the rest of
   the file. When 'grepping' into many files it would return the first occurrence
   for each file.
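
(For (2), note that egrep's alternation, or a -f flag with a file of
patterns, already gets you a single pass over the data:

	egrep 'pattern1|pattern2' file
	grep -f patterns file

it's (1) and (3) that really need new code.)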

-- 
     Mario Dorion              | ...!{rutgers,uunet,ihnp4}!     
     Frisco Bay Industries     |            philabs!micomvax!cmtl01!mdorion
     Montreal, Canada          |
     1 (514) 738-7300          | I thought this planet was in public domain!

andrew@alice.UUCP (06/03/88)

In article <449@happym.UUCP>, kent@happym.UUCP writes:
> From alice.UUCP??  Ha ha!  That's Bell Labs!  It will be in V10 
> Unix, and none of us humans will see it until sysVr6, and only then 
> if we are lucky!! 


Context:
	the right thing to do is to write a context program that takes
input looking like "filename:linenumber:goo" and prints whatever context you like.
we can then take this crap out of grep and diff and make it generally available
for use with programs like the C compiler and eqn and so on. It can also do
the right thing with folding together nearby lines. At least one good first
cut has been put on the net but a C program sounds easy enough to do.

Source:
	the software i write is publicly available because it matters to me.
it was a hassle but mk and fio are available to everybody for reasonable cost
(< $125 commercial, nearly free educational). i am trying hard to do the
same for the new grep. it will be in V10, it will be in plan9, and should be
in SVR4 (the joint sun-at&t release).

lyndon@ncc.Nexus.CA (Lyndon Nerenberg) (06/04/88)

In article <1039@ima.ISC.COM> trb@ima.UUCP (Andrew Tannenbaum) writes:
>I have always wanted to be able to tell grep to NOT print the file
>names on a multi-file grep.

That's easy :-)

Just pipe it through cut(1). Works great unless you have a ':' as part
of the file name...
-- 
{alberta,utzoo,uunet}!ncc!lyndon  lyndon@Nexus.CA

mdorion@cmtl01.UUCP (Mario Dorion) (06/04/88)

In article <2450011@hpsal2.HP.COM>, morrell@hpsal2.HP.COM (Michael Morrell) writes:
> 
> (...) I'd like to see an option which ONLY prints the line
> numbers where the pattern was found.
> 
>   Michael Morrell
>   hpda!morrell

You could use the following:

grep -n 'foo' bar | cut -d: -f1


-- 
     Mario Dorion              | ...!{rutgers,uunet,ihnp4}!     
     Frisco Bay Industries     |            philabs!micomvax!cmtl01!mdorion
     Montreal, Canada          |
     1 (514) 738-7300          | I thought this planet was in public domain!

allbery@ncoast.UUCP (Brandon S. Allbery) (06/05/88)

As quoted from <2312@bgsuvax.UUCP> by kutz@bgsuvax.UUCP (Kenneth Kutz):
+---------------
| In article <6866@elroy.Jpl.Nasa.Gov>, alan@cogswell.Jpl.Nasa.Gov (Alan S. Mazer) writes:
| > One thing I would _love_ is to be able to find the context of what I've
| > found, for example, to find the two (n?) surrounding lines.  I have wanted
| > to do this many times and there is no good way.
+---------------

	grep -n foo ./bar | context 2

I posted context to net.sources back when it existed; someone may still have
archives from that time, if not I'll retrieve my sources and repost it.  It
takes lines of the basic form

	filename ... linenumber : ...

and displays context around the specified lines.  I use this with grep quite
often; it also works with cc (pcc, not Xenix cc) error messages.
-- 
Brandon S. Allbery			  | "Given its constituency, the only
{uunet!marque,sun!mandrill}!ncoast!allbery | thing I expect to be "open" about
Delphi: ALLBERY	       MCI Mail: BALLBERY | [the Open Software Foundation] is
comp.sources.misc: ncoast!sources-misc    | its mouth."  --John Gilmore

gwyn@brl-smoke.UUCP (06/05/88)

In article <7944@alice.UUCP> andrew@alice.UUCP writes:
>	the right thing to do is to write a context program that takes
>input looking like "filename:linenumber:goo" and prints whatever context ...

Heavens -- a tool user.  I thought that only Neanderthals were still alive.
I guess Bell Labs escaped the plague.

hutch@net1.ucsd.edu (Jim Hutchison) (06/05/88)

In article <4537@vdsvax.steinmetz.ge.com>, barnett@vdsvax.steinmetz.ge.com (Bruce G. Barnett) writes:
>In <1036@cfa.cfa.harvard.EDU> wyatt@cfa.harvard.EDU (Bill Wyatt) writes:
>|
>|> There have been times when I wanted a grep that would print out the
>|> first occurrence and then stop.
>|
>|grep '(your_pattern_here)' | head -1
>
[...]
>
>Have you ever waited for a computer?  

No, never. :-)

>There are times when I want the first occurrence of a pattern without
>reading the entire (i.e. HUGE) file.

I realize this is dependent on the way in which processes sharing a
pipe act, but this is a point worth considering before we get yet
another annoying burst of "cat -v" type programs.

grep pattern file1 ... fileN | head -1

This should send grep a SIGPIPE as soon as the first line of output
trickles through the pipe.  This would result in relatively little
of the file actually being read under most Unix implementations.
I would agree that it is a bad thing to rely on the granularity of
a pipe.  Here is a sample program which can be used to show you what
I mean.

Name it grep, and use it thus wise:

% ./grep pattern * | head -1

/* ------------- Cut here --------------- */
#include <stdio.h>
#include <signal.h>

sighandler(sig)
    int sig;
{
    if (sig == SIGPIPE)
	fprintf(stderr,"Died from a SIGPIPE\n");
    else
	fprintf(stderr,"Died from signal #%d\n", sig);
    exit(0);
}

main()
{
    signal(SIGPIPE,sighandler);
    for (;;)
	printf("pattern\n");
}
/*    Jim Hutchison   		UUCP:	{dcdwest,ucbvax}!cs!net1!hutch
		    		ARPA:	Hutch@net1.ucsd.edu
Disclaimer:  The cat agreed that it would be o.k. to say these things.  */

hutch@net1.ucsd.edu (Jim Hutchison) (06/05/88)

I can think of a few nasty ways to do this one, I am hoping to get
a better answer.

A grep with a window of context around it.  A few lines preceding and
following the pattern I am looking for.  The VMS search command sported
this as an option/qualifier.  I miss it sometimes (not VMS, just a few
of the more wacky utilities, like the editor option for creation of
multi-key data base files :-).

/*    Jim Hutchison   		UUCP:	{dcdwest,ucbvax}!cs!net1!hutch
		    		ARPA:	Hutch@net1.ucsd.edu
Disclaimer:  The cat agreed that it would be o.k. to say these things.  */

tbray@watsol.waterloo.edu (Tim Bray) (06/05/88)

Grep should, where reasonable, not be bound by the notion of a 'line'.
As a concrete expression of this, the useful grep -l (prints the names of
the files that contain the string) should work on any kind of file.  More
than one existing 'grep -l' will fail, for example, to tell you which of a 
bunch of .o files contain a given string.  Scenario - you're trying to
link 55 .o's together to build a program you don't know that well.  You're
on berklix.  ld sez: "undefined: _memcpy".  You say: "who's doing that?".
The source is scattered inconveniently.  The obvious thing to do is: 
grep -l _memcpy *.o
That this often will not work is irritating.
Tim Bray, New Oxford English Dictionary Project, U of Waterloo

bzs@bu-cs.BU.EDU (Barry Shein) (06/05/88)

From: gwyn@brl-smoke.ARPA (Doug Gwyn )
>In article <7944@alice.UUCP> andrew@alice.UUCP writes:
>>	the right thing to do is to write a context program that takes
>>input looking like "filename:linenumber:goo" and prints whatever context ...
>
>Heavens -- a tool user.  I thought that only Neanderthals were still alive.
>I guess Bell Labs escaped the plague.

Almost, unless the original input was produced by a pipeline, in which
case this (putative) post-processor can't help unless you tee the mess
to a temp file, yup, mess is the right word.

Or maybe only us Neanderthals are interested in tools which work on
pipes? Have they gone out of style?

	-Barry "Ulak of Org" Shein, Boston University

gwyn@brl-smoke.ARPA (Doug Gwyn ) (06/05/88)

In article <23133@bu-cs.BU.EDU> bzs@bu-cs.BU.EDU (Barry Shein) writes:
>Almost, unless the original input was produced by a pipeline, in which
>case this (putative) post-processor can't help unless you tee the mess
>to a temp file, yup, mess is the right word.

The proposed tool would be very handy on ordinary text files,
but it is hard to see a use for it on pipes.  Or, getting back
to context-grep, what good would it do to show context from a
pipe?  To do anything with the information (other than stare
at it), you'd need to produce it again.  There might be some
use for context-{grep,diff,...} on a stream, but if a separate
context tool will satisfy 99% of the need, as I think it would,
as well as provide this capability for other commands "for free",
it would be a better approach than hacking context into other
commands.

By the way, I hope the new grep when asked to always produce
the filename will use "-" for stdin's name, and the context
tool would also follow the same convention.  Even though the
Research systems have /dev/stdin, other sites may not, and
anyway (as we've just seen) stdin isn't really a definite
object.

nelson@sun.soe.clarkson.edu (Russ Nelson) (06/05/88)

In article <23133@bu-cs.BU.EDU> bzs@bu-cs.BU.EDU (Barry Shein) writes:
>In article <7944@alice.UUCP> andrew@alice.UUCP writes:
>>	the right thing to do is to write a context program that takes
>>input looking like "filename:linenumber:goo" and prints whatever context ...
>
>Almost, unless the original input was produced by a pipeline, in which
>case this (putative) post-processor can't help unless you tee the mess
>to a temp file, yup, mess is the right word.

How about:

alias with_context tee >/tmp/$$ | $* | context -f/tmp/$$

or something like that?  Does that offend tool-users sensibilities?
*Do* Neanderthals have any sensibilities?
-- 
signed char *reply-to-russ(int network) {	/* Why can't BITNET go	*/
if(network == BITNET) return "NELSON@CLUTX";	/* domainish?		*/
else return "nelson@clutx.clarkson.edu"; }

bzs@bu-cs.BU.EDU (Barry Shein) (06/05/88)

From: gwyn@brl-smoke.ARPA (Doug Gwyn )
>In article <23133@bu-cs.BU.EDU> bzs@bu-cs.BU.EDU (Barry Shein) writes:
>>Almost, unless the original input was produced by a pipeline, in which
>>case this (putative) post-processor can't help unless you tee the mess
>>to a temp file, yup, mess is the right word.
>
>The proposed tool would be very handy on ordinary text files,
>but it is hard to see a use for it on pipes.  Or, getting back
>to context-grep, what good would it do to show context from a
>pipe?  To do anything with the information (other than stare
>at it), you'd need to produce it again.

What else are context displays for except to stare at (or save in a
file for later staring)?

Are the resultant contexts often the input to other programs? (I know
that 'patch' can take a context input but that's irrelevant, it hardly
needs nor prefers a context diff to my knowledge, it's just being
accommodating so humans can look at the context diff if something
botches.)

Actually, I can answer that in the context of the original suggestion.

The motivation for a context comes in two major flavors:

	A) To stare at (the surrounding context gives a human some
	hint of the context in which the text appeared)

	B) Because the context really represents a multi-line (eg)
	record, such as pulling out every termcap or terminfo entry
	which contains some property but desiring the result to contain
	the entire multiline entry so it could be re-used to create a
	new file.

In either case it's independent of whether the data is coming from a
pipe (as it should be.) Its pipeness may be caused by something as
simple as the data being grabbed across the network (rsh HOST cat foo | ...).

Anyhow, I think it's bad in general to demand the reasoning of why a
selection operator should work in a pipe, it just should (although I
have presented a reasonable argument.) That's what tools are all about.

>There might be some
>use for context-{grep,diff,...} on a stream, but if a separate
>context tool will satisfy 99% of the need, as I think it would,
>as well as provide this capability for other commands "for free",
>it would be a better approach than hacking context into other
>commands.

I think claiming that 99% of the use won't need pipes is unsound, it
should just work with a pipe and any tool which requires passing the
file name and then re-positioning the file just won't, it's violating
a fundamental design concept by doing this (not that in rare cases
this might not be necessary, but I don't see where this is one of them
unless you use the circular argument of it "must be a separate
program".)

The reasoning for adding it to grep would be:

	a) Grep already has its finger on the context, it's right
	there (or could be), why re-process the entire stream/file
	just to get it printed? Grep found the context, why find it
	again?

	b) The context suggestions are merely logical generalizations
	of the what grep already does, print the context of a match
	(it just happens to now limit that to exactly one line.) Nothing
	new conceptually is being added, only generalized.

In fact, if I were to write this context-display tool my first thought
would be to just use grep and try to emit unique patterns (a la TAGS
files) which grep can then re-scan. But grep doesn't quite cut it w/o
this little generalization. I think we're going in circles and this
post-processor is nothing more than a special case of grep or perhaps
cat or sed the way it was proposed (why not just generate sed commands
to list the lines if that's all you want?)

Anyhow, at least we're back to the technical issues and away from
calling anyone who disagrees Neanderthals...

	-Barry Shein, Boston University

bzs@bu-cs.BU.EDU (Barry Shein) (06/05/88)

From: nelson@sun.soe.clarkson.edu (Russ Nelson) [responding to me]
>>Almost, unless the original input was produced by a pipeline, in which
>>case this (putative) post-processor can't help unless you tee the mess
>>to a temp file, yup, mess is the right word.
>
>How about:
>
>alias with_context tee >/tmp/$$ | $* | context -f/tmp/$$
>
>or something like that?  Does that offend tool-users sensibilities?
>*Do* Neanderthals have any sensibilities?

I don't understand, the way to avoid having to tee it into temp
files is to tee it into temp files?

Given that sort of solution we can eliminate pipes entirely from unix,
was that your point? That pipes are fundamentally useless and can
always be eliminated via use of intermediate temp files?

It begs the question, burying it in a little syntactic sugar with an
alias command doesn't solve the problem.

	-Barry Shein, Boston University

gwyn@brl-smoke.ARPA (Doug Gwyn ) (06/06/88)

In article <23142@bu-cs.BU.EDU> bzs@bu-cs.BU.EDU (Barry Shein) writes:
>Anyhow, at least we're back to the technical issues and away from
>calling anyone who disagrees Neanderthals...

Oh, but the latter is much more fun!

Anyway, the fundamental issue seems to be that there are (at least)
two types of external data objects:
	streams -- transient data, takes special effort to capture
	files -- permanent data with an attached name
UNIX nicely makes these appear much the same, but they do have some
inherent differences, and this one-pass versus multi-pass context
discussion has brought out one of them.

There is nothing particularly wrong with the "tee" approach to
turn a stream into a file long enough for whatever work is being
done.  The converse is often done; for example many of my shell
scripts, after parsing arguments, exec a pipeline that starts
	cat $* | ...
in order to ensure a stream input to the rest of the pipeline.

garyo@masscomp.UUCP (Gary Oberbrunner) (06/06/88)

The only change I've ever had to make to the source for grep to make it do
what I want was to make it work with arbitrary-length lines.
I consider not handling long lines (and not complaining about them either)
to be extremely antisocial.  All this other stuff is just window-dressing.
Not that it's bad; one integrated grep with B-M strings, alternation and
inversion operators, and nifty feeping creaturism is great by me.

I usually handle the multi-line-record case by tr'ing all the intermediate
line ends into some unused character, doing my database hackery (grep, awk,
sed, what have you) and then tr'ing back at the end.  This is one reason for
having grep support very long lines.
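
For instance, for the degenerate case of records that are exactly two lines
long, and assuming '|' is a character that never appears in the data, the
same join-search-split round trip can be written with paste doing the
joining instead of tr:

	paste -d'|' - - < datafile | grep pattern | tr '|' '\012'

Records of varying length are where it gets ugly, and where the long-line
requirement really starts to bite.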

				As always,

				Gary

-- 
Remember,			Truth is not beauty;      (617)692-6200x2445
Information is not knowledge;	Beauty is not love;	  Gary   Oberbrunner
Knowledge is not wisdom;	Love is not music;	  ...!masscomp!garyo
Wisdom is not truth;		Music is the best. - FZ   ....garyo@masscomp

bzs@bu-cs.BU.EDU (Barry Shein) (06/06/88)

From: gwyn@brl-smoke.ARPA (Doug Gwyn )
>There is nothing particularly wrong with the "tee" approach to
>turn a stream into a file long enough for whatever work is being
>done.  The converse is often done; for example many of my shell
>scripts, after parsing arguments, exec a pipeline that starts
>	cat $* | ...
>in order to ensure a stream input to the rest of the pipeline.

Nothing wrong with it unless you happen to be on a parallel machine as
I am a lot of the time and pipes can run in parallel nicely.

Nyah Nyah, got ya there! PHFZZZZT! I win! I win!

You're right, this is getting ridiculous, we made our points...

Ok everyone, back to arguing which flags should be maintained in cat
and Unix Standardization AKA "West Coast Story" (snap fingers.)

	-Barry Shein, Boston University

nelson@sun.soe.clarkson.edu (Russ Nelson) (06/06/88)

In article <23143@bu-cs.BU.EDU> bzs@bu-cs.BU.EDU (Barry Shein) writes:
>From: nelson@sun.soe.clarkson.edu (Russ Nelson) [responding to me]
>>alias with_context tee >/tmp/$$ | $* | context -f/tmp/$$
>I don't understand, the way to avoid having to tee it into temp
>files is to tee it into temp files?

No.  There is no way to avoid teeing it into a temp file.  Such is
life with pipes.  If you want context then you need to save it.  My
alias is perfectly consistent with the tool-using philosophy.  Yes,
it's a kludge, but that's the only way to save context in a single-stream
pipe philosophy.  I remember reading a paper in which multiple streams
going hither and yon were proposed, but the syntax was gothic at best.
I like being able to say this:

bsd:	sort | with_context grep rfoo | more
sysv:	sort | with_context grep foo | more
	Because sysv doesn't have the r* utilities, of course :-)
-- 
signed char *reply-to-russ(int network) {	/* Why can't BITNET go	*/
if(network == BITNET) return "NELSON@CLUTX";	/* domainish?		*/
else return "nelson@clutx.clarkson.edu"; }

rick@seismo.CSS.GOV (Rick Adams) (06/07/88)

7th Edition grep had a -h flag to not print the filenames on a grep.

4BSD still has a -h flag.

System 5 doesn't have a -h flag.

(Another example of how System 5 is superior to BSD... and V7...)

---rick

tower@bu-cs.BU.EDU (Leonard H. Tower Jr.) (06/07/88)

In article <6866@elroy.Jpl.Nasa.Gov> alan@cogswell.Jpl.Nasa.Gov (Alan S. Mazer) writes:
|
|One thing I would _love_ is to be able to find the context of what I've
|found, for example, to find the two (n?) surrounding lines.  I have wanted
|to do this many times and there is no good way.

GNU Emacs has a command that will walk you through each match of a
grep run and show you the context around it:

   grep:
   Run grep, with user-specified args, and collect output in a buffer.
   While grep runs asynchronously, you can use the C-x ` command
   to find the text that grep hits refer to.

M-x grep RET to invoke it.  I suspect other Unix Emacs have a similar
feature.

Information on how to obtain GNU Emacs, other GNU software, or the GNU
project itself is available from:

	gnu@prep.ai.mit.edu

enjoy -len

gwyn@brl-smoke.ARPA (Doug Gwyn ) (06/07/88)

In article <44366@beno.seismo.CSS.GOV> rick@seismo.CSS.GOV (Rick Adams) writes:
>7th Edition grep had a -h flag to not print the filenames on a grep.
>4BSD still has a -h flag.
>System 5 doesn't have a -h flag.
>(Another example of how System 5 is superior to BSD... and V7...)

Maybe the AT&T folks figured that their customers were smart enough
to type "cat files ... | grep".  I've never had the need for a -h
flag, but I sure would like for the -H (ALWAYS print filename)
option to be the default instead of the current variable algorithm.

brianc@cognos.uucp (Brian Campbell) (06/07/88)

In article <4524@vdsvax.steinmetz.ge.com> Bruce G. Barnett writes:
> There have been times when I wanted a grep that would print out the
> first occurrence and then stop.

In article <1036@cfa.cfa.harvard.EDU> Bill Wyatt suggests:
> grep '(your_pattern_here)' | head -1

In article <4537@vdsvax.steinmetz.ge.com> Bruce G. Barnett replies:
> There are times when I want the first occurrence of a pattern without
> reading the entire (i.e. HUGE) file.

If we're talking about finding subject lines in news articles:
	head -20 file1 file2 ... | grep ^Subject:

> Or there are times when I want the first occurrence of a pattern from
> hundreds of files, but I don't want to see the pattern more than once.

In this case, the original suggestion seems appropriate:
	grep pattern file1 file2 ... | head -1
-- 
Brian Campbell        uucp: decvax!utzoo!dciem!nrcaer!cognos!brianc
Cognos Incorporated   mail: POB 9707, 3755 Riverside Drive, Ottawa, K1G 3Z4
(613) 738-1440        fido: (613) 731-2945 300/1200/2400, sysop@1:163/8

guy@gorodish.Sun.COM (Guy Harris) (06/08/88)

> >7th Edition grep had a -h flag to not print the filenames on a grep.
> >4BSD still has a -h flag.
> >System 5 doesn't have a -h flag.
> >(Another example of how System 5 is superior to BSD... and V7...)
> 
> Maybe the AT&T folks figured that their customers were smart enough
> to type "cat files ... | grep".

*Which* "AT&T folks"?  The folks at AT&T Bell Labs Research were the ones who
put the "-h" flag into "grep" in the first place, *not* the ones at Berkeley.

> I've never had the need for a -h flag, but I sure would like for the -H
> (ALWAYS print filename) option to be the default instead of the current
> variable algorithm.

Maybe the AT&T folks figured that their customers were smart enough to type
"grep ... /dev/null"?

oz@yunexus.UUCP (Ozan Yigit) (06/08/88)

In article <7939@alice.UUCP> andrew@alice.UUCP writes:
>
>many people have written about patterns matching multiple lines.
>grep will not do this. if you really need this, use sam by rob pike
>as described in the nov 1987 software practice and experience.
>
	Why should this not be done by grep ??? I think Rob Pike's
	"Structured Expressions" is the way to go for a modern grep,
	where newline spanning is supported, and the program does
	not die unexpectedly just because a file contains a line too
	long for a stupid internal "line size". (For an insightful
	discussion of this, interested readers could check out Rob's
	paper in EUUG proceedings.)

oz
-- 
The deathstar rotated slowly,	      |  Usenet: ...!utzoo!yunexus!oz
towards its target, and sparked       |  ....!uunet!mnetor!yunexus!oz
an intense sunbeam. The green world   |  Bitnet: oz@[yulibra|yuyetti]
of unics evaporated instantly...      |  Phonet: +1 416 736-5257x3976

guy@gorodish.Sun.COM (Guy Harris) (06/09/88)

> No, the obvious thing to do is:
> 
> nm -o _memcpy *.o

"Obvious" under which version of UNIX?  From the 4.3BSD manual:

	-o	Prepend file or archive element name to each output line
		rather than only once.

The SunOS manual page says the same thing.

From the S5R3 manual:

	-o	Print the value and size of a symbol in octal instead of
		decimal.

With the 4.3BSD version you can do

	nm -o *.o | egrep _memcpy

and get the result you want.  For any version of "nm" that I know of, you can
do the "egrep" trick mentioned in another posting; you may have to use a flag
such as "-p" with the S5 version to get "easily parsable, terse output."

john@frog.UUCP (John Woods) (06/09/88)

Hypothesize for the moment that I would like to have the Subject: lines for
each article in /usr/spool/news/comp/sources/unix.  Many people have proposed
a new flag for the "new grep" (one that functions just like the -one flag does
on "match", the matching program I use (a flag I implemented long ago)).

In article<5007@sdcsvax.UCSD.EDU>,hutch@net1.ucsd.edu(Jim Hutchison) suggests:
> grep pattern file1 ... fileN | head -1
> This should send grep a SIGPIPE as soon as the first line of output
> trickles through the pipe.  This would result in relatively little
> of the file actually being read under most Unix implementations.

Yes, it would result in relatively little of the file being read.  It would
also result in relatively little of the desired output.  Check the problem
space before posting solutions, folks.

As I pointed out in another message, you can get awk to solve the problem
almost exactly, with some irregularity in the NFILES={0,1} cases.  However,
the "tool-using" approach is a two-edged sword, it seems to me:  a matching
problem should be solvable by using the matching tool, not by a special case
of an editor tool (the purported "sed" solution) or by having to reach for
a full-blown programming language (awk); just as one should not paginate
a text file by using the /PAGINATE /NOPRINT features of a line-printer
program...  Sometimes you need to EN-feature a program in order to avoid
having to turn to (other) inappropriate tools.  "Oh, you can't ADD text
with this editor, only change existing text.  You add text by using
'cat >> filename' ..."

I like the "context" tool suggested elsewhere, but it has one problem (as
stated) for replacing context diffs:  context diffs are both context and
_differences_, and are generally clearly marked as such (i.e., the !+-
convention); while I guess you could turn an ed-script style diff listing
into a context diff (given both input files and the diff marks), that is
a radically different input language than that proposed for eliminating
context grep.  This just means, however, that two context tools are needed,
not just one.

To paraphrase Einstein, "Programs should be as simple as possible, and no
simpler."
-- 
John Woods, Charles River Data Systems, Framingham MA, (617) 626-1101
...!decvax!frog!john, john@frog.UUCP, ...!mit-eddie!jfw, jfw@eddie.mit.edu

No amount of "Scotch-Guard" can repel the ugly stains left by REALITY...
		- Griffy

john@frog.UUCP (John Woods) (06/09/88)

In article <1998@u1100a.UUCP>, krohn@u1100a.UUCP (Eric Krohn) writes:
> In article <1112@X.UUCP> john@frog.UUCP (some clown :-) writes:
> ] 	awk '/^Subject: /	{ print FILENAME ":" $0; next }' *
> 
> This will print Subject: lines more than once per file if a file happens to
> have more than one Subject: line.  `Next' goes to the next input line, not
> the next input file, so you are still left with an exhaustive search of all
> the files.
> 
Oops.  I blew it.  Working on GNU awk seems to have permanently damaged my
brain (there are a couple of differences between "real" awk and GNU awk which
I couldn't convince the author were worth changing, specifically in 'exit'
(not next); GNU exit actually does what I thought next would do, instead of
exiting entirely).  
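
For the record, a plain-awk sketch that at least prints no more than one
Subject: line per file (though it still reads every line of every file,
which was Eric's real complaint) would be something like:

	awk '/^Subject: / && FILENAME != f { print FILENAME ":" $0; f = FILENAME }' *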

-- 
John Woods, Charles River Data Systems, Framingham MA, (617) 626-1101
...!decvax!frog!john, john@frog.UUCP, ...!mit-eddie!jfw, jfw@eddie.mit.edu

No amount of "Scotch-Guard" can repel the ugly stains left by REALITY...
		- Griffy

maujd@warwick.UUCP (Geoff Rimmer) (06/10/88)

In article <2450011@hpsal2.HP.COM> morrell@hpsal2.HP.COM (Michael Morrell) writes:
>grep -n does this, but I'd like to see an option which ONLY prints the line
>numbers where the pattern was found.

How about

	grep -n pattern file | sed "s/:.*//"

?
	------------------------------------------------------------
	Geoff Rimmer, Computer Science, Warwick University, UK.
	maujd@uk.ac.warwick.opal
	------------------------------------------------------------

vanam@pttesac.UUCP (Marnix van Ammers) (06/10/88)

In article <4524@vdsvax.steinmetz.ge.com> barnett@steinmetz.ge.com (Bruce G. Barnett) writes:

>There have been times when I wanted a grep that would print out the
>first occurrence and then stop.

sed -n -e "/<pattern>/ { p" -e q -e "}"

mouse@mcgill-vision.UUCP (der Mouse) (06/10/88)

In article <8012@brl-smoke.ARPA>, gwyn@brl-smoke.ARPA (Doug Gwyn ) writes:
> In article <7944@alice.UUCP> andrew@alice.UUCP writes:
>> the right thing to do is to write a context program that takes input
>> looking like "filename:linenumber:goo" and prints whatever context ...

> Heavens -- a tool user.  I thought that only Neanderthals were still
> alive.  I guess Bell Labs escaped the plague.

A real useful `tool', this, that works only on files.  And only when
you grep more than one file, so you get filenames (or happen to be able
to remember which flag it is to make grep print filenames always,
assuming of course that your grep has it).

Besides, grep has the context, or could have if it wanted to bother
saving it.  Why read all two hundred thousand lines of the file
*again*?  Wasn't it bad enough the first time?

					der Mouse

			uucp: mouse@mcgill-vision.uucp
			arpa: mouse@larry.mcrcim.mcgill.edu

mouse@mcgill-vision.UUCP (der Mouse) (06/10/88)

In article <1030@sun.soe.clarkson.edu>, nelson@sun.soe.clarkson.edu (Russ Nelson) writes:
> In article <23133@bu-cs.BU.EDU> bzs@bu-cs.BU.EDU (Barry Shein) writes:
>> In article <7944@alice.UUCP> andrew@alice.UUCP writes:
>>> the right thing to do is to write a context program that takes
>>> input looking like "filename:linenumber:goo" and prints whatever
>>> context ...
>> Almost, unless the original input was produced by a pipeline, [...]
>> unless you tee the mess to a temp file, yup, mess is the right word.
> How about:
> alias with_context tee >/tmp/$$ | $* | context -f/tmp/$$

This assumes that (a) there's room on /tmp to save the whole thing and
(b) that you don't mind rereading it all to find the appropriate line.

Both assumptions are commonly violated, in my experience.

					der Mouse

			uucp: mouse@mcgill-vision.uucp
			arpa: mouse@larry.mcrcim.mcgill.edu

mouse@mcgill-vision.UUCP (der Mouse) (06/10/88)

In article <8022@brl-smoke.ARPA>, gwyn@brl-smoke.ARPA (Doug Gwyn ) writes:
> Or, getting back to context-grep, what good would it do to show
> context from a pipe?  To do anything with the information (other than
> stare at it), you'd need to produce it again.

Why do we have diff -c?  Generally, to stare at.  (The only other use I
know of is producing diffs for Larry Wall's patch program.)

					der Mouse

			uucp: mouse@mcgill-vision.uucp
			arpa: mouse@larry.mcrcim.mcgill.edu

mouse@mcgill-vision.UUCP (der Mouse) (06/10/88)

In article <5007@sdcsvax.UCSD.EDU>, hutch@net1.ucsd.edu (Jim Hutchison) writes:
> 4537@vdsvax.steinmetz.ge.com, barnett@vdsvax.steinmetz.ge.com (Bruce G. Barnett)
>> In <1036@cfa.cfa.harvard.EDU> wyatt@cfa.harvard.EDU (Bill Wyatt) writes:
[attribution(s) lost]
>>>> There have been times when I wanted a grep that would print out
>>>> the first occurrence and then stop.
>>> grep '(your_pattern_here)' | head -1
>> Have you ever waited for a computer?  There are times when I want
>> the first occurrence of a pattern without reading the [whole file].

> grep pattern file1 ... fileN | head -1

> This should send grep a SIGPIPE as soon as the first line of output
> trickles through the pipe.

No.  It should not send the SIGPIPE until grep writes the second line.
And because grep is likely to use stdio for its output, nothing at all
may be written to the pipe until grep has buffered 1K or 2K or whatever size
its stdio uses for the output buffer.  This may be an enormous waste of
time, both cpu and real.

Besides which, it's wrong.  It prints just the first match, whereas
what's wanted is the first match *from each file*.

					der Mouse

			uucp: mouse@mcgill-vision.uucp
			arpa: mouse@larry.mcrcim.mcgill.edu

mouse@mcgill-vision.UUCP (der Mouse) (06/10/88)

In article <7207@watdragon.waterloo.edu>, tbray@watsol.waterloo.edu (Tim Bray) writes:
> Scenario - you're trying to link 55 .o's together to build a program
> you don't know that well.  You're on berklix.  ld sez: "undefined:
> _memcpy".  You say: "who's doing that?".  The source is scattered
> inconveniently.  The obvious thing to do is:  grep -l _memcpy *.o

Doesn't anybody read the man pages any more?  The obvious thing is to
use the supplied facility: the -y option to ld.

% cc -o program *.o -y_memcpy
Undefined:
_memcpy
buildstruct.o: reference to external undefined _memcpy
copytree.o: reference to external undefined _memcpy
%

(I don't know how generally available this is.  You did say "berklix",
and I know this is in 4.3, but I don't know about other Berklices.)

					der Mouse

			uucp: mouse@mcgill-vision.uucp
			arpa: mouse@larry.mcrcim.mcgill.edu

mouse@mcgill-vision.UUCP (der Mouse) (06/10/88)

In article <1037@sun.soe.clarkson.edu>, nelson@sun.soe.clarkson.edu (Russ Nelson) writes:
> In article <23143@bu-cs.BU.EDU> bzs@bu-cs.BU.EDU (Barry Shein) writes:
>> From: nelson@sun.soe.clarkson.edu (Russ Nelson) [responding to me]
>>> alias with_context tee >/tmp/$$ | $* | context -f/tmp/$$
>> I don't understand, the way to avoid having to tee it into temp
>> files is to tee it into temp files?
> No.  There is no way to avoid teeing it into a temp file.

Sure there is.

> If you want context then you need to save it.

True.  But you don't necessarily need to save it in a file.

> [the alias above is] the only way to save context in a single-stream
> pipe philosophy.

Grep can save it in memory.  Unless you want so much context that it
overflows the available memory, which I find difficult to see
happening, this is a perfectly good place to put it.

In fact, I wrote a grep variant which starts by snarfing the whole file
into (virtual) memory.  Makes for extreme speed when it's usable, which
is often enough to make it worthwhile (for me, at least).  And of
course it means that I could get as much context as I cared to.  (I've
never had it fail because it couldn't get enough memory to hold the
whole file.)

					der Mouse

			uucp: mouse@mcgill-vision.uucp
			arpa: mouse@larry.mcrcim.mcgill.edu

andrew@alice.UUCP (06/11/88)

	The following is a summary of the somewhat plausible ideas
suggested for the new grep. I thank leo de witt particularly and others
for clearing up misconceptions and pointing out (correctly) that
existing tools like sed already do (or at least nearly do) what some people
asked for. The following points are in no particular order and no slight is
intended by my presentation. After that, I summarise the current flags.

1) named character classes, e.g. \alpha, \digit.
	i think this is a hokey idea and dismissed it as unnecessary crud
	but then found out it is part of the proposed regular expression
	stuff for posix. it may creep in but i hope not.

2) matching multi-line patterns (\n as part of pattern)
	this actually requires a lot of infrastructure support and thought.
	i prefer to leave that to other more powerful programs such as sam.

3) print lines with context.
	the second most requested feature but i'm not doing it. this is
	just the job for sed. to be consistent, we just took the context
	crap out of diff too. this is actually reasonable; showing context
	is the job for a separate tool (pipeline difficulties apart).

4) print one(first matching) line and go onto the next file.
	most of the justification for this seemed to be scanning
	mail and/or netnews articles for the subject line; neither
	of which gets any sympathy from me. but it is easy to do
	and doesn't add an option; we add a new option (say -1)
	and remove -s. -1 is just like -s except it prints the matching line.
	then the old grep -s pattern is now grep -1 pattern > /dev/null
	and within epsilon of being as efficient.

5) divert matching lines onto one fd, nonmatching onto another.
	sorry, run grep twice.

6) print the Nth occurence of the pattern (N is number or list).
	it may be possible to think of a real reason for this (i couldn't)
	but the answer is no.

7) -w (pattern matches only words)
	the most requested feature. well, it turns out that -x (exact)
	is there because doug mcilroy wanted to match words against a dictionary.
	it seems to have no other use. Therefore, -x is being dropped
	(after all, it only costs a quick edit to do it yourself) and is
	replaced by -w == (^|[^_a-zA-Z0-9])pattern($|[^_a-zA-Z0-9]).

8) grep should work on binary files and kanji.
	that it should work on kanji or any character set is a given
	(at least, any character set supported by the system V international
	character set stuff). binary files will work too modulo the
	following restraint: lines (between \n's) have to fit in a
	buffer (current size 64K). violations are an error (exit 2).

9) -b has bogus units.
	agreed. -b now is in bytes.

10) -B (add an ^ to the front of the given pattern, analogous to -x and -w)
	-x (and -w) is enough. sorry.

11) recursively descend through argument lists
	no. find | xargs is going to have to do.

12) read filenames on standard input
	no. xargs will have to do.

13) should be as fast as bm.
	no worries. in fact, our egrep is 3x faster than bm. i intend to be
	competitive with woods' egrep. it should also be as fast as fgrep for
	multiple keywords. the new grep incorporates boyer-moore
	as a degenerate case of Commentz-Walter, a faster replacement
	for the fgrep algorithm.

14) -lv (files that don't have any matching lines)
	-lv means print names of files that have any nonmatching lines
	(useful, say, for checking input syntax). -L will mean print
	names of files without selected lines.

15) print the part of the line that matched.
	no. that is available at the subroutine level.

16) compatibility with old grep/fgrep/egrep.
	the current name for the new command is gre (aho chose it).
	after a while, it will become our grep. there will be a -G
	flag to take patterns a la old grep and a -F to take
	patterns a la fgrep (that is, no metacharacters except \n == |).
	gre is close enough to egrep to not matter.

17) fewer limits.
	so far, gre will have only one limit, a line length of 64K.
	(NO, i am not supporting arbitrary length lines (yet)!)
	we foresee no need for any other limit. for example, the
	current gre acts like fgrep. it is 4 times faster than
	fgrep and has no limits; we can gre -f /usr/dict/words
	(72K words, 600KB).

18) recognise file types (ignore binaries, unpack packed files etc).
	get real. go back to your macintosh or pyramid. gre will just grep
	files, not understand them.

19) handle patterns occurring multiple times per line
	this is ill-defined (how many times does aaaa occur in a line of 20 'a's?
	in order of decreasing correctness, the answers are >=1, 17, 5).
	For the cases people mentioned (words), pipe it thru
	tr to put the words one per line.

20) why use \{\} instead of \(\)?
	this is not yet resolved (mcilroy&ritchie vs aho&pike&me).
	grouping is an orthogonal issue to subexpressions so why
	use the same parentheses? the latest suggestion (by ritchie)
	is to allow both \(\) and \{\} as grouping operators but
	the \3 would only count one type (say \(\)). this would be much
	better for complicated patterns with much grouping.

21) subroutine versions of the pattern matching stuff.
	in a deep sense, the new grep will have no pattern matching code in it.
	all the pattern matching code will be in libc with a uniform
	interface. the boyer-moore and commentz-walter routines have been
	done. the other two are egrep and back-referencing egrep.
	lastly, regexp will be reimplemented.

22) support a filename of - to mean standard input.
	a unix without /dev/stdin is largely bogus but as a sop to the poor
	barstards having to work on BSD, gre will support -
	as stdin (at least for a while).

Thus, the current proposal is the following flags. it would take a GOOD
argument to change my mind on this list (unless it is to get rid of a flag).

-f file	pattern is (`cat file`)
-v	nonmatching lines are 'selected'
-i	ignore alphabetic case
-n	print line number
-c	print count of selected lines only
-l	print filenames which have a selected line
-L	print filenames which do not have a selected line
-b	print byte offset of line begin
-h	do not print filenames in front of matching lines
-H	always print filenames in front of matching lines
-w	pattern is (^|[^_a-zA-Z0-9])pattern($|[^_a-zA-Z0-9])
-1	print only first selected line per file
-e expr	use expr as the pattern

Andrew Hume
research!andrew

wswietse@eutrc3.UUCP (Wietse Venema) (06/11/88)

In article <7207@watdragon.waterloo.edu> tbray@watsol.waterloo.edu (Tim Bray) writes:
}Grep should, where reasonable, not be bound by the notion of a 'line'.
}As a concrete expression of this, the useful grep -l (prints the names of
}the files that contain the string) should work on any kind of file.  More
}than one existing 'grep -l' will fail, for example, to tell you which of a 
}bunch of .o files contain a given string.  Scenario - you're trying to
}link 55 .o's together to build a program you don't know that well.  You're
}on berklix.  ld sez: "undefined: _memcpy".  You say: "who's doing that?".
}The source is scattered inconveniently.  The obvious thing to do is: 
}grep -l _memcpy *.o
}That this often will not work is irritating.
}Tim Bray, New Oxford English Dictionary Project, U of Waterloo

	nm -op *.o | grep memcpy

will work just fine, both with bsd and att unix.

	Wietse
-- 
uucp:	mcvax!eutrc3!wswietse	| Eindhoven University of Technology
bitnet:	wswietse@heithe5	| Dept. of Mathematics and Computer Science
surf:	tuerc5::wswietse	| Eindhoven, The Netherlands.

randy@umn-cs.cs.umn.edu (Randy Orrison) (06/12/88)

In article <7962@alice.UUCP> andrew@alice.UUCP writes:
|3) print lines with context.
|	the second most requested feature but i'm not doing it. this is
|	just the job for sed. to be consistent, we just took the context
							^^^^^^^^^^^^^^^^
|	crap out of diff too. this is actually reasonable; showing context
	^^^^^^^^^^^^^^^^
|	is the job for a separate tool (pipeline difficulties apart).


What?!?!?   Ok, i would like context in grep, but i'll live without it.
Context diffs, however, are a different matter.  There isn't an easy way
to generate them with diff/context (the first character of every line is
produced as part of the diff).  Context diffs are useful for patches, and
having a tool to generate them is necessary.  They're a logical improvement
to diff that is more than just context around the changes.

If you're fixing grep, fine, but don't break diff while you're at it.

	-randy
-- 
Randy Orrison, Control Data, Arden Hills, MN		randy@ux.acss.umn.edu
8-(OSF/Mumblix: Just say NO!)-8	    {ihnp4, seismo!rutgers, sun}!umn-cs!randy
	"I consulted all the sages I could find in Yellow Pages,
	but there aren't many of them."			-APP

allbery@ncoast.UUCP (Brandon S. Allbery) (06/13/88)

As quoted from <7944@alice.UUCP> by andrew@alice.UUCP:
+---------------
| 	the right thing to do is to write a context program that takes
| input looking like "filename:linenumber:goo" and prints whatever context you like.
| we can then take this crap out of grep and diff and make it generally available
| for use with programs like the C compiler and eqn and so on. It can also do
| the right thing with folding together nearby lines. At least one good first
| cut has been put on the net but a C program sounds easy enough to do.
+---------------

A C version has been done; it handles pcc, grep -n, and cpp messages.  I
posted it 2 1/2 years ago.

It does *not* handle diff, since diff's messages are slightly different and
lack filename information; also, since it passes lines it doesn't understand
you'd end up with both regular and context diffs in the same output.  Now if
diff had an option to output in the format

		<filename>:<lineno>[-<lineno>]:<action>

we'd be all set -- I could modify it to handle ranges easily.  (Changes
would be output as "file1:n-m:file was\nfile2:n-m:now is", or something
similar.)

Note that it'd be nice if lint output messages this way as well.  I have a
postprocessor for lint which does this -- even with System V's lint that
can have lint1 and lint2 run separately via .ln files.
-- 
Brandon S. Allbery			  | "Given its constituency, the only
uunet!marque,sun!mandrill}!ncoast!allbery | thing I expect to be "open" about
Delphi: ALLBERY	       MCI Mail: BALLBERY | [the Open Software Foundation] is
comp.sources.misc: ncoast!sources-misc    | its mouth."  --John Gilmore

keith@seismo.CSS.GOV (Keith Bostic) (06/14/88)

In article <7962@alice.UUCP>, andrew@alice.UUCP writes:

> 22) support a filename of - to mean standard input.
> 	a unix without /dev/stdin is largely bogus but as a sop to the poor
> 	barstards having to work on BSD, gre will support -
> 	as stdin (at least for a while).
>
> Andrew Hume
> research!andrew

A few comments:

     -- As far as I'm aware, V9 is the only system that has "/dev/stdin" at the
	moment.  For those who haven't heard of it, V9 is a research version
	of UN*X developed and in use at the Computing Science Research Center,
	a part of AT&T Bell Laboratories, and available to a small number of
	universities.  It was preceded by V8, which, interestingly enough, was
	built on top of 4.1BSD.

     -- System V does not support "/dev/stdin".

     -- The next full release of BSD will contain "/dev/stdin" and friends.
	It is not part of the 4.3-tahoe release because it requires changes
	to stdio.  I do not expect, however, commands that currently support
	the "-" syntax to change, for compatibility reasons.  V9 itself
	continues to support such commands.

To sum up, let's try and keep this, if not actually constructive, at least
bearing some distant relationship to the facts.

Keith Bostic

allbery@ncoast.UUCP (Brandon S. Allbery) (06/14/88)

As quoted from <5007@sdcsvax.UCSD.EDU> by hutch@net1.ucsd.edu (Jim Hutchison):
+---------------
| 4537@vdsvax.steinmetz.ge.com, barnett@vdsvax.steinmetz.ge.com (Bruce G. Barnett)
| >In <1036@cfa.cfa.harvard.EDU> wyatt@cfa.harvard.EDU (Bill Wyatt) writes:
| >|> There have been times when I wanted a grep that would print out the
| >|> first occurrence and then stop.
| >|
| >|grep '(your_pattern_here)' | head -1
| >
| >There are times when I want the first occurrence of a pattern without
| >reading the entire (i.e. HUGE) file.
| 
| I realize this is dependent on the way in which processes sharing a
| pipe act, but this is a point worth considering before we get yet
| another annoying burst of "cat -v" type programs.
| 
| grep pattern file1 ... fileN | head -1
| 
| This should send grep a SIGPIPE as soon as the first line of output
| trickles through the pipe.  This would result in relatively little
| of the file actually being read under most Unix implementations.
+---------------

Not true.  The SIGPIPE is sent when "grep" writes the second line, *not*
when "head" exits!  If there *is* only one line containing the pattern, grep
will happily read all of the (possibly large) files without getting SIGPIPE.
This is not pleasant, even if it's only one large file -- say a
comp.sources.unix posting which you're grepping for a Subject: line.
-- 
Brandon S. Allbery			  | "Given its constituency, the only
uunet!marque,sun!mandrill}!ncoast!allbery | thing I expect to be "open" about
Delphi: ALLBERY	       MCI Mail: BALLBERY | [the Open Software Foundation] is
comp.sources.misc: ncoast!sources-misc    | its mouth."  --John Gilmore

andrew@frip.gwd.tek.com (Andrew Klossner) (06/14/88)

[]

	"so far, gre will have only one limit, a line length of 64K.
	(NO, i am not supporting arbitrary length lines (yet)!)"

Why not a flag to let the user specify the max line length?  Just the
thing for that database hacker, and diminishes the demand for arbitrary
length.

	"there will be a -G flag to take patterns a la old grep and a
	-F to take patterns a la fgrep"

I hope that -F is a permanent, not temporary, flag.  I don't see it in
the summary list of supported flags, shudder.

	"a unix without /dev/stdin is largely bogus but as a sop to the
	poor barstards having to work on BSD, gre will support - as
	stdin (at least for a while)."

It's not just BSD; I haven't seen /dev/stdin in any released edition.
I just looked over the sVr3.1 tape and didn't turn up anything.

  -=- Andrew Klossner   (decvax!tektronix!tekecs!andrew)       [UUCP]
                        (andrew%tekecs.tek.com@relay.cs.net)   [ARPA]

chris@mimsy.UUCP (Chris Torek) (06/14/88)

In article <44370@beno.seismo.CSS.GOV> keith@seismo.CSS.GOV
[at seismo?!?] (Keith Bostic) writes:
>    -- The next full release of BSD will contain "/dev/stdin" and friends.
>	It is not part of the 4.3-tahoe release because it requires changes
>	to stdio.

Well, only because

	freopen("/dev/stdin", "r", stdin)

unexpectedly fails: it closes fd 0 before attempting to open /dev/stdin,
which means that stdin is gone before it can grab it again.  When I
`fixed' this here it broke /usr/ucb/head and I had to fix the fix!

The sequence needed is messy:

	old = fileno(fp);
	new = open(...);
	if (new < 0) {
		close(old);	/* maybe it was EMFILE */
		new = open(...);/* (could test errno too) */
		if (new < 0)
			return error;
	}
	if (new != old) {
		if (dup2(new, old) >= 0)	/* move it back */
			close(new);
		else {
			close(old);
			fileno(fp) = new;
		}
	}

Not using dup2 means that freopen(stderr) might make fileno(stderr)
something other than 2, which breaks at least perror().
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

tbray@watsol.waterloo.edu (Tim Bray) (06/14/88)

>In article <7207@watdragon.waterloo.edu> I wrote:
>}Grep should, where reasonable, not be bound by the notion of a 'line'.
...
>}The source is scattered inconveniently.  The obvious thing to do is: 
>}grep -l _memcpy *.o
>}That this often will not work is irritating.

At least a dozen people have sent me alternate ways of doing this, the 
most obvious using 'nm'.  Look, I KNOW ABOUT NM! But you're missing the 
point - suppose the item in the .o files was another type of string, e.g.
an error message.  

The point is:  There are some files.  One or more may contain a string in
which I am interested.  grep -l is a tool which is supposed to tell me whether
one or more files contain a string.  The fact that it refuses to do so for 
a class of magic files is a gratuitous violation of the unix paradigm.
Tim Bray, New Oxford English Dictionary Project, U of Waterloo

oz@yunexus.UUCP (Ozan Yigit) (06/15/88)

In article <7962@alice.UUCP> andrew@alice.UUCP writes:
>
>21) subroutine versions of the pattern matching stuff.
>	....
>	.... the other two are egrep and back-referencing egrep.
>	lastly, regexp will be reimplemented.
>
>Andrew Hume

Just how do you propose to implement the back-referencing trick in 
a properly constructed (nfa and/or nfa->dfa conversion static or
on-the-fly) egrep ?? I presume that after each match of the
\(reference\) portion, you would have to on-the-fly modify the \n
portion of the fsa. Gack! Do you have a theoretically solid algorithm
[say, within the context of Aho/Sethi/Ullman's Dragon Book chapter on
regular expressions] for this ??  I would be much interested.

oz
-- 
The DeathStar rotated slowly,	      |  Usenet: ...!utzoo!yunexus!oz
towards its target, and sparked       |  ....!uunet!mnetor!yunexus!oz
an intense SUNbeam. The green world   |  Bitnet: oz@[yulibra|yuyetti]
of unics evaporated instantly...      |  Phonet: +1 416 736-5257x3976

tj@mks.UUCP (T. J. Thompson) (06/15/88)

In article <8032@brl-smoke.ARPA>, gwyn@brl-smoke.ARPA (Doug Gwyn ) writes:
> ... but I sure would like for the -H (ALWAYS print filename)
> option to be the default instead of the current variable algorithm.

This option is exactly what you need when exec'ing grep from find.
It is implemented as
	grep pattern file /dev/null
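
For example,

	find . -name '*.c' -exec grep pattern {} /dev/null \;

prints the file name in front of each match even though find hands grep
only one file at a time.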

-- 
     ll  // // ,'/~~\'   T. J. Thompson              uunet!watmath!mks!tj
    /ll/// //l' `\\\     Mortice Kern Systems Inc.         (519) 884-2251
   / l //_// ll\___/     35 King St. N., Waterloo, Ont., Can. N2J 2W9
O_/                                long time(); /* know C */

barnett@vdsvax.steinmetz.ge.com (Bruce G. Barnett) (06/15/88)

In article <7962@alice.UUCP> andrew@alice.UUCP writes:
|
|	The following is a summary of the somewhat plausible ideas
|suggested for the new grep. 

|4) print one(first matching) line and go onto the next file.
|	most of the justification for this seemed to be scanning
|	mail and/or netnews articles for the subject line; neither
|	of which gets any sympathy from me. but it is easy to do
|	and doesn't add an option; we add a new option (say -1)
|	and remove -s. -1 is just like -s except it prints the matching line.
|	then the old grep -s pattern is now grep -1 pattern > /dev/null
|	and within epsilon of being as efficent.
	                            -----------
Actually this is extremely wrong.

Given the command 
	grep -1 Subject /usr/spool/news/comp/sources/unix/* >/dev/null
and
	grep -s Subject /usr/spool/news/comp/sources/unix/* >/dev/null

I would expect the first one to read *every* file. 

The second case ( -s ) should terminate as soon as it finds the first
match in the first file.

Unless I misunderstand the functionality of the -s command.
-- 
	Bruce G. Barnett 	<barnett@ge-crd.ARPA> <barnett@steinmetz.UUCP>
				uunet!steinmetz!barnett

guy@gorodish.Sun.COM (Guy Harris) (06/16/88)

> grep -l is a tool which is supposed to tell me whether one or more files
> contain a string.

No, it isn't.  "grep -l" is a tool that is supposed to tell you whether one or
more *text* files contain a string; if your file doesn't happen to contain
newlines at least every N characters or so, too bad.  If you want to improve
this situation by writing a "grep" that doesn't have this restriction, feel
free.

> The fact that it refuses to do so for a class of magic files is a
> gratuitous violation of the unix paradigm.

"ed is a tool that is supposed to let me modify files.  The fact that it
refuses to do so for a class of magic files is a gratuitous violation of the
unix paradigm."  Sorry, but the fact that you can't normally use "ed" to patch
binaries doesn't bother me one bit.

ljz@fxgrp.UUCP (Lloyd Zusman) (06/16/88)

In article <7962@alice.UUCP> andrew@alice.UUCP writes:
  
  	The following is a summary of the somewhat plausible ideas
  suggested for the new grep.  ...

  ...

  2) matching multi-line patterns (\n as part of pattern)
  	this actually requires a lot of infrastructure support and thought.
  	i prefer to leave that to other more powerful programs such as sam.
                                                                       ^^^
  ...

Since I'm one of the people who suggested the ability to match multi-line
patterns, I'm a bit disappointed about this ... but such is life.  So
where can I find 'sam'?  Is it in the public domain?  Is source code
available?

You can try to reply via email ... it might actually work, but don't
be surprised if your mail bounces, in which case I'd appreciate
replies here.

Thanks in advance.

--
  Lloyd Zusman                          UUCP:   ...!ames!fxgrp!ljz
  Master Byte Software              Internet:   ljz%fx.com@ames.arc.nasa.gov
  Los Gatos, California               or try:   fxgrp!ljz@ames.arc.nasa.gov
  "We take things well in hand."

gwyn@brl-smoke.ARPA (Doug Gwyn ) (06/16/88)

In article <698@fxgrp.UUCP> ljz%fx.com@ames.arc.nasa.gov (Lloyd Zusman) writes:
>where can I find 'sam'?  Is it in the public domain?  Is source code
>available?

So far as I know, if you aren't part of AT&T and don't have 9th Edition UNIX,
the only way to legally obtain "sam" is to acquire it from the AT&T UNIX
System ToolChest, where it is included in the "dmd-pgmg" package.  This is
definitely not public domain, but it's inexpensively priced and it does
include source code.

"sam" works either with dumb terminals or with a smart one like an AT&T
Teletype 5620 or 630.  I haven't tried installing it without DMD support
but obviously it can be done.

I use "sam" (DMD version) whenever I have serious editing to do.

fmr@cwi.nl (Frank Rahmani) (06/16/88)

> Xref: mcvax comp.unix.wizards:8598 comp.unix.questions:6792
> Posted: Fri Jun 10 05:29:43 1988
> 
> In article <8012@brl-smoke.ARPA>, gwyn@brl-smoke.ARPA (Doug Gwyn ) writes:
> A real useful `tool', this, that works only on files.  And only when
> you grep more than one file, so you get filenames (or happen to be able
> to remember which flag it is to make grep print filenames always,
> assuming of course that your grep has it).
...
...
that's the smallest of all problems, just include /dev/null as first
file to be searched
into your script like
grep [options] pattern /dev/null one_or_more_filenames
by the way I like the sed one-liner that
was posted as answer to the grep replacement
question. Why couldn't I think of it?:-)
fmr@cwi.nl
-- 
It is better never to have been born. But who among us has such luck?
--------------------------------------------------------------------------
These opinions are solely mine and in no way reflect those of my employer.  

daveb@geac.UUCP (David Collier-Brown) (06/17/88)

In article <10078@tekecs.TEK.COM> andrew@frip.gwd.tek.com (Andrew
Klossner) quotes someone to say:
>[]
>
>	"so far, gre will have only one limit, a line length of 64K.
>	(NO, i am not supporting arbitrary length lines (yet)!)"

   Well, arbitrary line lengths are easy.

  Initially
	allocate a cache
  When reading
	fgets a cache-full
	if the last character is not a \n
		increase the cache with realloc
		read some more


  A function to do this, called getline, was published recently in
the source groups.

--dave (remember my old .signature?) c-b
-- 
 David Collier-Brown.  {mnetor yunexus utgpu}!geac!daveb
 Geac Computers Ltd.,  | "His Majesty made you a major 
 350 Steelcase Road,   |  because he believed you would 
 Markham, Ontario.     |  know when not to obey his orders"

wolfe@pdnbah.uucp (Mike Wolfe) (06/17/88)

In article <540@sering.cwi.nl> fmr@cwi.nl (Frank Rahmani) writes:
>> Xref: mcvax comp.unix.wizards:8598 comp.unix.questions:6792
>> Posted: Fri Jun 10 05:29:43 1988
>> 
>> In article <8012@brl-smoke.ARPA>, gwyn@brl-smoke.ARPA (Doug Gwyn ) writes:
>> A real useful `tool', this, that works only on files.  And only when
>> you grep more than one file, so you get filenames (or happen to be able
>> to remember which flag it is to make grep print filenames always,
>> assuming of course that your grep has it).
>...
>...
>that's the smallest of all problems, just include /dev/null as first
>file to be searched
>into your script like
>grep [options] pattern /dev/null one_or_more_filenames

Smallest of all problems? One of my pet peeves is the fact that certain
commands will only print filenames if you give them more than one file. While
the /dev/null ugliness is a suitable kludge for the grep case, what about
a case where you want to run something using xargs, something like sum? You
don't want /dev/null repeated for each call. I know I can sed it out, but
that's just a kludge for a kludge, and to me that's a red flag.
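
(The kludge for a kludge in question is something like

	find . -type f -print | xargs sum /dev/null | sed '/\/dev\/null$/d'

assuming sum prints the file name at the end of each output line once it is
given more than one argument.)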

I think that all commands of that type should allow you to force the filenames
in output. I don't want to go back and change all the commands (UNIX++ a
modest proposal ;-). I just wish people would keep this in mind when writing
things in the future.

----
Mike Wolfe
Paradyne Corporation,  Mail stop LF-207   DOMAIN   wolfe@pdn.UUCP
PO Box 2826, 8550 Ulmerton Road           UUCP     ...!uunet!pdn!wolfe
Largo, FL  34649-2826                     PHONE    (813) 530-8361

gwyn@brl-smoke.ARPA (Doug Gwyn ) (06/18/88)

In article <540@sering.cwi.nl> fmr@cwi.nl (Frank Rahmani) writes:
>> In article <8012@brl-smoke.ARPA>, gwyn@brl-smoke.ARPA (Doug Gwyn ) writes:

But I didn't.  (I think it was BZS.)  PLEASE, check your attributions!

maart@cs.vu.nl (Maarten Litmaath) (06/18/88)

In article <7962@alice.UUCP> andrew@alice.UUCP writes:
\...
\5) divert matching lines onto one fd, nonmatching onto another.
\	sorry, run grep twice.

Come on! The diversion is no problem at all to implement, and it can be very
useful (you cannot run grep twice on stdin, without use of temporary files).
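
The best you can manage today from a pipe is something on the order of

	# "producer" here is just whatever feeds the pipe
	producer | tee /tmp/split$$ | grep pattern > matches
	grep -v pattern /tmp/split$$ > nonmatches
	rm /tmp/split$$

which is exactly the temporary-file mess such an option would let you skip.
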
Regards.
-- 
South-Africa:                         |Maarten Litmaath @ Free U Amsterdam:
           revival of the Third Reich |maart@cs.vu.nl, mcvax!botter!ark!maart

allbery@ncoast.UUCP (Brandon S. Allbery) (06/19/88)

As quoted from <5826@umn-cs.cs.umn.edu> by randy@umn-cs.cs.umn.edu (Randy Orrison):
+---------------
| In article <7962@alice.UUCP> andrew@alice.UUCP writes:
| |3) print lines with context.
| |	the second most requested feature but i'm not doing it. this is
| |	just the job for sed. to be consistent, we just took the context
| 							^^^^^^^^^^^^^^^^
| |	crap out of diff too. this is actually reasonable; showing context
| 	^^^^^^^^^^^^^^^^
| |	is the job for a separate tool (pipeline difficulties apart).
| 
| 
| What?!?!?   Ok, i would like context in grep, but i'll live without it.
| Context diffs, however are a different matter.  There isn't an easy way
| to generate them with diff/context (the first character of every line is
| produced as part of the diff).  Context diffs are useful for patches, and
+---------------

Yes, there is; change diff's output format slightly and expand "context"
slightly, then other programs can also output in "extended context" format so
as to use "context"'s facilities.  I've already described part of this
change in another posting; the other part would be to recognize a special
indicator (on the line number, perhaps?) which would for generality be the
flag to use on the difference, defaulting to "*" which is what "context"
currently uses, or diff could specify "+", "-", or "!".  The only other
change would be to smarten "context" so that it "collapses" context
"windows" together much like the 4.3BSD diff -c does.

It appears that Bell Labs continues to use tools unrepentantly.  It should
be noted that they *are* into research, so I have no arguments against their
use of /dev/stdin (/dev/fd/0?), their assumption that there's plenty of
space to stash away a copy of a file with "tee" for later use in "context",
etc.  (My /dev/stdin complaint earlier was not aimed at the Bell Labs folks,
it was aimed at the person who informed the entire Usenet that "hey, I
posted a /dev/stdin driver source for 4.2BSD, so not a one of you has any
reason not to be running it".  In other words, the usual 4.xBSD-source
elitism.)
-- 
Brandon S. Allbery			  | "Given its constituency, the only
uunet!marque,sun!mandrill}!ncoast!allbery | thing I expect to be "open" about
Delphi: ALLBERY	       MCI Mail: BALLBERY | [the Open Software Foundation] is
comp.sources.misc: ncoast!sources-misc    | its mouth."  --John Gilmore

frei@rubmez.UUCP (Matthias Frei ) (06/20/88)

In article <7962@alice.UUCP>, andrew@alice.UUCP writes:
> 
> 	The following is a summary of the somewhat plausible ideas

You are dismissing nearly all of the good ideas posted by
many users on the net.
So why did you post your questionable request, if you only
want to make some minor changes to grep ???
Please don't waste our time with things like that.

    Matthias Frei
--------------------------------------------------------------------
Snail-mail:                    |  E-Mail address:
Microelectronics Center        |                 UUCP  frei@rubmez.uucp        
University of Bochum           |                (...uunet!unido!rubmez!frei)
4630 Bochum 1, P.O.-Box 102143 |
West Germany                   |

greywolf@unicom.UUCP (greywolf) (06/25/88)

In article <1304@ark.cs.vu.nl> maart@cs.vu.nl (Maarten Litmaath) writes:
# In article <7962@alice.UUCP> andrew@alice.UUCP writes:
# \...
# \5) divert matching lines onto one fd, nonmatching onto another.
# \	sorry, run grep twice.
# 
# Come on! The diversion is no problem at all to implement, and it can be very
# useful (you cannot run grep twice on stdin, without use of temporary files).
# Regards.

Essentially, I think with respect to the tool -flag concept, their
attitude there is "See figure 1."  This is ESPECIALLY true when they have
the opportunity to say "NIH"! (Sounds like the knights from Monty Python:
The Search for the Holy Grail).
	For those of you who do not understand "See figure 1.", I am sure
that there are some people inside AT&T who would be happy to tell you.
They tell me every month on my phone bill.

# -- 
# South-Africa:                         |Maarten Litmaath @ Free U Amsterdam:
#          revival of the Third Reich |maart@cs.vu.nl, mcvax!botter!ark!maart
--