[comp.unix.wizards] grep replacement

andrew@alice.UUCP (05/23/88)

	Al Aho and I are designing a replacement for grep, egrep and fgrep.
The question is what flags should it support and what kind of patterns
should it handle? (Assume the existence of flags to make it compatible
with grep, egrep and fgrep.)
	The proposed flags are the V9 flags:
-f file	pattern is (`cat file`)
-v	print nonmatching
-i	ignore alphabetic case
-n	print line number
-x	the pattern used is ^pattern$
-c	print count only
-l	print filenames only
-b	print block numbers
-h	do not print filenames in front of matching lines
-H	always print filenames in front of matching lines
-s	no output; just status
-e expr	use expr as the pattern

The patterns are as for egrep, supplemented by back-referencing
as in \{pattern\}\1.
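For comparison, the back-referencing already available in editor-style BREs uses \(...\)\1 rather than the proposed \{...\}\1; a sketch with a grep that supports BRE back-references:

```shell
# Back-referencing with the \(...\)\1 syntax used by ed and sed:
# match lines containing any doubled character, e.g. the "aa" in "aab".
printf 'abc\naab\nxyz\n' | grep '\(.\)\1'
```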

please send your comments about flags or patterns to research!andrew

papowell@attila.uucp (Patrick Powell) (05/25/88)

In article <7882@alice.UUCP> andrew@alice.UUCP writes:
>
>	Al Aho and I are designing a replacement for grep, egrep and fgrep.
>The question is what flags should it support and what kind of patterns
>should it handle? (Assume the existence of flags to make it compatible
>with grep, egrep and fgrep.)
>
>please send your comments about flags or patterns to research!andrew

The one thing I miss about grep families is the ability to have
a named search pattern. For example:

DIGIT=\{[0-9]\}
ALPHA=\{[a-zA-Z]\}
\${ALPHA}\${DIGIT}

This would sort of make sense.
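Named sub-patterns can be approximated today with shell variables, which the shell expands before grep ever sees the pattern (a workaround, not the proposed feature):

```shell
# Shell variables as poor man's named sub-patterns; the shell expands
# them first, so grep just sees the pattern "[a-zA-Z][0-9]".
DIGIT='[0-9]'
ALPHA='[a-zA-Z]'
printf 'a1\nbb\n2c\n' | grep "${ALPHA}${DIGIT}"
```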

The other facility is to find multiple line patterns, as in:
find the pair of lines that have pattern1 in the first line and
pattern2 in the second, etc.

This I have needed sooo many times;  I have ended up using AWK
and a clumsy set of searches.

For example:
\#{1 p}Pattern
\#{2}Pattern
This could print out lines that match,  or only the first line
(1p->print this one only).

Patrick Powell
Prof. Patrick Powell, Dept. Computer Science, 136 Lind Hall, 207 Church St. SE,
University of Minnesota,  Minneapolis, MN 55455 (612)625-3543/625-4002

ljz@fxgrp.UUCP (Lloyd Zusman) (05/26/88)

In article <5630@umn-cs.cs.umn.edu> papowell@attila.UUCP (Patrick Powell) writes:
  In article <7882@alice.UUCP> andrew@alice.UUCP writes:
  >
  >	Al Aho and I are designing a replacement for grep, egrep and fgrep.
  >The question is what flags should it support and what kind of patterns
  >should it handle? ...

  ...

  The other facility is to find multiple line patterns, as in:
  find the pair of lines that have pattern1 in the first line
  pattern2 in the second, etc.
  
  This I have needed sooo many times;  I have ended up using AWK
  and a clumsy set of searches.
  
  For example:
  \#{1 p}Pattern
  \#{2}Pattern
  This could print out lines that match,  or only the first line
  (1p->print this one only).

  ...

Or another way to get this functionality would be for this new greplike
thing to allow matches on the newline character.  For example:

    ^.*foo\nbar.*$
          ^^
    	newline

--
  Lloyd Zusman                          UUCP:   ...!ames!fxgrp!ljz
  Master Byte Software              Internet:   ljz%fx.com@ames.arc.nasa.gov
  Los Gatos, California               or try:   fxgrp!ljz@ames.arc.nasa.gov
  "We take things well in hand."

alan@cogswell.Jpl.Nasa.Gov (Alan S. Mazer) (05/26/88)

One thing I would _love_ is to be able to find the context of what I've
found, for example, to find the two (n?) surrounding lines.  I have wanted
to do this many times and there is no good way.

	-- Alan		..!cit-vax!elroy!alan		* "But seriously, what
			elroy!alan@csvax.caltech.edu	   could go wrong?"

ben@idsnh.UUCP (Ben Smith) (05/26/88)

I also would like to see more of the lex capabilities in grep.
-- 
Integrated Decision Systems, Inc.    | Benjamin Smith - East Coast Tech. Office
The fitting solution in professional | Peterborough, NH
portfolio management software.       | UUCP: uunet!idsnh!ben

kutz@bgsuvax.UUCP (Kenneth Kutz) (05/26/88)

In article <6866@elroy.Jpl.Nasa.Gov>, alan@cogswell.Jpl.Nasa.Gov (Alan S. Mazer) writes:
  
> One thing I would _love_ is to be able to find the context of what I've
> found, for example, to find the two (n?) surrounding lines.  I have wanted
> to do this many times and there is no good way.
  
There is a program on the Usenix tape under .../Utilities/Telephone
called 'tele'.  If you call the program using the name 'g', it
supports displaying of context.  E-mail me if you want more info.



-- 
--------------------------------------------------------------------
      Kenneth J. Kutz         	CSNET kutz@bgsu.edu
				UUCP  ...!osu-cis!bgsuvax!kutz
 Disclaimer: Opinions expressed are my own and not of my employer's
--------------------------------------------------------------------

dcon@ihlpe.ATT.COM (452is-Connet) (05/26/88)

In article <6866@elroy.Jpl.Nasa.Gov> alan@cogswell.Jpl.Nasa.Gov (Alan S. Mazer) writes:
>
>One thing I would _love_ is to be able to find the context of what I've
>found, for example, to find the two (n?) surrounding lines.  I have wanted
>to do this many times and there is no good way.

Also, what line number it was found on.

David Connet
ihnp4!ihlpe!dcon

david@elroy.Jpl.Nasa.Gov (David Robinson) (05/27/88)

In article <2978@ihlpe.ATT.COM>, dcon@ihlpe.ATT.COM (452is-Connet) writes:
> In article <6866@elroy.Jpl.Nasa.Gov> alan@cogswell.Jpl.Nasa.Gov (Alan S. Mazer) writes:

> >One thing I would _love_ is to be able to find the context of what I've
> >found, for example, to find the two (n?) surrounding lines.  I have wanted
> >to do this many times and there is no good way.
 
> Also, what line number it was found on.
 


How about "grep -n"?



-- 
	David Robinson		elroy!david@csvax.caltech.edu     ARPA
				david@elroy.jpl.nasa.gov	  ARPA
				{cit-vax,ames}!elroy!david	  UUCP
Disclaimer: No one listens to me anyway!

daveb@laidbak.UUCP (Dave Burton) (05/27/88)

In article <2978@ihlpe.ATT.COM> dcon@ihlpe.UUCP (David Connet) writes:
|Also, what line number it was found on.

Already there: grep -n.

In article <6866@elroy.Jpl.Nasa.Gov> alan@cogswell.Jpl.Nasa.Gov (Alan S. Mazer) writes:
|One thing I would _love_ is to be able to find the context of what I've
|found, for example, to find the two (n?) surrounding lines.  I have wanted
|to do this many times and there is no good way.

Please. Maybe "grep -k" where k is any integer giving the number of lines
of context on each side of the match; default is 0. Oh, but hey, _you're_ designing
it! :-)
-- 
--------------------"Well, it looked good when I wrote it"---------------------
 Verbal: Dave Burton                        Net: ...!ihnp4!laidbak!daveb
 V-MAIL: (312) 505-9100 x325            USSnail: 1901 N. Naper Blvd.
#include <disclaimer.h>                          Naperville, IL  60540

dcon@ihlpe.ATT.COM (452is-Connet) (05/27/88)

In article <6877@elroy.Jpl.Nasa.Gov> david@elroy.Jpl.Nasa.Gov (David Robinson) writes:
>In article <2978@ihlpe.ATT.COM>, dcon@ihlpe.ATT.COM (452is-Connet) writes:
>> Also, what line number it was found on.
>How about "grep -n"?
>


Embarrassed and red-faced he goes away to read the man page...

stan@sdba.UUCP (Stan Brown) (05/27/88)

> 
> One thing I would _love_ is to be able to find the context of what I've
> found, for example, to find the two (n?) surrounding lines.  I have wanted
> to do this many times and there is no good way.
> 
> 	-- Alan		..!cit-vax!elroy!alan		* "But seriously, what
> 			elroy!alan@csvax.caltech.edu	   could go wrong?"


	Along this same general line it would be nice to be able to
	look for patterns that span lines.  But perhaps this would be
	too complete a change in the philosophy of grep ?

	stan


-- 
Stan Brown	S. D. Brown & Associates	404-292-9497
(uunet gatech)!sdba!stan				"vi forever"

jas@rain.rtech.UUCP (Jim Shankland) (05/27/88)

In article <2978@ihlpe.ATT.COM> dcon@ihlpe.UUCP (David Connet) writes:
>In article <6866@elroy.Jpl.Nasa.Gov> alan@cogswell.Jpl.Nasa.Gov (Alan S. Mazer) writes:
>>One thing I would _love_ is to be able to find the context of what I've
>>found, for example, to find the two (n?) surrounding lines....
>
>Also, what line number it was found on.

You've already got the line number with the "-n" option.  Note that that makes
it easy to write a little wrapper script that gives you context grep.
Whether that's preferable to adding the context option to grep is, I suppose,
debatable; but I can already see the USENIX paper:

	"newgrep -[whatever] Considered Harmful"

Jim Shankland
  ..!ihnp4!cpsc6a!\
               sun!rtech!jas
 ..!ucbvax!mtxinu!/

aperez@cvbnet2.UUCP (Arturo Perez Ext.) (05/27/88)

From article <662@fxgrp.UUCP>, by ljz@fxgrp.UUCP (Lloyd Zusman):
> In article <5630@umn-cs.cs.umn.edu> papowell@attila.UUCP (Patrick Powell) writes:
>   In article <7882@alice.UUCP> andrew@alice.UUCP writes:
>   >
>   >	Al Aho and I are designing a replacement for grep, egrep and fgrep.
>   >The question is what flags should it support and what kind of patterns
>   >should it handle? ...

Actually, I agree with the guy who posted a request shortly before this
came out.

The most useful feature that is currently lacking is the ability to
do context greps, i.e. greps with a window.  There are two ways this could be
handled.   One is to allow awk-like constructs specifying beginning and 
ending points for a window.  Sort of like, e.g.

	grep -w '/:/,/^$/' file

which would find the lines between each ':'-containing line and the
next following blank line.  The other way would be to have a simple
"number of lines around match" parameter, possibly with collapse of overlapping
windows.  Then you could say

	grep -w 5 foo file

which would print 2 lines above and below the matching line.  Either way
it's done would be nice.  I have made one attempt to implement this
with a script and it wasn't too much fun...
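For what it's worth, the fixed-size window can be sketched in awk today; a minimal version assuming one line of context on each side and a literal pattern "foo":

```shell
# Print each matching line with one line of context on either side.
# A sketch only: adjacent matches share the stale "prev" buffer.
printf 'a\nfoo\nb\nc\n' | awk '
    /foo/ { if (prev != "") print prev; print; ctx = 1; next }
    ctx   { print; ctx = 0 }
    { prev = $0 }'
```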

Arturo Perez
ComputerVision, a division of Prime

bzs@bu-cs.BU.EDU (Barry Shein) (05/28/88)

Re: grep with N context lines shown...

Interesting, that's very close to a concept of a multi-line record
grep where I treat N lines as one and any occurrence results in a
listing. The difference is the line chosen to count from (in a context
the match would probably be middle and +-N, in a record you'd just
list the record.)

Just wondering if a generalization is being missed here somewhere,
also consider grepping something like a termcap file, maybe what I
really want is a generalized method to supply pattern matchers for
what to list on a hit:

	grep -P .+3,.-3 pattern		# print +-3 lines centered on match
	grep -P ?^[^ \t]?,.+1 pattern	# print from previous line not
					# beginning with white space to
					# one past current line

Of course, that destroys the stream nature of grep, it has to be able
to arbitrarily back up, ugh, although "last candidate for a start"
could be saved on the fly. The nice thing is that it can use
(essentially) the same pattern machinery for choosing printing (I
know, have to add in the notion of dot etc.)

I dunno, food for thought, like I said, maybe there's a generalization
here somewhere. Or maybe grep should just emit line numbers in a form
which could be post-processed by sed for fancier output (grep in
backquotes on sed line.) Therefore none of this is necessary :-)

	-Barry Shein, Boston University

barnett@vdsvax.steinmetz.ge.com (Bruce G. Barnett) (05/28/88)

[mail bounced]

There have been times when I wanted a grep that would print out the
first occurrence and then stop.

-- 
	Bruce G. Barnett 	<barnett@ge-crd.ARPA> <barnett@steinmetz.UUCP>
				uunet!steinmetz!barnett

rbj@icst-cmr.arpa (Root Boy Jim) (05/28/88)

	   Al Aho and I are designing a replacement for grep, egrep and fgrep.
   The question is what flags should it support and what kind of patterns

I have always thought it would be nice to print only the first match.

	(Root Boy) Jim Cottrell	<rbj@icst-cmr.arpa>
	National Bureau of Standards
	Flamer's Hotline: (301) 975-5688
	The opinions expressed are solely my own
	and do not reflect NBS policy or agreement
	My name is in /usr/dict/words. Is yours?

wyatt@cfa.harvard.EDU (Bill Wyatt) (05/28/88)

> There have been times when I wanted a grep that would print out the
> first occurrence and then stop.

grep '(your_pattern_here)' | head -1
-- 

Bill    UUCP:  {husc6,ihnp4,cmcl2,mit-eddie}!harvard!cfa!wyatt
Wyatt   ARPA:  wyatt@cfa.harvard.edu
         (or)  wyatt%cfa@harvard.harvard.edu
      BITNET:  wyatt@cfa2
        SPAN:  cfairt::wyatt 

dieter@nmtsun.nmt.edu (Dieter Muller) (05/28/88)

In article <22969@bu-cs.BU.EDU> bzs@bu-cs.BU.EDU (Barry Shein) writes:
>
 [introductory comments deleted]
>Just wondering if a generalization is being missed here somewhere,
>also consider grepping something like a termcap file, maybe what I
>really want is a generalized method to supply pattern matchers for
>what to list on a hit:
>
>	grep -P .+3,.-3 pattern		# print +-3 lines centered on match
>	grep -P ?^[^ \t]?,.+1 pattern	# print from previous line not
>					# beginning with white space to
>					# one past current line
>
 [various drawbacks deleted]
>	-Barry Shein, Boston University

Many's the time I would have been willing to make a blood sacrifice for
this kind of capability.  Firing up emacs for /etc/termcap can be a real
pain, when you're A) on a low-speed terminal line (300/1200 baud), B)
looking for something near the end of the file, and C) many things between
the beginning of the file and what you want will match.  Even using
gnumacs in batch mode & writing some lisp to do it strikes me as inelegant.
Starting a 32K search program doesn't hurt nearly as much as starting up
a 1253K search program.

Also, I always use egrep instead of grep, since it is almost always faster.
I don't understand how it is also faster than fgrep, but that's what "time"
says.  Please consider this when picking algorithms.

Dieter (Gnumacs is nice, egrep is better) Muller
-- 
You want coherency, cogency, and literacy all in one posting?  Be real.
...{cmcl2, ihnp4}!lanl!unm-la!unmvax!nmtsun!dieter
dieter@nmtsun.nmt.edu

ado@elsie.UUCP (Arthur David Olson) (05/28/88)

> > There have been times when I wanted a grep that would print out the
> > first occurrence and then stop.
> 
> grep '(your_pattern_here)' | head -1

Doesn't cut it for

	grep '(your_pattern_here)' firstfile secondfile thirdfile ...
-- 
	ado@ncifcrf.gov			ADO is a trademark of Ampex.

roy@phri.UUCP (Roy Smith) (05/28/88)

wyatt@cfa.harvard.EDU (Bill Wyatt) writes:
[as a way to get just the first occurrence of pattern]
> grep '(your_pattern_here)' | head -1

	Yes, it'll certainly work, but I think it bypasses the original
intention; to save CPU time.  If I had a 1000 line file with pattern on
line 7, I want grep to read the first 7 lines, print out line 7, and exit.
grep|head, on the other hand, will read and search all 1000 lines of the
file; it won't exit (with an EPIPE) until it writes another line to stdout
and finds that head has already exited.  In fact, if grep block-buffers its
output, it may never do more than a single write(2) and never notice that
head has exited.

	Anyway, I agree with the "find first match" flag being a good idea.
It would certainly speed up things like

	grep "^Subject: " /usr/spool/news/comp/sources/unix/*

where I know that the pattern is going to be matched in the first few lines
and don't want to bother searching the rest of the multi-kiloline file.
-- 
Roy Smith, System Administrator
Public Health Research Institute
455 First Avenue, New York, NY 10016
{allegra,philabs,cmcl2,rutgers}!phri!roy -or- phri!roy@uunet.uu.net

chip@vector.UUCP (Chip Rosenthal) (05/29/88)

In article <8077@elsie.UUCP> ado@elsie.UUCP (Arthur David Olson) writes:
>> grep '(your_pattern_here)' | head -1
>Doesn't cut it for
>	grep '(your_pattern_here)' firstfile secondfile thirdfile ...

nor if you want to see if a match was found by testing the exit status
-- 
Chip Rosenthal /// chip@vector.UUCP /// Dallas Semiconductor /// 214-450-0400
{uunet!warble,sun!texsun!rpp386,killer}!vector!chip
I won't sing for politicians.  Ain't singing for Spuds.  This note's for you.

russ@groucho.ucar.edu (Russ Rew) (05/30/88)

I also recently had a need for printing multi-line "records" in which a
specified pattern appeared somewhere in the record.  The following
short csh script uses the awk capability to treat whole lines as fields
and empty lines as record separators to print all the records from
standard input that contain a line matching a regular expression
specified as an argument:

#!/bin/csh -f
awk 'BEGIN {RS = ""; FS = "\n"; OFS = "\n"; ORS = "\n\n"} /'"$1"'/ {print} '
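Fed paragraph-style input, the script acts as a record grep (a literal pattern "foo" assumed here for illustration):

```shell
# Two blank-line-separated records; only the record containing "foo"
# is printed, followed by a blank line (because ORS = "\n\n").
printf 'a\nfoo\n\nb\nc\n' |
awk 'BEGIN {RS = ""; FS = "\n"; OFS = "\n"; ORS = "\n\n"} /foo/ {print}'
```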


     Russ Rew * UCAR (University Corp. for Atmospheric Research)
	 PO Box 3000 * Boulder, CO  80307-3000 * 303-497-8845
	     russ@unidata.ucar.edu * ...!hao!unidata!russ

rcodi@yabbie.rmit.oz (Ian Donaldson) (05/30/88)

From article <3324@phri.UUCP>, by roy@phri.UUCP (Roy Smith):
> It would certainly speed up things like
> 
> 	grep "^Subject: " /usr/spool/news/comp/sources/unix/*

> where I know that the pattern is going to be matched in the first few lines
> and don't want to bother searching the rest of the multi-killoline file.

A simple permutation:

 	head -60 /usr/spool/news/comp/sources/unix/* | grep "^Subject: "

works fairly close to the mark, and doesn't waste much time at all.

Ian D

frei@rubmez.UUCP (Matthias Frei ) (05/30/88)

> 	Al Aho and I are designing a replacement for grep, egrep and fgrep.
> The question is what flags should it support and what kind of patterns
> should it handle? (Assume the existence of flags to make it compatible
> with grep, egrep and fgrep.)

Hi,
some applications need to divide a file into two parts.
One should contain all lines matching any of the patterns, the other
one all lines not matching any of the patterns.
So I want the following flags:

	- d	divert the file
		"matches" to stdout
		"nomatches" to stderr
	-r	exchange stdout and stderr, if -d is given  

Will you post your new grep to the net ?  (I hope so.)

Thanks in advance for a nice new tool.

	Matthias Frei

--------------------------------------------------------------------
Snail-mail:                    |  E-Mail address:
Microelectronics Center        |                 UUCP  frei@rubmez.uucp        
University of Bochum           |                (...uunet!unido!rubmez!frei)
4630 Bochum 1, P.O.-Box 102143 |
West Germany                   |

joey@tessi.UUCP (Joe Pruett) (05/31/88)

>
>> There have been times when I wanted a grep that would print out the
>> first occurrence and then stop.
>
>grep '(your_pattern_here)' | head -1

This works, but is quite slow if the input to grep is large.  A hack
I've made to egrep is a switch of the form -<number>.  This causes only
the first <number> matches to be printed, and then the next file is
searched.  This is great for:

egrep -1 ^Subject *

in a news directory to get a list of Subject lines.

jqj@uoregon.uoregon.edu (JQ Johnson) (05/31/88)

In article <1036@cfa.cfa.harvard.EDU> wyatt@cfa.harvard.EDU (Bill Wyatt) writes:
>
>> There have been times when I wanted a grep that would print out the
>> first occurrence and then stop.
>grep '(your_pattern_here)' | head -1
This is, of course, unacceptable if you are searching a very long file
(say, a census database) and have LOTS of pipe buffering.

Too bad it isn't feasible to have a shell that can optimize pipelines.

dan@maccs.UUCP (Dan Trottier) (05/31/88)

In article <8077@elsie.UUCP> ado@elsie.UUCP (Arthur David Olson) writes:
>> > There have been times when I wanted a grep that would print out the
>> > first occurrence and then stop.
>> 
>> grep '(your_pattern_here)' | head -1
>
>Doesn't cut it for
>
>	grep '(your_pattern_here)' firstfile secondfile thirdfile ...

This is getting ridiculous and can be taken to just about any level...

	foreach i (file1 file2 ...)
	   grep 'pattern' $i | head -1
	end

-- 
       A.I. - is a three toed sloth!        | ...!uunet!mnetor!maccs!dan
-- Official scrabble players dictionary --  | dan@mcmaster.BITNET

leo@philmds.UUCP (Leo de Wit) (05/31/88)

In article <292@ncar.ucar.edu> russ@groucho.UCAR.EDU (Russ Rew) writes:
>I also recently had a need for printing multi-line "records" in which a
>specified pattern appeared somewhere in the record.  The following
>short csh script uses the awk capability to treat whole lines as fields
>and empty lines as record separators to print all the records from
>standard input that contain a line matching a regular specified as an
>argument:
>
>#!/bin/csh -f
>awk 'BEGIN {RS = ""; FS = "\n"; OFS = "\n"; ORS = "\n\n"} /'"$1"'/ {print} '
>
>

Awk is a nice solution, but sed is a much faster one. I've been following 
the 'grep' discussion for some time now, and have seen much demand for
features that are already available in sed. Here are some; I have left out
the discussion of what this or that sed command does: there is a sed article
and a man page...

Patrick Powell writes:
>The other facility is to find multiple line patterns, as in:
>find the pair of lines that have pattern1 in the first line
>pattern2 in the second, etc.

Try this one:

        sed -n -e '/PATTERN1/,/PATTERN2/p' file

It prints all lines between PATTERN1 and PATTERN2 matches. Of course you can
have subcommands to do special things (with '{' I mean).


Alan (..!cit-vax!elroy!alan) writes:
>One thing I would _love_ is to be able to find the context of what I've
>found, for example, to find the two (n?) surrounding lines.  I have wanted
>to do this many times and there is no good way.

There is. Try this one:

        sed -n -e '
/PATTERN/{
x
p
x
p
n
p
}
h' file

It prints the line before, the line containing the PATTERN, and the line after.
Of course you can make the output fancier and the number of lines printed
larger.


David Connet writes:
>>
>>One thing I would _love_ is to be able to find the context of what I've
>>found, for example, to find the two (n?) surrounding lines.  I have wanted
>>to do this many times and there is no good way.
>Also, what line number it was found on.

Sed can also handle this one:

        sed -n -e '/PATTERN/=' file
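For instance (a literal pattern assumed):

```shell
# "=" prints the input line number of each matching line; -n suppresses
# the matched lines themselves.
printf 'a\nfoo\nb\nfoo\n' | sed -n '/foo/='
```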


Lloyd Zusman writes:
>Or another way to get this functionality would be for this new greplike
>thing to allow matches on the newline character.  For example:
>    ^.*foo\nbar.*$
>          ^^
>    	newline

Sed can match on embedded newline characters in the substitute command 
(it is indeed \n here!). The trailing newline is matched by $.


Barry Shein writes [story about relative addressing]:
>I dunno, food for thought, like I said, maybe there's a generalization
>here somewhere. Or maybe grep should just emit line numbers in a form
>which could be post-processed by sed for fancier output (grep in
>backquotes on sed line.) Therefore none of this is necessary :-)

Quite right. I think most of the times you want to see the context are in
interactive use. In that case you can write a simple sed-script that does
just what is needed, i.e. display the [/pattern/-N] through [/pattern/+N]
lines, where N is a constant. The example I gave for N == 1 can be extended
for larger N, with fancy output etc.


Bill Wyatt writes: 
>> There have been times when I wanted a grep that would print out the
>> first occurrence and then stop.
>
>grep '(your_pattern_here)' | head -1

Much simpler, and faster:

        sed -n -e '/PATTERN/{
p
q
}' file

Sed quits immediately after finding the first match. You could even create an 
alias for something like that.


Michael Morrell writes:
>>Also, what line number it was found on.
>grep -n does this, but I'd like to see an option which ONLY prints the line
>numbers where the pattern was found.

The sed trick does this:

        sed -n -e '/PATTERN/=' file

Or you could even:

        sed -n -e '/PATTERN/{
=
q
}' file

which prints the first matched line number and exits.


Roy Smith writes:
>wyatt@cfa.harvard.EDU (Bill Wyatt) writes:
>[as a way to get just the first occurance of pattern]
>> grep '(your_pattern_here)' | head -1
>	Yes, it'll certainly work, but I think it bypasses the original
>intention; to save CPU time.  If I had a 1000 line file with pattern on
>line 7, I want grep to read the first 7 lines, print out line 7, and exit.
>grep|head, on the other hand, will read and search all 1000 lines of the
>file; it won't exit (with a EPIPE) until it writes another line to stdout
>and finds that head has already exited.  In fact, if grep block-buffers its
>output, it may never do more than a single write(2) and never notice that
>head has exited.

Quite right. The sed-solution I mentioned before is fast and neat. In fact, 
who needs head:

        sed 10q

does the job, as you can find in the book by Kernighan and Pike; I believe
the title is 'The Unix Programming Environment'.


Stan Brown writes:
>	Along this same general line it would be nice to be abble to
>	look for paterns that span lines.  But perhaps this would be
>	tom complete a change in the philosophy of grep ?

As I mentioned before, embedded newlines can be matched by sed in the
substitute command.


What I also see often is things like

        grep 'pattern' file | sed 'expression'

A pity a lot of people don't know that sed can do the pattern matching itself.

S. E. D. (Sic Erat Demonstrandum)


As far as options for a new grep are concerned, I suggest using the options
proposed (and no more). Let other tools handle other problems - that's in the
Un*x spirit. What I would appreciate most in a new grep is:
no more grep, egrep, fgrep, just one tool that can be both fast (for fixed
strings) and elaborate (for pattern matching like egrep). The 'bm' tool that
was on the net (author Peter Bain) is very fast for fixed strings, using the
Boyer-Moore algorithm. Maybe this knowledge could be 'joined in'...?


        Leo.

barnett@vdsvax.steinmetz.ge.com (Bruce G. Barnett) (05/31/88)

In article <1036@cfa.cfa.harvard.EDU> wyatt@cfa.harvard.EDU (Bill Wyatt) writes:
|
|> There have been times when I wanted a grep that would print out the
|> first occurrence and then stop.
|
|grep '(your_pattern_here)' | head -1

Yes I have tried that. You are missing the point.

Have you ever waited for a computer?  

There are times when I want the first occurrence of a pattern without
reading the entire (i.e. HUGE) file.

Or there are times when I want the first occurrence of a pattern from
hundreds of files, but I don't want to see the pattern more than once.

And yes I know how to write a shell script that does this.

IMHO (sarcasm mode on), it is more efficient to call grep 
once for one hundred files, than to call (grep $* /dev/null|head -1) 
one hundred times. 
-- 
	Bruce G. Barnett 	<barnett@ge-crd.ARPA> <barnett@steinmetz.UUCP>
				uunet!steinmetz!barnett

gwc@root.co.uk (Geoff Clare) (05/31/88)

Most of the useful things people have been saying they would like to be
able to do with 'grep' can already be done very simply with 'sed'.
For example:

    Stop after first match:   sed -n '/pattern/{p;q;}'

    Match over two lines:     sed -n 'N;/pat1\npat2/p;D'

It should also be possible to get a small number of context lines by
judicious use of the 'hold space' commands (g, G, h, H, x), but I haven't
tried it.  Anyway, this can be done with a normal line editor (if the data
to be searched aren't coming from a pipe) with 'g/pattern/-,+p'.
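A quick demonstration of the two-line form (a sed whose N command simply stops at end of input is assumed):

```shell
# N joins each line with its successor; the pair matching "pat1\npat2"
# is printed, and D slides the two-line window forward one line.
printf 'x\npat1\npat2\ny\n' | sed -n 'N;/pat1\npat2/p;D'
```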

I was rather alarmed to see the proposal for 'pattern repeat' in the original
article was '\{pattern\}\1' rather than '\(pattern\)\1', as the latter is
already used for this purpose in the standard editors (ed, ex/vi, sed).
Or was it a typo?

By the way, does anyone know why the ';' command terminator in 'sed' is
not documented?  It works on all the systems I've tried it on, but I
have never found it in any manuals.  It's so much nicer than putting
the commands on separate lines, or using multiple '-e' options.
-- 

Geoff Clare    UniSoft Limited, Saunderson House, Hayne Street, London EC1A 9HH
gwc@root.co.uk   ...!mcvax!ukc!root44!gwc   +44-1-606-7799  FAX: +44-1-726-2750

andyc@omepd (T. Andrew Crump) (05/31/88)

In article <1036@cfa.cfa.harvard.EDU> wyatt@cfa.harvard.EDU (Bill Wyatt) writes:

>> There have been times when I wanted a grep that would print out the
>> first occurrence and then stop.

   >
   >grep '(your_pattern_here)' | head -1

Yes, but it forces grep to search a whole file, when what you may have wanted
was at the beginning.  This is inefficient if the "file" is large.

A more general version of this request would be a parameter restricting
grep to n or fewer occurrences, maybe 'grep -N #'.

-- Andy Crump

rbj@icst-cmr.arpa (Root Boy Jim) (05/31/88)

   From: Bill Wyatt <wyatt@cfa.harvard.edu>

   > There have been times when I wanted a grep that would print out the
   > first occurrence and then stop.

   grep '(your_pattern_here)' | head -1

Well, that *prints* what I want to see, but takes longer than I want to
wait. I want it to quit looking in the file. Besides, there is no way
you can do `grep -1 pattern *.[ch]' as trivially.

   Bill    UUCP:  {husc6,ihnp4,cmcl2,mit-eddie}!harvard!cfa!wyatt
   Wyatt   ARPA:  wyatt@cfa.harvard.edu
	    (or)  wyatt%cfa@harvard.harvard.edu
	 BITNET:  wyatt@cfa2
	   SPAN:  cfairt::wyatt 

	(Root Boy) Jim Cottrell	<rbj@icst-cmr.arpa>
	National Bureau of Standards
	Flamer's Hotline: (301) 975-5688
	The opinions expressed are solely my own
	and do not reflect NBS policy or agreement
	My name is in /usr/dict/words. Is yours?

glennr@holin.ATT.COM (Glenn Robitaille) (06/01/88)

> > > There have been times when I wanted a grep that would print out the
> > > first occurrence and then stop.
> > 
> > grep '(your_pattern_here)' | head -1
> 
> Doesn't cut it for
> 
> 	grep '(your_pattern_here)' firstfile secondfile thirdfile ...

Well, if you have a shell command like

	#
	# save the search pattern
	#
	pattern=$1
	#
	# remove the search pattern from the argument list
	#
	shift
	for i in "$@"
	do
		#
		# grep for the search pattern
		#
		line=`grep "$pattern" "$i" | head -1`
		#
		# if found, print file name and string
		#
		test -n "$line" && echo "${i}:\t${line}"
	done

It'll work fine.  If you want to use other options, have them in
quotes as part of the first argument.


Glenn Robitaille
AT&T, HO 2J-207
ihnp4!holin!glennr
Phone (201) 949-7811

aeb@cwi.nl (Andries Brouwer) (06/01/88)

In article <1036@cfa.cfa.harvard.EDU> wyatt@cfa.harvard.EDU (Bill Wyatt) writes:
>
>> There have been times when I wanted a grep that would print out the
>> first occurrence and then stop.
>
>grep '(your_pattern_here)' | head -1

A fast way of searching for the first occurrence is really useful.
I have a version of grep called `contains', and a shell script
for formatting that says: if the input contains .[ then use refer;
if it contains .IS then ideal; if .PS then pic; if .TS then tbl, etc.

-- 
      Andries Brouwer -- CWI, Amsterdam -- uunet!mcvax!aeb -- aeb@cwi.nl

booter@deimos.ads.com (Elaine Richards) (06/01/88)

In article <15030@brl-adm.ARPA> rbj@icst-cmr.arpa (Root Boy Jim) writes:
>	   Al Aho and I are designing a replacement for grep, egrep and fgrep.
>   The question is what flags should it support and what kind of patterns
>I have always thought it would be nice to print only the first match.


grep string filename |head -1

Sorry, I could not resist. Why not do an alias instead?

ER

hasch@gypsy.siemens-rtl (Harald Schaefer) (06/01/88)

If you are only interested in the first occurrence of a pattern, you can use
something like
	sed -n '/<pattern>/ {
		p
		q
		}' file
Harald Schaefer
Siemens Corp. - RTL
Bus. Phone (609) 734 3389
Home Phone (609) 275 1356

uucp:	...!princeton!gypsy!hasch
	hasch@gypsy.uucp
ARPA:	hasch@siemems.com
	hasch%siemens@princeton.EDU

aburt@isis.UUCP (Andrew Burt) (06/01/88)

I'd like to see the following enhancements in a grepper:

	-  \< and \> to match word start/end as in vi, with -w option
		as in BSD grep to match pattern as a word.

	- \w in pattern to match whitespace (generalization: define
		\unused-letter as a pattern; or allow full lex capability).

	- way to invert a piece of a pattern, such as: grep foo.*\^bar\^xyzzy
		with meaning as in: grep foo | grep -v bar | grep -v xyzzy
		(or could be written grep foo.*\^(bar|xyzzy) of course).

	-  Select Nth occurrence of match (generalization: list of
		matches to show: grep -N -2,5-7,10- ... to grab up to the 2nd,
		5th through 7th, and from the 10th onward).

	- option to show lines between matches (not just matching lines)
		as in: grep -from foo -to bar ... meaning akin to
		sed/ed's /foo/,/bar/p.  (But much more useful with other
		extensions).

	- Allow matching newlines in a "binary" (or non-text) sort of mode:
		grep -B 'foo.*bar'  finds foo...bar even if they are
		not on the same line.  (But printing the "line" that
		matches wouldn't be useful anymore, so just printing the
		matched text would be better.  Someone wanting lines could
		look for \n[^\n]*foo.*bar[^\n]*\n, though a syntax to
		make this easier might be in order.  Perhaps this wouldn't
		be an example of a binary case -- but a new character
		with meaning like '.' but matching ANY character would work:
		if @ is such a character then "grep foo@*bar".   Perhaps
		a better example, assuming the \^ for inversion syntax
		above would be "grep foo@*(\^bar)bar -- otherwise it would
		match from first foo to last bar, while I might want from
		first foo to first bar.)

	- provide byte offset of start of match (like block number or
		line number) useful for searching non-text files.

	- Provide a lib func that has the RE code in it.

	- Install RE code in other programs: awk/sed/ed/vi etc.
		Oh for a standardized RE algorithm!
-- 

Andrew Burt 				   			isis!aburt

              Fight Denver's pollution:  Don't Breathe and Drive.

jjg@linus.UUCP (Jeff Glass) (06/01/88)

In article <470@q7.tessi.UUCP> joey@tessi.UUCP (Joe Pruett) writes:
> >grep '(your_pattern_here)' | head -1
> 
> This works, but is quite slow if the input to grep is large.  A hack
> I've made to egrep is a switch of the form -<number>.  This causes only
> the first <number> matches to be printed, and then the next file is
> searched.  This is great for:
> 
> egrep -1 ^Subject *
> 
> in a news directory to get a list of Subject lines.

Try:

	sed -n -e '/pattern/{' -e p -e q -e '}' filename

This prints the first occurrence of the pattern and then stops searching
the file.  The generalizations for printing the first <n> matches and
searching <m> files (where n,m > 1) are more awkward (no pun intended)
but are possible.
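The first-<n> generalization Jeff alludes to can be sketched in awk, counting
matches across the whole run rather than per file:

```shell
# Print the first 3 matches, then stop reading (a sketch of the
# generalization; 3 and the pattern are placeholders).
cat > /tmp/log.$$ <<'EOF'
hit 1
miss
hit 2
hit 3
hit 4
EOF
awk '/hit/ { print; if (++n == 3) exit }' /tmp/log.$$
rm -f /tmp/log.$$
```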

/jeff

brianm@sco.COM (Brian Moffet) (06/01/88)

In article <4537@vdsvax.steinmetz.ge.com> barnett@vdsvax.steinmetz.ge.com (Bruce G. Barnett) writes:
>In article <1036@cfa.cfa.harvard.EDU> wyatt@cfa.harvard.EDU (Bill Wyatt) writes:
>|grep '(your_pattern_here)' | head -1
>
>Or there are times when I want the first occurrence of a pattern from
>hundreds of files, but I don't want to see the pattern more than once.
>

Have you tried sed?  How about 

$ sed -n '/pattern/p;/pattern/q' file

???



-- 
Brian Moffet		brianm@sco.com  {uunet,decvax!microsof}!sco!brianm
The opinions expressed are not quite clear and have no relation to my employer.
'Evil Geniuses for a Better Tomorrow!'

anw@nott-cs.UUCP (06/01/88)

In article <6866@elroy.Jpl.Nasa.Gov> alan@cogswell.Jpl.Nasa.Gov (Alan S. Mazer)
writes:
> One thing I would _love_ is to be able to find the context of what I've
> found, for example, to find the two (n?) surrounding lines.  I have wanted
> to do this many times and there is no good way.

	See below.  Does n == 4, but easily changed.

In article <590@root44.co.uk> gwc@root.co.uk (Geoff Clare) writes:
>
> Most of the useful things people have been saying they would like to be
> able to do with 'grep' can already be done very simply with 'sed'.

	Which is not to say that they shouldn't also be in "*grep"!

>	[ good examples omitted ]
>
> It should also be possible to get a small number of context lines by
> judicious use of the 'hold space' commands (g, G, h, H, x), but I haven't
> tried it.  [ ... ]

	The following is "/usr/bin/kwic" on this machine (PDP 11/44 running
V7).  I wrote it about three years ago in response to a challenge from some
AWK zealots;  it runs *much* faster than the equivalent AWK script.  That
is, it is sloooww rather than ssllloooooowwww.  I have a manual entry for
it which is too trivial to send.  Bourne shell, of course.  Use at whim
and discretion.  Several minor bugs, mainly (I hope!) limitations of or
between "sh" and "sed".  (Note that the various occurrences of multiple
spaces in "s..." commands are all TABs, in case mailers/editors/typists
have mangled things.)

> By the way, does anyone know why the ';' command terminator in 'sed' is
> not documented?  It works on all the systems I've tried it on, but I
> have never found it in any manuals.  It's so much nicer than putting
> the commands on separate lines, or using multiple '-e' options.

	No, I don't know why, but it isn't the only example in Unix of a
facility most easily discovered by looking in the source.  I've occasionally
used it, but I tried re-writing the following that way, and it *didn't* look
so much nicer;  in fact it looked 'orrible.

--------------------------------- [cut here] -----------------------------
[ $# -eq 0 ] && { echo "Usage: $0 pattern [file] ..." 1>&2; exit 1; }

l='[^\n]*\n'; pat="$1"; shift

exec sed -n   "/$pat"'/ b found
			s/^/	/
			H
			g
      /^'"$l$l$l$l$l"'/ s/\n[^\n]*//
			h
			b
	: found
			s/^/++	/
			H
			g
			s/.//p
			s/.*//
			h
	: loop
		      $ b out
			n
	     /'"$pat"'/ b found
			s/^/	/
			H
			g
	/^'"$l$l$l$l"'/ !b loop
	: out
			s/.//p
			s/.*/-----------------/
			h
	    ' ${1+"$@"}

-- 
Andy Walker, Maths Dept., Nott'm Univ., UK.
anw@maths.nott.ac.uk

andrew@alice.UUCP (06/01/88)

in my naivety, i had not been following netnews closely
after i posted the original ``grep replacement'' article.
I assumed that people would reply to me, not the net.
That is the reason i have not been participating in the discussion.
i will be posting my resolution of the suggestions shortly.

many people have written about patterns matching multiple lines.
grep will not do this. if you really need this, use sam by rob pike
as described in the nov 1987 software practice and experience.
the code is available for a plausible fee from the at&t toolchest.

jfh@rpp386.UUCP (John F. Haugh II) (06/02/88)

In article <2117@uoregon.uoregon.edu> jqj@drizzle.UUCP (JQ Johnson) writes:
>In article <1036@cfa.cfa.harvard.EDU> wyatt@cfa.harvard.EDU (Bill Wyatt) writes:
>>> There have been times when I wanted a grep that would print out the
>>> first occurrence and then stop.
>>grep '(your_pattern_here)' | head -1
>This is, of course, unacceptable if you are searching a very long file
>(say, a census database) and have LOTS of pipe buffering.
>
>Too bad it isn't feasible to have a shell that can optimize pipelines.

there is a boyer/moore based fast grep in the archives.  adding an
additional option (say '-f' for first in each file?) should be quite
simple.

perhaps i'll post the diff's if i remember to go hack on the sucker
any time soon.

- john.
-- 
John F. Haugh II                 | "If you aren't part of the solution,
River Parishes Programming       |  you are part of the precipitate."
UUCP:   ihnp4!killer!rpp386!jfh  | 		-- long since forgot who
DOMAIN: jfh@rpp386.uucp          | 

kent@happym.UUCP (Kent Forschmiedt) (06/02/88)

In article <136@rubmez.UUCP> frei@rubmez.UUCP (Matthias Frei ) writes:
>I want following flags:
>
>	- d	divert the file
>		"matches" to stdout
>		"nomatches" to stderr
>	-r	exchange stdout and stderr, if -d is given  

I second the vote - just today I did one of these:

grep $PATTERN file > afile
grep -v $PATTERN file > anotherfile

Note, however, that -v will serve for the suggested -r.

>Will you post Your new grep to the net ? (I hope so)

From alice.UUCP??  Ha ha!  That's Bell Labs!  It will be in V10 
Unix, and none of us humans will see it until sysVr6, and only then 
if we are lucky!! 
-- 
--
	Kent Forschmiedt -- kent@happym.UUCP, tikal!camco!happym!kent
	Happy Man Corporation  206-282-9598

john@frog.UUCP (John Woods) (06/02/88)

In article <590@root44.co.uk>, gwc@root.co.uk (Geoff Clare) writes:
> Most of the useful things people have been saying they would like to be
> able to do with 'grep' can already be done very simply with 'sed'.
> For example:
>     Stop after first match:   sed -n '/pattern/{p;q;}'

Close, but no cigar.  It does not work for multiple input files.
(And, of course, spawning off a new sed for each file defeats the basic desire
of most of the people who've asked for it:  speed)

However,

	awk '/^Subject: /	{ print FILENAME ":" $0; next }' *

does (just about) work.  And it's probably not _obscenely_ slow.
(it doesn't behave for no input files, and you might prefer no FILENAME: for
just a single input file)
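For an arbitrary pattern, the per-file trick needs a flag that resets at each
new file; a sketch (unlike a real first-match-only flag, it still reads every
line of every file):

```shell
# First match per file: FNR restarts at 1 on each new input file, so
# use it to clear the "seen" flag (a sketch; pattern is a placeholder).
echo 'Subject: one'   > /tmp/a1.$$
echo 'Subject: two'  >> /tmp/a1.$$
echo 'Subject: three' > /tmp/a2.$$
awk 'FNR == 1 { seen = 0 }
     /^Subject: / && !seen { print FILENAME ":" $0; seen = 1 }' /tmp/a1.$$ /tmp/a2.$$
rm -f /tmp/a1.$$ /tmp/a2.$$
```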
-- 
John Woods, Charles River Data Systems, Framingham MA, (617) 626-1101
...!decvax!frog!john, john@frog.UUCP, ...!mit-eddie!jfw, jfw@eddie.mit.edu

No amount of "Scotch-Guard" can repel the ugly stains left by REALITY...
		- Griffy

peter@ficc.UUCP (Peter da Silva) (06/02/88)

In article <136@rubmez.UUCP>, frei@rubmez.UUCP (Matthias Frei ) writes:
> So I want following flags:

> 	- d	divert the file "matches" to stdout "nomatches" to stderr

Good, but...

> 	-r	exchange stdout and stderr, if -d is given  

Shouldn't this case (-r) be handled by the existing '-v' flag?
-- 
-- Peter da Silva, Ferranti International Controls Corporation.
-- Phone: 713-274-5180. Remote UUCP: uunet!nuchat!sugar!peter.

mdorion@cmtl01.UUCP (Mario Dorion) (06/03/88)

In article <2978@ihlpe.ATT.COM>, dcon@ihlpe.ATT.COM (452is-Connet) writes:
> In article <6866@elroy.Jpl.Nasa.Gov> alan@cogswell.Jpl.Nasa.Gov (Alan S. Mazer) writes:
> >
> >One thing I would _love_ is to be able to find the context of what I've
> >found, for example, to find the two (n?) surrounding lines.  I have wanted
> >to do this many times and there is no good way.
> 
> Also, what line number it was found on.
> 
> David Connet
> ihnp4!ihlpe!dcon

Ever tried grep -n ?????

There are three features I would like to see in a grep-like program:

1- Be able to use a newline character in the regular expression
       grep 'this\nthat' file 

2- Be able to grep more than one regular expression with one call. This would
   be faster than issuing many calls since the file would be read only once.

3- To have an option to search only for the first occurrence of the pattern.
   Sometimes you KNOW that the pattern is there only once (for example if you
   grep '^Subject:' on news files) and there's just no need to scan the rest of
   the file. When 'grepping' into many files it would return the first occurrence
   for each file.
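The second wish can already be approximated in one pass with a pattern file
via grep -f (a sketch; egrep alternation like 'this|that' works too):

```shell
# Several patterns, one pass over the input: grep -f reads one
# expression per line from a pattern file (file names are placeholders).
cat > /tmp/pats.$$ <<'EOF'
^From:
^Subject:
EOF
printf 'From: a\nBody\nSubject: b\n' > /tmp/art.$$
grep -f /tmp/pats.$$ /tmp/art.$$
rm -f /tmp/pats.$$ /tmp/art.$$
```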

-- 
     Mario Dorion              | ...!{rutgers,uunet,ihnp4}!     
     Frisco Bay Industries     |            philabs!micomvax!cmtl01!mdorion
     Montreal, Canada          |
     1 (514) 738-7300          | I thought this planet was in public domain!

andrew@alice.UUCP (06/03/88)

In article <449@happym.UUCP>, kent@happym.UUCP writes:
> From alice.UUCP??  Ha ha!  That's Bell Labs!  It will be in V10 
> Unix, and none of us humans will see it until sysVr6, and only then 
> if we are lucky!! 


Context:
	the right thing to do is to write a context program that takes
input looking like "filename:linenumber:goo" and prints whatever context you like.
we can then take this crap out of grep and diff and make it generally available
for use with programs like the C compiler and eqn and so on. It can also do
the right thing with folding together nearby lines. At least one good first
cut has been put on the net but a C program sounds easy enough to do.

Source:
	the software i write is publicly available because it matters to me.
it was a hassle but mk and fio are available to everybody for reasonable cost
(< $125 commercial, nearly free educational). i am trying hard to do the
same for the new grep. it will be in V10, it will be in plan9, and should be
in SVR4 (the joint sun-at&t release).

allbery@ncoast.UUCP (Brandon S. Allbery) (06/05/88)

As quoted from <2312@bgsuvax.UUCP> by kutz@bgsuvax.UUCP (Kenneth Kutz):
+---------------
| In article <6866@elroy.Jpl.Nasa.Gov>, alan@cogswell.Jpl.Nasa.Gov (Alan S. Mazer) writes:
| > One thing I would _love_ is to be able to find the context of what I've
| > found, for example, to find the two (n?) surrounding lines.  I have wanted
| > to do this many times and there is no good way.
+---------------

	grep -n foo ./bar | context 2

I posted context to net.sources back when it existed; someone may still have
archives from that time, if not I'll retrieve my sources and repost it.  It
takes lines of the basic form

	filename ... linenumber : ...

and displays context around the specified lines.  I use this with grep quite
often; it also works with cc (pcc, not Xenix cc) error messages.
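A minimal sketch of such a context filter (hypothetical, not Brandon's posted
program): it reads grep -n style "file:line:text" on stdin and shows n lines
around each hit.

```shell
# Context filter sketch: n lines before and after each reported line.
# The demo file and hit line below are placeholders.
n=2
printf 'l1\nl2\nl3\nl4\nl5\nl6\n' > /tmp/src.$$
echo "/tmp/src.$$:4:l4" |
while IFS=: read file line rest
do
    start=`expr $line - $n`
    test "$start" -lt 1 && start=1
    sed -n "${start},`expr $line + $n`p" "$file"
    echo '---'
done
rm -f /tmp/src.$$
```

For the demo hit on line 4 this prints lines 2 through 6 and a separator.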
-- 
Brandon S. Allbery			  | "Given its constituency, the only
uunet!marque,sun!mandrill}!ncoast!allbery | thing I expect to be "open" about
Delphi: ALLBERY	       MCI Mail: BALLBERY | [the Open Software Foundation] is
comp.sources.misc: ncoast!sources-misc    | its mouth."  --John Gilmore

gwyn@brl-smoke.UUCP (06/05/88)

In article <7944@alice.UUCP> andrew@alice.UUCP writes:
>	the right thing to do is to write a context program that takes
>input looking like "filename:linenumber:goo" and prints whatever context ...

Heavens -- a tool user.  I thought that only Neanderthals were still alive.
I guess Bell Labs escaped the plague.

hutch@net1.ucsd.edu (Jim Hutchison) (06/05/88)

In article <4537@vdsvax.steinmetz.ge.com> barnett@vdsvax.steinmetz.ge.com (Bruce G. Barnett) writes:
>In <1036@cfa.cfa.harvard.EDU> wyatt@cfa.harvard.EDU (Bill Wyatt) writes:
>|
>|> There have been times when I wanted a grep that would print out the
>|> first occurrence and then stop.
>|
>|grep '(your_pattern_here)' | head -1
>
[...]
>
>Have you ever waited for a computer?  

No, never. :-)

>There are times when I want the first occurrence of a pattern without
>reading the entire (i.e. HUGE) file.

I realize this is dependent on the way in which processes sharing a
pipe act, but this is a point worth considering before we get yet
another annoying burst of "cat -v" type programs.

grep pattern file1 ... fileN | head -1

This should send grep a SIGPIPE as soon as the first line of output
trickles through the pipe.  This would result in relatively little
of the file actually being read under most Unix implementations.
I would agree that it is a bad thing to rely on the granularity of
a pipe.  Here is a sample program which can be used to show you what
I mean.

Name it grep, and use it thus wise:

% ./grep pattern * | head -1

/* ------------- Cut here --------------- */
#include <stdio.h>
#include <signal.h>

sighandler(sig)
    int sig;
{
    if (sig == SIGPIPE)
	fprintf(stderr,"Died from a SIGPIPE\n");
    else
	fprintf(stderr,"Died from signal #%d\n", sig);
    exit(0);
}

main()
{
    signal(SIGPIPE,sighandler);
    for (;;)
	printf("pattern\n");
}
/*    Jim Hutchison   		UUCP:	{dcdwest,ucbvax}!cs!net1!hutch
		    		ARPA:	Hutch@net1.ucsd.edu
Disclaimer:  The cat agreed that it would be o.k. to say these things.  */

hutch@net1.ucsd.edu (Jim Hutchison) (06/05/88)

I can think of a few nasty ways to do this one, I am hoping to get
a better answer.

A grep with a window of context around it.  A few lines preceding and
following the pattern I am looking for.  The VMS search command sported
this as an option/qualifier.  I miss it sometimes (not VMS, just a few
of the more wacky utilities, like the editor option for creation of
multi-key data base files :-).

/*    Jim Hutchison   		UUCP:	{dcdwest,ucbvax}!cs!net1!hutch
		    		ARPA:	Hutch@net1.ucsd.edu
Disclaimer:  The cat agreed that it would be o.k. to say these things.  */

tbray@watsol.waterloo.edu (Tim Bray) (06/05/88)

Grep should, where reasonable, not be bound by the notion of a 'line'.
As a concrete expression of this, the useful grep -l (prints the names of
the files that contain the string) should work on any kind of file.  More
than one existing 'grep -l' will fail, for example, to tell you which of a 
bunch of .o files contain a given string.  Scenario - you're trying to
link 55 .o's together to build a program you don't know that well.  You're
on berklix.  ld sez: "undefined: _memcpy".  You say: "who's doing that?".
The source is scattered inconveniently.  The obvious thing to do is: 
grep -l _memcpy *.o
That this often will not work is irritating.
Tim Bray, New Oxford English Dictionary Project, U of Waterloo

bzs@bu-cs.BU.EDU (Barry Shein) (06/05/88)

From: gwyn@brl-smoke.ARPA (Doug Gwyn )
>In article <7944@alice.UUCP> andrew@alice.UUCP writes:
>>	the right thing to do is to write a context program that takes
>>input looking like "filename:linenumber:goo" and prints whatever context ...
>
>Heavens -- a tool user.  I thought that only Neanderthals were still alive.
>I guess Bell Labs escaped the plague.

Almost, unless the original input was produced by a pipeline, in which
case this (putative) post-processor can't help unless you tee the mess
to a temp file, yup, mess is the right word.

Or maybe only us Neanderthals are interested in tools which work on
pipes? Have they gone out of style?

	-Barry "Ulak of Org" Shein, Boston University

gwyn@brl-smoke.ARPA (Doug Gwyn ) (06/05/88)

In article <23133@bu-cs.BU.EDU> bzs@bu-cs.BU.EDU (Barry Shein) writes:
>Almost, unless the original input was produced by a pipeline, in which
>case this (putative) post-processor can't help unless you tee the mess
>to a temp file, yup, mess is the right word.

The proposed tool would be very handy on ordinary text files,
but it is hard to see a use for it on pipes.  Or, getting back
to context-grep, what good would it do to show context from a
pipe?  To do anything with the information (other than stare
at it), you'd need to produce it again.  There might be some
use for context-{grep,diff,...} on a stream, but if a separate
context tool will satisfy 99% of the need, as I think it would,
as well as provide this capability for other commands "for free",
it would be a better approach than hacking context into other
commands.

By the way, I hope the new grep when asked to always produce
the filename will use "-" for stdin's name, and the context
tool would also follow the same convention.  Even though the
Research systems have /dev/stdin, other sites may not, and
anyway (as we've just seen) stdin isn't really a definite
object.

nelson@sun.soe.clarkson.edu (Russ Nelson) (06/05/88)

In article <23133@bu-cs.BU.EDU> bzs@bu-cs.BU.EDU (Barry Shein) writes:
>In article <7944@alice.UUCP> andrew@alice.UUCP writes:
>>	the right thing to do is to write a context program that takes
>>input looking like "filename:linenumber:goo" and prints whatever context ...
>
>Almost, unless the original input was produced by a pipeline, in which
>case this (putative) post-processor can't help unless you tee the mess
>to a temp file, yup, mess is the right word.

How about:

alias with_context tee >/tmp/$$ | $* | context -f/tmp/$$

or something like that?  Does that offend tool-users sensibilities?
*Do* Neanderthals have any sensibilities?
-- 
signed char *reply-to-russ(int network) {	/* Why can't BITNET go	*/
if(network == BITNET) return "NELSON@CLUTX";	/* domainish?		*/
else return "nelson@clutx.clarkson.edu"; }

bzs@bu-cs.BU.EDU (Barry Shein) (06/05/88)

From: gwyn@brl-smoke.ARPA (Doug Gwyn )
>In article <23133@bu-cs.BU.EDU> bzs@bu-cs.BU.EDU (Barry Shein) writes:
>>Almost, unless the original input was produced by a pipeline, in which
>>case this (putative) post-processor can't help unless you tee the mess
>>to a temp file, yup, mess is the right word.
>
>The proposed tool would be very handy on ordinary text files,
>but it is hard to see a use for it on pipes.  Or, getting back
>to context-grep, what good would it do to show context from a
>pipe?  To do anything with the information (other than stare
>at it), you'd need to produce it again.

What else are context displays for except to stare at (or save in a
file for later staring)?

Are the resultant contexts often the input to other programs? (I know
that 'patch' can take a context input but that's irrelevant, it hardly
needs nor prefers a context diff to my knowledge, it's just being
accommodating so humans can look at the context diff if something
botches.)

Actually, I can answer that in the context of the original suggestion.

The motivation for a context comes in two major flavors:

	A) To stare at (the surrounding context gives a human some
	hint of the context in which the text appeared)

	B) Because the context really represents a multi-line (eg)
	record, such as pulling out every termcap or terminfo entry
	which contains some property but desiring the result to contain
	the entire multiline entry so it could be re-used to create a
	new file.

In either case it's independent of whether the data is coming from a
pipe (as it should be.) Its pipeness may be caused by something as
simple as the data being grabbed across the network (rsh HOST cat foo | ...).

Anyhow, I think it's bad in general to demand the reasoning of why a
selection operator should work in a pipe, it just should (although I
have presented a reasonable argument.) That's what tools are all about.

>There might be some
>use for context-{grep,diff,...} on a stream, but if a separate
>context tool will satisfy 99% of the need, as I think it would,
>as well as provide this capability for other commands "for free",
>it would be a better approach than hacking context into other
>commands.

I think claiming that 99% of the use won't need pipes is unsound, it
should just work with a pipe and any tool which requires passing the
file name and then re-positioning the file just won't, it's violating
a fundamental design concept by doing this (not that in rare cases
this might not be necessary, but I don't see where this is one of them
unless you use the circular argument of it "must be a separate
program".)

The reasoning for adding it to grep would be:

	a) Grep already has its finger on the context, it's right
	there (or could be), why re-process the entire stream/file
	just to get it printed? Grep found the context, why find it
	again?

	b) The context suggestions are merely logical generalizations
	of what grep already does: print the context of a match
	(it just happens to now limit that to exactly one line.) Nothing
	new conceptually is being added, only generalized.

In fact, if I were to write this context-display tool my first thought
would be to just use grep and try to emit unique patterns (a la TAGS
files) which grep can then re-scan. But grep doesn't quite cut it w/o
this little generalization. I think we're going in circles and this
post-processor is nothing more than a special case of grep or perhaps
cat or sed the way it was proposed (why not just generate sed commands
to list the lines if that's all you want?)

Anyhow, at least we're back to the technical issues and away from
calling anyone who disagrees Neanderthals...

	-Barry Shein, Boston University

bzs@bu-cs.BU.EDU (Barry Shein) (06/05/88)

From: nelson@sun.soe.clarkson.edu (Russ Nelson) [responding to me]
>>Almost, unless the original input was produced by a pipeline, in which
>>case this (putative) post-processor can't help unless you tee the mess
>>to a temp file, yup, mess is the right word.
>
>How about:
>
>alias with_context tee >/tmp/$$ | $* | context -f/tmp/$$
>
>or something like that?  Does that offend tool-users sensibilities?
>*Do* Neanderthals have any sensibilities?

I don't understand, the way to avoid having to tee it into temp
files is to tee it into temp files?

Given that sort of solution we can eliminate pipes entirely from unix,
was that your point? That pipes are fundamentally useless and can
always be eliminated via use of intermediate temp files?

It begs the question, burying it in a little syntactic sugar with an
alias command doesn't solve the problem.

	-Barry Shein, Boston University

andrew@alice.UUCP (06/06/88)

> Almost, unless the original input was produced by a pipeline, in which
> case this (putative) post-processor can't help unless you tee the mess
> to a temp file, yup, mess is the right word.
> Or maybe only us Neanderthals are interested in tools which work on
> pipes? Have they gone out of style?


the problem is in the limited plumbing available in the current flock of shells.
i mean, the context stuff is exactly the same issue as glob expansion;
you either build it into an ad hoc set of programs or let one program do it.

gwyn@brl-smoke.ARPA (Doug Gwyn ) (06/06/88)

In article <23142@bu-cs.BU.EDU> bzs@bu-cs.BU.EDU (Barry Shein) writes:
>Anyhow, at least we're back to the technical issues and away from
>calling anyone who disagrees Neanderthals...

Oh, but the latter is much more fun!

Anyway, the fundamental issue seems to be that there are (at least)
two types of external data objects:
	streams -- transient data, takes special effort to capture
	files -- permanent data with an attached name
UNIX nicely makes these appear much the same, but they do have some
inherent differences, and this one-pass versus multi-pass context
discussion has brought out one of them.

There is nothing particularly wrong with the "tee" approach to
turn a stream into a file long enough for whatever work is being
done.  The converse is often done; for example many of my shell
scripts, after parsing arguments, exec a pipeline that starts
	cat $* | ...
in order to ensure a stream input to the rest of the pipeline.

garyo@masscomp.UUCP (Gary Oberbrunner) (06/06/88)

The only change I've ever had to make to the source for grep to make it do
what I want was to make it work with arbitrary-length lines.
I consider not handling long lines (and not complaining about them either)
to be extremely antisocial.  All this other stuff is just window-dressing.
Not that it's bad; one integrated grep with B-M strings, alternation and
inversion operators, and nifty feeping creaturism is great by me.

I usually handle the multi-line-record case by tr'ing all the intermediate
line ends into some unused character, doing my database hackery (grep, awk,
sed, what have you) and then tr'ing back at the end.  This is one reason for
having grep support very long lines.
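When the multi-line records happen to be separated by blank lines, awk's
paragraph mode gets the same effect as the tr round-trip in one step (a
sketch; the tr hack is still needed for records with other separators):

```shell
# Multi-line record grep: RS = "" makes awk read blank-line-separated
# paragraphs as single records (demo data and pattern are placeholders).
cat > /tmp/recs.$$ <<'EOF'
name: alpha
color: red

name: beta
color: blue
EOF
awk 'BEGIN { RS = ""; ORS = "\n\n" } /blue/' /tmp/recs.$$
rm -f /tmp/recs.$$
```

For the demo it prints the whole second record, both lines.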

				As always,

				Gary

----------------------------------------------------------------------------
Remember,			Truth is not beauty;      (617)692-6200x2445
Information is not knowledge;	Beauty is not love;	  Gary   Oberbrunner
Knowledge is not wisdom;	Love is not music;	  ...!masscomp!garyo
Wisdom is not truth;		Music is the best. - FZ   ....garyo@masscomp

bzs@bu-cs.BU.EDU (Barry Shein) (06/06/88)

From: gwyn@brl-smoke.ARPA (Doug Gwyn )
>There is nothing particularly wrong with the "tee" approach to
>turn a stream into a file long enough for whatever work is being
>done.  The converse is often done; for example many of my shell
>scripts, after parsing arguments, exec a pipeline that starts
>	cat $* | ...
>in order to ensure a stream input to the rest of the pipeline.

Nothing wrong with it unless you happen to be on a parallel machine as
I am a lot of the time and pipes can run in parallel nicely.

Nyah Nyah, got ya there! PHFZZZZT! I win! I win!

You're right, this is getting ridiculous, we made our points...

Ok everyone, back to arguing which flags should be maintained in cat
and Unix Standardization AKA "West Coast Story" (snap fingers.)

	-Barry Shein, Boston University

preece%fang@xenurus.gould.com (Scott E. Preece) (06/06/88)

From: Doug Gwyn  <gwyn@brl-smoke.arpa>
> The proposed tool would be very handy on ordinary text files, but it is
> hard to see a use for it on pipes.  Or, getting back to context-grep,
> what good would it do to show context from a pipe?  To do anything with
> the information (other than stare at it), you'd need to produce it
> again.
----------
Well, actually, I often produce output for which I have no other use
than staring at.  Sometimes one uses the system to find an answer,
rather than to create grist for some future use...

-- 
scott preece
gould/csd - urbana
uucp:	ihnp4!uiucuxc!urbsdc!preece
arpa:	preece@Gould.com

nelson@sun.soe.clarkson.edu (Russ Nelson) (06/06/88)

In article <23143@bu-cs.BU.EDU> bzs@bu-cs.BU.EDU (Barry Shein) writes:
>From: nelson@sun.soe.clarkson.edu (Russ Nelson) [responding to me]
>>alias with_context tee >/tmp/$$ | $* | context -f/tmp/$$
>I don't understand, the way to avoid having to tee it into temp
>files is to tee it into temp files?

No.  There is no way to avoid teeing it into a temp file.  Such is
life with pipes.  If you want context then you need to save it.  My
alias is perfectly consistent with the tool-using philosophy.  Yes,
it's a kludge, but that's the only way to save context in a single-stream
pipe philosophy.  I remember reading a paper in which multiple streams
going hither and yon were proposed, but the syntax was gothic at best.
I like being able to say this:

bsd:	sort | with_context grep rfoo | more
sysv:	sort | with_context grep foo | more
	Because sysv doesn't have the r* utilities, of course :-)
-- 
signed char *reply-to-russ(int network) {	/* Why can't BITNET go	*/
if(network == BITNET) return "NELSON@CLUTX";	/* domainish?		*/
else return "nelson@clutx.clarkson.edu"; }

tower@bu-cs.BU.EDU (Leonard H. Tower Jr.) (06/07/88)

In article <6866@elroy.Jpl.Nasa.Gov> alan@cogswell.Jpl.Nasa.Gov (Alan S. Mazer) writes:
|
|One thing I would _love_ is to be able to find the context of what I've
|found, for example, to find the two (n?) surrounding lines.  I have wanted
|to do this many times and there is no good way.

GNU Emacs has a command that will walk you through each match of a
grep run and show you the context around it:

   grep:
   Run grep, with user-specified args, and collect output in a buffer.
   While grep runs asynchronously, you can use the C-x ` command
   to find the text that grep hits refer to.

M-x grep RET to invoke it.  I suspect other Unix Emacs have a similar
feature.

Information on how to obtain GNU Emacs, other GNU software, or the GNU
project itself is available from:

	gnu@prep.ai.mit.edu

enjoy -len

brianc@cognos.uucp (Brian Campbell) (06/07/88)

In article <4524@vdsvax.steinmetz.ge.com> Bruce G. Barnett writes:
> There have been times when I wanted a grep that would print out the
> first occurrence and then stop.

In article <1036@cfa.cfa.harvard.EDU> Bill Wyatt suggests:
> grep '(your_pattern_here)' | head -1

In article <4537@vdsvax.steinmetz.ge.com> Bruce G. Barnett replies:
> There are times when I want the first occurrence of a pattern without
> reading the entire (i.e. HUGE) file.

If we're talking about finding subject lines in news articles:
	head -20 file1 file2 ... | grep ^Subject:

> Or there are times when I want the first occurrence of a pattern from
> hundreds of files, but I don't want to see the pattern more than once.

In this case, the original suggestion seems appropriate:
	grep pattern file1 file2 ... | head -1
-- 
Brian Campbell        uucp: decvax!utzoo!dciem!nrcaer!cognos!brianc
Cognos Incorporated   mail: POB 9707, 3755 Riverside Drive, Ottawa, K1G 3Z4
(613) 738-1440        fido: (613) 731-2945 300/1200/2400, sysop@1:163/8

oz@yunexus.UUCP (Ozan Yigit) (06/08/88)

In article <7939@alice.UUCP> andrew@alice.UUCP writes:
>
>many people have written about patterns matching multiple lines.
>grep will not do this. if you really need this, use sam by rob pike
>as described in the nov 1987 software practice and experience.
>
	Why should this not be done by grep ??? I think Rob Pike's
	"Structured Expressions" is the way to go for a modern grep,
	where newline spanning is supported, and the program does
	not die unexpectedly just because a file contains a line too
	long for a stupid internal "line size". (For an insightful
	discussion of this, interested readers could check out Rob's
	paper in EUUG proceedings.)

oz
-- 
The deathstar rotated slowly,	      |  Usenet: ...!utzoo!yunexus!oz
towards its target, and sparked       |  ....!uunet!mnetor!yunexus!oz
an intense sunbeam. The green world   |  Bitnet: oz@[yulibra|yuyetti]
of unics evaporated instantly...      |  Phonet: +1 416 736-5257x3976

jgreely@dimetrodon.cis.ohio-state.edu (J Greely) (06/08/88)

In article <1998@u1100a.UUCP> krohn@u1100a.UUCP (Eric Krohn) writes:
>To put in a plug for Larry Wall's perl language (Release 2.0 due soon at a
>comp.sources.unix near you):

>[suggests the following script for grep-first-only]
>perl -n -e 'if(/Subject/){print $ARGV,":",$_;close(ARGV);}' * >/dev/null

This works, and is indeed faster.  However, it shares one problem with
all of the others: '*' expansion.  As an (uncomfortable) example,
/usr/spool/news/talk/bizarre has over 2500 articles in it at our site,
and the shell can't expand that properly (SunOS 3.4, if it matters).
So, the following perl script accomplishes the same thing, no matter
how many files need to be searched:

#!/usr/local/bin/perl
while ($File = <*>) {
  open(file,$File);
  while (<file>) {
    if (/^Subject/){
      print $File,":",$_;
      last;
    }
  }
  close(file);
}

It's about as fast as the one-liner, and more robust.
-=-
       (jgreely@cis.ohio-state.edu; ...!att!cis.ohio-state.edu!jgreely)
		  Team Wheaties says: "Just say NO to rexd!"
	       /^Newsgroups: .*\,.*\,.*\,/h:j   /[Ww]ebber/h:j
	       /[Bb]irthright [Pp]arty/j        /[Pp]ortal/h:j

guy@gorodish.Sun.COM (Guy Harris) (06/09/88)

> No, the obvious thing to do is:
> 
> nm -o _memcpy *.o

"Obvious" under which version of UNIX?  From the 4.3BSD manual:

	-o	Prepend file or archive element name to each output line
		rather than only once.

The SunOS manual page says the same thing.

From the S5R3 manual:

	-o	Print the value and size of a symbol in octal instead of
		decimal.

With the 4.3BSD version you can do

	nm -o *.o | egrep _memcpy

and get the result you want.  For any version of "nm" that I know of, you can
do the "egrep" trick mentioned in another posting; you may have to use a flag
such as "-p" with the S5 version to get "easily parsable, terse output."

john@frog.UUCP (John Woods) (06/09/88)

Hypothesize for the moment that I would like to have the Subject: lines for
each article in /usr/spool/news/comp/sources/unix.  Many people have proposed
a new flag for the "new grep" (one that functions just like the -one flag does
on "match", the matching program I use (a flag I implemented long ago)).

In article<5007@sdcsvax.UCSD.EDU>,hutch@net1.ucsd.edu(Jim Hutchison) suggests:
> grep pattern file1 ... fileN | head -1
> This should send grep a SIGPIPE as soon as the first line of output
> trickles through the pipe.  This would result in relatively little
> of the file actually being read under most Unix implementations.

Yes, it would result in relatively little of the file being read.  It would
also result in relatively little of the desired output.  Check the problem
space before posting solutions, folks.

As I pointed out in another message, you can get awk to solve the problem
almost exactly, with some irregularity in the NFILES={0,1} cases.  However,
the "tool-using" approach is a two-edged sword, it seems to me:  a matching
problem should be solvable by using the matching tool, not by a special case
of an editor tool (the purported "sed" solution) or by having to reach for
a full-blown programming language (awk); just as one should not paginate
a text file by using the /PAGINATE /NOPRINT features of a line-printer
program...  Sometimes you need to EN-feature a program in order to avoid
having to turn to (other) inappropriate tools.  "Oh, you can't ADD text
with this editor, only change existing text.  You add text by using
'cat >> filename' ..."

I like the "context" tool suggested elsewhere, but it has one problem (as
stated) for replacing context diffs:  context diffs are both context and
_differences_, and are generally clearly marked as such (i.e., the !+-
convention); while I guess you could turn an ed-script style diff listing
into a context diff (given both input files and the diff marks), that is
a radically different input language than that proposed for eliminating
context grep.  This just means, however, that two context tools are needed,
not just one.

To paraphrase Einstein, "Programs should be as simple as possible, and no
simpler."
-- 
John Woods, Charles River Data Systems, Framingham MA, (617) 626-1101
...!decvax!frog!john, john@frog.UUCP, ...!mit-eddie!jfw, jfw@eddie.mit.edu

No amount of "Scotch-Guard" can repel the ugly stains left by REALITY...
		- Griffy

john@frog.UUCP (John Woods) (06/09/88)

In article <1998@u1100a.UUCP>, krohn@u1100a.UUCP (Eric Krohn) writes:
> In article <1112@X.UUCP> john@frog.UUCP (some clown :-) writes:
> ] 	awk '/^Subject: /	{ print FILENAME ":" $0; next }' *
> 
> This will print Subject: lines more than once per file if a file happens to
> have more than one Subject: line.  `Next' goes to the next input line, not
> the next input file, so you are still left with an exhaustive search of all
> the files.
> 
Oops.  I blew it.  Working on GNU awk seems to have permanently damaged my
brain (there are a couple of differences between "real" awk and GNU awk which
I couldn't convince the author were worth changing, specifically in 'exit'
(not next); GNU exit actually does what I thought next would do, instead of
exiting entirely).  

-- 
John Woods, Charles River Data Systems, Framingham MA, (617) 626-1101
...!decvax!frog!john, john@frog.UUCP, ...!mit-eddie!jfw, jfw@eddie.mit.edu

No amount of "Scotch-Guard" can repel the ugly stains left by REALITY...
		- Griffy

leo@philmds.UUCP (Leo de Wit) (06/09/88)

In article <449@happym.UUCP> kent@happym.UUCP (Kent Forschmiedt) writes:
>In article <136@rubmez.UUCP> frei@rubmez.UUCP (Matthias Frei ) writes:
>>I want following flags:
>>
>>	- d	divert the file
>>		"matches" to stdout
>>		"nomatches" to stderr
>>	-r	exchange stdout and stderr, if -d is given  
>I second the vote - just today I did one of these:
>
>grep $PATTERN file > afile
>grep -v $PATTERN file > anotherfile
>
>Note, however, that -v will serve for the suggested -r.
>>Will you post Your new grep to the net ? (I hope so)
>From alice.UUCP??  Ha ha!  That's Bell Labs!  It will be in V10 
>Unix, and none of us humans will see it until sysVr6, and only then 
>if we are lucky!! 

You are lucky, because here's your_new_grep:

---------------------- S T A R T   H E R E ---------------
#!/bin/sh
# Usage: yngrep pattern matches nomatches [file ...]

case $# in
0|1|2) echo "Usage: $0 <pattern> <matchfile> <nomatchfile> [file ...]"; exit 1;;
*) pattern=$1 matches=$2 nomatches=$3; shift; shift; shift;;
esac

exec sed -n -e "
/$pattern/w $matches
/$pattern/!w $nomatches" $*
---------------------- S T O P     H E R E ---------------

Use the p command of sed to write to stdout. I don't know how to write to the
stderr from within sed. Don't think exec 2>outfile beforehand works, because
sed does not open for append. But you could use w /dev/tty, that's often what
you want for stderr anyway 8-).
Hope it works right away, didn't test it.

	Leo.

jad@insyte.UUCP (Jill Diewald) (06/09/88)

A missing feature in UNIX is the ability to deal with files with
very long lines - the kind of file you get from a data base tape like
Compustat.  The standard data base tape contains very long lines:
instead of separating each record with a newline, all the records may
be on the same line.  A defined record size, rather than a newline,
determines where each record ends.

There are two specific things that it would be nice to do with UNIX, 
instead of having to write a c program:

1) To be able to give grep (also awk, etc) a record size 
which it would use instead of newlines.  

2) To be able to specify a field range (ie columns 20-30) for the 
program to search - instead of the entire line/record.  In addition it 
should be possible to specify several fields in one grep.  For 
example: to search for records which have "1000" in one field or
"2000" in another.  Sort uses fields so they aren't totally foreign
to UNIX.

vanam@pttesac.UUCP (Marnix van Ammers) (06/10/88)

In article <4524@vdsvax.steinmetz.ge.com> barnett@steinmetz.ge.com (Bruce G. Barnett) writes:

>There have been times when I wanted a grep that would print out the
>first occurrence and then stop.

sed -n -e "/<pattern>/ { p" -e q -e "}"

mouse@mcgill-vision.UUCP (der Mouse) (06/10/88)

In article <779@yabbie.rmit.oz>, rcodi@yabbie.rmit.oz (Ian Donaldson) writes:
> From article <3324@phri.UUCP>, by roy@phri.UUCP (Roy Smith):
>> [A grep option to stop after one match] would certainly speed up
>> things like
>> 	grep "^Subject: " /usr/spool/news/comp/sources/unix/*

> A simple permutation:
>  	head -60 /usr/spool/news/comp/sources/unix/* | grep "^Subject: "
> works fairly close to the mark, and doesn't waste much time at all.

Except that it doesn't list the filename along with the match.

					der Mouse

			uucp: mouse@mcgill-vision.uucp
			arpa: mouse@larry.mcrcim.mcgill.edu

mouse@mcgill-vision.UUCP (der Mouse) (06/10/88)

In article <8012@brl-smoke.ARPA>, gwyn@brl-smoke.ARPA (Doug Gwyn ) writes:
> In article <7944@alice.UUCP> andrew@alice.UUCP writes:
>> the right thing to do is to write a context program that takes input
>> looking like "filename:linenumber:goo" and prints whatever context ...

> Heavens -- a tool user.  I thought that only Neanderthals were still
> alive.  I guess Bell Labs escaped the plague.

A real useful `tool', this, that works only on files.  And only when
you grep more than one file, so you get filenames (or happen to be able
to remember which flag it is to make grep print filenames always,
assuming of course that your grep has it).

Besides, grep has the context, or could have if it wanted to bother
saving it.  Why read all two hundred thousand lines of the file
*again*?  Wasn't it bad enough the first time?

					der Mouse

			uucp: mouse@mcgill-vision.uucp
			arpa: mouse@larry.mcrcim.mcgill.edu

mouse@mcgill-vision.UUCP (der Mouse) (06/10/88)

In article <1030@sun.soe.clarkson.edu>, nelson@sun.soe.clarkson.edu (Russ Nelson) writes:
> In article <23133@bu-cs.BU.EDU> bzs@bu-cs.BU.EDU (Barry Shein) writes:
>> In article <7944@alice.UUCP> andrew@alice.UUCP writes:
>>> the right thing to do is to write a context program that takes
>>> input looking like "filename:linenumber:goo" and prints whatever
>>> context ...
>> Almost, unless the original input was produced by a pipeline, [...]
>> unless you tee the mess to a temp file, yup, mess is the right word.
> How about:
> alias with_context tee >/tmp/$$ | $* | context -f/tmp/$$

This assumes that (a) there's room on /tmp to save the whole thing and
(b) that you don't mind rereading it all to find the appropriate line.

Both assumptions are commonly violated, in my experience.

					der Mouse

			uucp: mouse@mcgill-vision.uucp
			arpa: mouse@larry.mcrcim.mcgill.edu

mouse@mcgill-vision.UUCP (der Mouse) (06/10/88)

In article <8022@brl-smoke.ARPA>, gwyn@brl-smoke.ARPA (Doug Gwyn ) writes:
> Or, getting back to context-grep, what good would it do to show
> context from a pipe?  To do anything with the information (other than
> stare at it), you'd need to produce it again.

Why do we have diff -c?  Generally, to stare at.  (The only other use I
know of is producing diffs for Larry Wall's patch program.)

					der Mouse

			uucp: mouse@mcgill-vision.uucp
			arpa: mouse@larry.mcrcim.mcgill.edu

mouse@mcgill-vision.UUCP (der Mouse) (06/10/88)

In article <5007@sdcsvax.UCSD.EDU>, hutch@net1.ucsd.edu (Jim Hutchison) writes:
> 4537@vdsvax.steinmetz.ge.com, barnett@vdsvax.steinmetz.ge.com (Bruce G. Barnett)
>> In <1036@cfa.cfa.harvard.EDU> wyatt@cfa.harvard.EDU (Bill Wyatt) writes:
[attribution(s) lost]
>>>> There have been times when I wanted a grep that would print out
>>>> the first occurrence and then stop.
>>> grep '(your_pattern_here)' | head -1
>> Have you ever waited for a computer?  There are times when I want
>> the first occurrence of a pattern without reading the [whole file].

> grep pattern file1 ... fileN | head -1

> This should send grep a SIGPIPE as soon as the first line of output
> trickles through the pipe.

No.  It should not send the SIGPIPE until grep writes the second line.
And because grep is likely to use stdio for its output, nothing at all
may be written to the pipe until grep has 1K or 2K or whatever size its
stdio uses for the output buffer.  This may be an enormous waste of
time, both cpu and real.

Besides which, it's wrong.  It prints just the first match, whereas
what's wanted is the first match *from each file*.
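(A minimal sketch of first-match-per-file in plain sh and sed, in the
spirit of this thread; the "Subject:" pattern and the demo files are
illustrative.  sed quits as soon as it prints, so no file is read past
its first match:)

```shell
# Print the first matching line from each file, reading no further
# once it is found.  The demo directory stands in for real articles.
dir=${TMPDIR-/tmp}/fm$$
mkdir "$dir"
printf 'From: a\nSubject: one\nSubject: dup\n' > "$dir/art1"
printf 'no subject here\n' > "$dir/art2"

for f in "$dir"/*
do
	line=`sed -n -e '/^Subject:/{' -e p -e q -e '}' "$f"`
	case "$line" in
	'') ;;				# no match in this file
	*)  echo "$f: $line" ;;
	esac
done

rm -r "$dir"
```

Unlike the grep | head pipelines above, this prints one line per
matching file and never touches the rest of a file once it has hit.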

					der Mouse

			uucp: mouse@mcgill-vision.uucp
			arpa: mouse@larry.mcrcim.mcgill.edu

mouse@mcgill-vision.UUCP (der Mouse) (06/10/88)

In article <7207@watdragon.waterloo.edu>, tbray@watsol.waterloo.edu (Tim Bray) writes:
> Scenario - you're trying to link 55 .o's together to build a program
> you don't know that well.  You're on berklix.  ld sez: "undefined:
> _memcpy".  You say: "who's doing that?".  The source is scattered
> inconveniently.  The obvious thing to do is:  grep -l _memcpy *.o

Doesn't anybody read the man pages any more?  The obvious thing is to
use the supplied facility: the -y option to ld.

% cc -o program *.o -y_memcpy
Undefined:
_memcpy
buildstruct.o: reference to external undefined _memcpy
copytree.o: reference to external undefined _memcpy
%

(I don't know how generally available this is.  You did say "berklix",
and I know this is in 4.3, but I don't know about other Berklices.)

					der Mouse

			uucp: mouse@mcgill-vision.uucp
			arpa: mouse@larry.mcrcim.mcgill.edu

mouse@mcgill-vision.UUCP (der Mouse) (06/10/88)

In article <1037@sun.soe.clarkson.edu>, nelson@sun.soe.clarkson.edu (Russ Nelson) writes:
> In article <23143@bu-cs.BU.EDU> bzs@bu-cs.BU.EDU (Barry Shein) writes:
>> From: nelson@sun.soe.clarkson.edu (Russ Nelson) [responding to me]
>>> alias with_context tee >/tmp/$$ | $* | context -f/tmp/$$
>> I don't understand, the way to avoid having to tee it into temp
>> files is to tee it into temp files?
> No.  There is no way to avoid teeing it into a temp file.

Sure there is.

> If you want context then you need to save it.

True.  But you don't necessarily need to save it in a file.

> [the alias above is] the only way to save context in a single-stream
> pipe philosophy.

Grep can save it in memory.  Unless you want so much context that it
overflows the available memory, which I find difficult to see
happening, this is a perfectly good place to put it.

In fact, I wrote a grep variant which starts by snarfing the whole file
into (virtual) memory.  Makes for extreme speed when it's usable, which
is often enough to make it worthwhile (for me, at least).  And of
course it means that I could get as much context as I cared to.  (I've
never had it fail because it couldn't get enough memory to hold the
whole file.)

					der Mouse

			uucp: mouse@mcgill-vision.uucp
			arpa: mouse@larry.mcrcim.mcgill.edu

andrew@alice.UUCP (06/11/88)

	The following is a summary of the somewhat plausible ideas
suggested for the new grep. I thank Leo de Wit particularly and others
for clearing up misconceptions and pointing out (correctly) that
existing tools like sed already do (or at least nearly do) what some people
asked for. The following points are in no particular order and no slight is
intended by my presentation. After that, I summarise the current flags.

1) named character classes, e.g. \alpha, \digit.
	i think this is a hokey idea and dismissed it as unnecessary crud
	but then found out it is part of the proposed regular expression
	stuff for posix. it may creep in but i hope not.

2) matching multi-line patterns (\n as part of pattern)
	this actually requires a lot of infrastructure support and thought.
	i prefer to leave that to other more powerful programs such as sam.

3) print lines with context.
	the second most requested feature but i'm not doing it. this is
	just the job for sed. to be consistent, we just took the context
	crap out of diff too. this is actually reasonable; showing context
	is the job for a separate tool (pipeline difficulties apart).

4) print one(first matching) line and go onto the next file.
	most of the justification for this seemed to be scanning
	mail and/or netnews articles for the subject line; neither
	of which gets any sympathy from me. but it is easy to do
	and doesn't add to the flag count; we add a new option (say -1)
	and remove -s. -1 is just like -s except it prints the matching line.
	then the old grep -s pattern is now grep -1 pattern > /dev/null
	and is within epsilon of being as efficient.

5) divert matching lines onto one fd, nonmatching onto another.
	sorry, run grep twice.

6) print the Nth occurrence of the pattern (N is number or list).
	it may be possible to think of a real reason for this (i couldn't)
	but the answer is no.

7) -w (pattern matches only words)
	the most requested feature. well, it turns out that -x (exact)
	is there because doug mcilroy wanted to match words against a dictionary.
	it seems to have no other use. Therefore, -x is being dropped
	(after all, it only costs a quick edit to do it yourself) and is
	replaced by -w == (^|[^_a-zA-Z0-9])pattern($|[^_a-zA-Z0-9]).

8) grep should work on binary files and kanji.
	that it should work on kanji or any character set is a given
	(at least, any character set supported by the system V international
	character set stuff). binary files will work too modulo the
	following constraint: lines (between \n's) have to fit in a
	buffer (current size 64K). violations are an error (exit 2).

9) -b has bogus units.
	agreed. -b now is in bytes.

10) -B (add an ^ to the front of the given pattern, analogous to -x and -w)
	-x (and -w) is enough. sorry.

11) recursively descend through argument lists
	no. find | xargs is going to have to do.

12) read filenames on standard input
	no. xargs will have to do.

13) should be as fast as bm.
	no worries. in fact, our egrep is 3x faster than bm. i intend to be
	competitive with woods' egrep. it should also be as fast as fgrep for
	multiple keywords. the new grep incorporates boyer-moore
	as a degenerate case of Commentz-Walter, a faster replacement
	for the fgrep algorithm.

14) -lv (files that don't have any matching lines)
	-lv means print names of files that have any nonmatching lines
	(useful, say, for checking input syntax). -L will mean print
	names of files without selected lines.

15) print the part of the line that matched.
	no. that is available at the subroutine level.

16) compatibility with old grep/fgrep/egrep.
	the current name for the new command is gre (aho chose it).
	after a while, it will become our grep. there will be a -G
	flag to take patterns a la old grep and a -F to take
	patterns a la fgrep (that is, no metacharacters except \n == |).
	gre is close enough to egrep to not matter.

17) fewer limits.
	so far, gre will have only one limit, a line length of 64K.
	(NO, i am not supporting arbitrary length lines (yet)!)
	we foresee no need for any other limit. for example, the
	current gre acts like fgrep. it is 4 times faster than
	fgrep and has no limits; we can gre -f /usr/dict/words
	(72K words, 600KB).

18) recognise file types (ignore binaries, unpack packed files etc).
	get real. go back to your macintosh or pyramid. gre will just grep
	files, not understand them.

19) handle patterns occurring multiple times per line
	this is ill-defined (how many times does aaaa occur in a line of 20 'a's?
	in order of decreasing correctness, the answers are >=1, 17, 5).
	For the cases people mentioned (words), pipe it thru
	tr to put the words one per line.

20) why use \{\} instead of \(\)?
	this is not yet resolved (mcilroy&ritchie vs aho&pike&me).
	grouping is an orthogonal issue to subexpressions so why
	use the same parentheses? the latest suggestion (by ritchie)
	is to allow both \(\) and \{\} as grouping operators but
	the \3 would only count one type (say \(\)). this would be much
	better for complicated patterns with much grouping.

21) subroutine versions of the pattern matching stuff.
	in a deep sense, the new grep will have no pattern matching code in it.
	all the pattern matching code will be in libc with a uniform
	interface. the boyer-moore and commentz-walter routines have been
	done. the other two are egrep and back-referencing egrep.
	lastly, regexp will be reimplemented.

22) support a filename of - to mean standard input.
	a unix without /dev/stdin is largely bogus but as a sop to the poor
	bastards having to work on BSD, gre will support -
	as stdin (at least for a while).
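(In the meantime, the -w behaviour proposed in item 7 can be
approximated with today's egrep; a minimal sketch, with the bracket
expressions copied verbatim from item 7 and "word" as an illustrative
pattern:)

```shell
# Emulate the proposed -w flag: match "word" only when it is not
# embedded in a larger identifier.  "swordfish" and "my_word" are
# rejected because the adjacent character is a word character.
printf 'a word here\nswordfish\nword\nmy_word\n' |
egrep '(^|[^_a-zA-Z0-9])word($|[^_a-zA-Z0-9])'
# prints "a word here" and "word"
```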

Thus, the current proposal is the following flags. it would take a GOOD
argument to change my mind on this list (unless it is to get rid of a flag).

-f file	pattern is (`cat file`)
-v	nonmatching lines are 'selected'
-i	ignore alphabetic case
-n	print line number
-c	print count of selected lines only
-l	print filenames which have a selected line
-L	print filenames who do not have a selected line
-b	print byte offset of line begin
-h	do not print filenames in front of matching lines
-H	always print filenames in front of matching lines
-w	pattern is (^|[^_a-zA-Z0-9])pattern($|[^_a-zA-Z0-9])
-1	print only first selected line per file
-e expr	use expr as the pattern

Andrew Hume
research!andrew

wswietse@eutrc3.UUCP (Wietse Venema) (06/11/88)

In article <7207@watdragon.waterloo.edu> tbray@watsol.waterloo.edu (Tim Bray) writes:
}Grep should, where reasonable, not be bound by the notion of a 'line'.
}As a concrete expression of this, the useful grep -l (prints the names of
}the files that contain the string) should work on any kind of file.  More
}than one existing 'grep -l' will fail, for example, to tell you which of a 
}bunch of .o files contain a given string.  Scenario - you're trying to
}link 55 .o's together to build a program you don't know that well.  You're
}on berklix.  ld sez: "undefined: _memcpy".  You say: "who's doing that?".
}The source is scattered inconveniently.  The obvious thing to do is: 
}grep -l _memcpy *.o
}That this often will not work is irritating.
}Tim Bray, New Oxford English Dictionary Project, U of Waterloo

	nm -op *.o | grep memcpy

will work just fine, both with bsd and att unix.

	Wietse
-- 
uucp:	mcvax!eutrc3!wswietse	| Eindhoven University of Technology
bitnet:	wswietse@heithe5	| Dept. of Mathematics and Computer Science
surf:	tuerc5::wswietse	| Eindhoven, The Netherlands.

randy@umn-cs.cs.umn.edu (Randy Orrison) (06/12/88)

In article <7962@alice.UUCP> andrew@alice.UUCP writes:
|3) print lines with context.
|	the second most requested feature but i'm not doing it. this is
|	just the job for sed. to be consistent, we just took the context
							^^^^^^^^^^^^^^^^
|	crap out of diff too. this is actually reasonable; showing context
	^^^^^^^^^^^^^^^^
|	is the job for a separate tool (pipeline difficulties apart).


What?!?!?   Ok, i would like context in grep, but i'll live without it.
Context diffs, however are a different matter.  There isn't an easy way
to generate them with diff/context (the first character of every line is
produced as part of the diff).  Context diffs are useful for patches, and
having a tool to generate them is necessary.  They're a logical improvement
to diff that is more than just context around the changes.

If you're fixing grep fine, but don't break diff while you're at it.

	-randy
-- 
Randy Orrison, Control Data, Arden Hills, MN		randy@ux.acss.umn.edu
8-(OSF/Mumblix: Just say NO!)-8	    {ihnp4, seismo!rutgers, sun}!umn-cs!randy
	"I consulted all the sages I could find in Yellow Pages,
	but there aren't many of them."			-APP

bd@hpsemc.HP.COM (bob desinger) (06/12/88)

> Along this same general line it would be nice to be able to
> look for patterns that span lines.

Here's a script called `phrase' from Dougherty and O'Reilly's
_Unix_Text_Processing_ book.  It finds patterns that are possibly
split across lines.  Its usage is:

	phrase "phrase to find" files ...

It doesn't have all those grep options, but at least it gets you
halfway there.

-- bd

#! /bin/sh
# This is a shell archive.  Remove anything before this line,
# then unwrap it by saving it in a file and typing "sh file".
#
# Wrapped by bd at hpsemc on Sat Jun 11 23:26:46 1988
# Contents:
#	phrase 	

PATH=/bin:/usr/bin:/usr/ucb:/usr/local/bin:$PATH; export PATH
echo 'At the end, you should see the message "End of shell archive."'

echo Extracting phrase
cat >phrase <<'@//E*O*F phrase//'
: find phrases, perhaps split across lines
# From _Unix_Text_Processing_ by Dougherty & O'Reilly, p. 378

if [ $# -lt 2 ]
then	echo "Usage:  `basename $0`" '"phrase to find" file ...'
	exit 1
else
	search="$1"	# pattern
	shift
fi

for file
do
	sed '
	/'"$search"'/b
	N
	h
	s/.*\n//
	/'"$search"'/b
	g
	s/ *\n/ /
	/'"$search"'/{
	g
	b
	}
	g
	D' $file
done
@//E*O*F phrase//

set `wc -lwc <phrase`
if test $1 -ne 28 -o $2 -ne 62 -o $3 -ne 355
then	echo ! phrase should have 28 lines, 62 words, and 355 characters
	echo ! but has $1 lines, $2 words, and $3 characters
fi
chmod 775 phrase

echo "End of shell archive."
exit 0

wesommer@athena.mit.edu (William Sommerfeld) (06/12/88)

In article <144@insyte.UUCP> jad@insyte.UUCP writes:
>A missing feature in UNIX is the ability to deal with files with
>very long lines - the kind of file you get from a data base tape like
>Compustat.  The standard data base tape contains very long lines.
>Instead of separating each record with a newline, all the records may
>be on the same line.  There is a defined record size which is used to
>determine when a record ends - instead of a newline.
>
>There are two specific things that it would be nice to do with UNIX, 
>instead of having to write a c program:

As usual with UNIX, you _don't_ have to write a C program.  Use `dd'
instead.  If we're talking about the canonical IBM "80 column card
image", then the following should work just fine to convert it to a
"normal" file:

dd conv=unblock cbs=80 <in >out

Adding conv=ascii will also convert EBCDIC into ascii.

>2) To be able to specify a field range (ie columns 20-30) for the 
>program to search - instead of the entire line/record.  

grep '^........foo' 

picks up any line which has `foo' in columns 9-11..

					- Bill

chris@mimsy.UUCP (Chris Torek) (06/12/88)

In article <144@insyte.UUCP> jad@insyte.UUCP (Jill Diewald) writes:
-... data base tape contains very long lines, instead of separating
-each record with a newline, all the records may be on the same line.
-There is a defined record size which is used to determine when a
-record ends - instead of a newline.

Think `tools'.

-1) To be able to give grep (also awk, etc) a record size 
-which it would use instead of newlines.  

Use `dd | grep'.

-2) To be able to specify a field range (ie columns 20-30) for the 
-program to search - instead of the entire line/record.  In addition it 
-should be posible to specify several fields in one grep.

Use `dd | cut | grep'.
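(A concrete sketch of that `dd | cut | grep' pipeline, assuming
80-byte fixed-length records and a search of columns 20-30 for
"1000"; the generated demo data stands in for a real tape file:)

```shell
# Build two 80-byte records with no newlines; the first carries
# "1000" in columns 20-23, the second does not.
f=${TMPDIR-/tmp}/rec$$
pad=`printf '%019d' 0 | tr 0 a`		# 19 filler bytes
{
	printf '%-80s' "${pad}1000"
	printf '%-80s' 'bbbb'
} > "$f"

dd conv=unblock cbs=80 < "$f" 2>/dev/null |	# one record per line
cut -c20-30 |					# keep columns 20-30 only
grep -n '1000'					# -n gives the record number

rm -f "$f"
```

Run as-is, this prints `1:1000': only the first record matches in the
selected columns.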
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

roy@phri.UUCP (Roy Smith) (06/12/88)

jad@insyte.UUCP writes:
> A missing feature in UNIX is the ability to deal with files with very
> long lines.

	Unless I'm misunderstanding jad, he's talking about fixed length
records.  Can't you just do: "dd conv=unblock cbs=80 (or whatever)" to
convert the file to standard Unix \n-terminated lines?  Hasn't this been
part of Unix since at least v6?

	Also, I agree with whoever said that taking context diffs out of
diff is a bad idea.  Context diffs are what make patch work so well, and
patch is what makes the world go 'round.  Please don't take my context
diffs away-ay-ay.. (with apologies to Paul Simon).
-- 
Roy Smith, System Administrator
Public Health Research Institute
455 First Avenue, New York, NY 10016
{allegra,philabs,cmcl2,rutgers}!phri!roy -or- phri!roy@uunet.uu.net

allbery@ncoast.UUCP (Brandon S. Allbery) (06/13/88)

As quoted from <7944@alice.UUCP> by andrew@alice.UUCP:
+---------------
| 	the right thing to do is to write a context program that takes
| input looking like "filename:linenumber:goo" and prints whatever context you like.
| we can then take this crap out of grep and diff and make it generally available
| for use with programs like the C compiler and eqn and so on. It can also do
| the right thing with folding together nearby lines. At least one good first
| cut has been put on the net but a C program sounds easy enough to do.
+---------------

A C version has been done; it handles pcc, grep -n, and cpp messages.  I
posted it 2 1/2 years ago.

It does *not* handle diff, since diff's messages are slightly different and
lack filename information; also, since it passes lines it doesn't understand
you'd end up with both regular and context diffs in the same output.  Now if
diff had an option to output in the format

		<filename>:<lineno>[-<lineno>]:<action>

we'd be all set -- I could modify it to handle ranges easily.  (Changes
would be output as "file1:n-m:file was\nfile2:n-m:now is", or something
similar.)

Note that it'd be nice if lint output messages this way as well.  I have a
postprocessor for lint which does this -- even with System V's lint that
can have lint1 and lint2 run separately via .ln files.
-- 
Brandon S. Allbery			  | "Given its constituency, the only
{uunet!marque,sun!mandrill}!ncoast!allbery | thing I expect to be "open" about
Delphi: ALLBERY	       MCI Mail: BALLBERY | [the Open Software Foundation] is
comp.sources.misc: ncoast!sources-misc    | its mouth."  --John Gilmore

gwyn@brl-smoke.ARPA (Doug Gwyn ) (06/13/88)

In article <3350@phri.UUCP> roy@phri.UUCP (Roy Smith) writes:
>Unless I'm misunderstanding jad, he's talking about fixed length records.

She was.

The important point is that UNIX text (line-oriented) tools typically
break miserably when lines containing more than 256 or 512 (sometimes
more) characters are encountered.  In many cases this restriction is
not necessary but is due to quick-and-dirty implementation.
It IS more work to read in an arbitrarily long line, but once you
write your getline() function you could add it to the local library
and then it would be easy to do in the future.

I seem to recall that Lindemann fixed this problem in "sort" for SVR2.

>Also, I agree with whoever said that taking context diffs out of
>diff is a bad idea.

Removing the ability to get context diffs when they are wanted WOULD
be a bad idea.  Removing this feature from "diff" itself is not a
bad idea; I hate for "diff" to do extra work every time I run it when
I virtually never use the context feature.  Consider
	diff a b | diffc a b
where "diffc" reads the "diff" information in parallel with the two
files "a" and "b" to produce the context-diff output.  By separating
the two functions, it is not only likely to speed up non-context use
of "diff" but also it is more likely to get the answer right, and it
is easier to work on improving "diffc".  (Existing context diff output
is sometimes pretty horrible, for example larger than the inputs.)

rbj@cmr.icst.nbs.gov (Root Boy Jim) (06/13/88)

? From: J Greely <jgreely@dimetrodon.cis.ohio-state.edu>

? In article <1998@u1100a.UUCP> krohn@u1100a.UUCP (Eric Krohn) writes:
? >To put in a plug for Larry Wall's perl language (Release 2.0 due soon at a
? >comp.sources.unix near you):

? >[suggests the following script for grep-first-only]
? >perl -n -e 'if(/Subject/){print $ARGV,":",$_;close(ARGV);}' * >/dev/null
? 
? This works, and is indeed faster.  However, it shares one problem with
? all of the others: '*' expansion.  As an (uncomfortable) example,
? /usr/spool/news/talk/bizarre has over 2500 articles in it at our site,
? and the shell can't expand that properly (SunOS 3.4, if it matters).
? So, the following perl script accomplishes the same thing, no matter
? how many files need searched:

[replacement solution deleted]

Don't forget about xargs.

?        (jgreely@cis.ohio-state.edu; ...!att!cis.ohio-state.edu!jgreely)
? 		  Team Wheaties says: "Just say NO to rexd!"
? 	       /^Newsgroups: .*\,.*\,.*\,/h:j   /[Ww]ebber/h:j
? 	       /[Bb]irthright [Pp]arty/j        /[Pp]ortal/h:j
 
	(Root Boy) Jim Cottrell	<rbj@icst-cmr.arpa>
	National Bureau of Standards
	Flamer's Hotline: (301) 975-5688
	The opinions expressed are solely my own
	and do not reflect NBS policy or agreement
	My name is in /usr/dict/words. Is yours?

jgreely@tut.cis.ohio-state.edu (J Greely) (06/14/88)

In article <16148@brl-adm.ARPA> rbj@cmr.icst.nbs.gov (Root Boy Jim) writes:
>? From: J Greely <jgreely@dimetrodon.cis.ohio-state.edu>
>[replacement solution deleted]

>Don't forget about xargs.

Don't forget about non-SYSV sites!  (There is a simple replacement for
xargs in comp.sources.unix Volume 3, but not everyone has this).
-- 
       (jgreely@cis.ohio-state.edu; ...!att!cis.ohio-state.edu!jgreely)
		  Team Wheaties says: "Just say NO to rexd!"
	       /^Newsgroups: .*\,.*\,.*\,/h:j   /[Ww]ebber/h:j
	       /[Bb]irthright [Pp]arty/j        /[Pp]ortal/h:j

keith@seismo.CSS.GOV (Keith Bostic) (06/14/88)

In article <7962@alice.UUCP>, andrew@alice.UUCP writes:

> 22) support a filename of - to mean standard input.
> 	a unix without /dev/stdin is largely bogus but as a sop to the poor
> 	bastards having to work on BSD, gre will support -
> 	as stdin (at least for a while).
>
> Andrew Hume
> research!andrew

A few comments:

     -- As far as I'm aware, V9 is the only system that has "/dev/stdin" at the
	moment.  For those who haven't heard of it, V9 is a research version
	of UN*X developed and in use at the Computing Science Research Center,
	a part of AT&T Bell Laboratories, and available to a small number of
	universities.  It was preceded by V8, which, interestingly enough, was
	built on top of 4.1BSD.

     -- System V does not support "/dev/stdin".

     -- The next full release of BSD will contain "/dev/stdin" and friends.
	It is not part of the 4.3-tahoe release because it requires changes
	to stdio.  I do not expect, however, commands that currently support
	the "-" syntax to change, for compatibility reasons.  V9 itself
	continues to support such commands.

To sum up, let's try and keep this, if not actually constructive, at least
bearing some distant relationship to the facts.

Keith Bostic

guy@gorodish.Sun.COM (Guy Harris) (06/14/88)

> ? This works, and is indeed faster.  However, it shares one problem with
> ? all of the others: '*' expansion.  As an (uncomfortable) example,
> ? /usr/spool/news/talk/bizarre has over 2500 articles in it at our site,
> ? and the shell can't expand that properly (SunOS 3.4, if it matters).

"Fixed in 4.0", perhaps: from 4.0's "sys/param.h":

#define	NCARGS	0x100000	/* (absolute) max # characters in exec arglist */

(I don't know which versions of the shell can cope with 1MB argument lists, if
any.)

> Don't forget about xargs.

Assuming, of course, that your system has it; SunOS has it in releases 3.2 or
later (if you install the "System V Optional Software"; it's in "/usr/bin"),
but vanilla 4.xBSD doesn't, for example.  I would not be at all surprised to
hear that there is some public-domain reimplementation out there somewhere.

andrew@alice.UUCP (06/14/88)

actually our version of the linderman sort doesn't do arbitrary-sized lines;
it does the REASONABLE thing of complaining when the lines are too long
rather than silently progressing (or looping or truncating or ...).
that is why gre will either work properly (handle any line length)
or settle on a (sensible) max length and complain if it gets an overflow.

allbery@ncoast.UUCP (Brandon S. Allbery) (06/14/88)

As quoted from <5007@sdcsvax.UCSD.EDU> by hutch@net1.ucsd.edu (Jim Hutchison):
+---------------
| 4537@vdsvax.steinmetz.ge.com, barnett@vdsvax.steinmetz.ge.com (Bruce G. Barnett)
| >In <1036@cfa.cfa.harvard.EDU> wyatt@cfa.harvard.EDU (Bill Wyatt) writes:
| >|> There have been times when I wanted a grep that would print out the
| >|> first occurrence and then stop.
| >|
| >|grep '(your_pattern_here)' | head -1
| >
| >There are times when I want the first occurrence of a pattern without
| >reading the entire (i.e. HUGE) file.
| 
| I realize this is dependent on the way in which processes sharing a
| pipe act, but this is a point worth considering before we get yet
| another annoying burst of "cat -v" type programs.
| 
| grep pattern file1 ... fileN | head -1
| 
| This should send grep a SIGPIPE as soon as the first line of output
| trickles through the pipe.  This would result in relatively little
| of the file actually being read under most Unix implementations.
+---------------

Not true.  The SIGPIPE is sent when "grep" writes the second line, *not*
when "head" exits!  If there *is* only one line containing the pattern, grep
will happily read all of the (possibly large) files without getting SIGPIPE.
This is not pleasant, even if it's only one large file -- say a
comp.sources.unix posting which you're grepping for a Subject: line.
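
The timing is easy to see in a few lines of C.  This is a hypothetical
standalone sketch (write_after_reader_gone is a made-up name); closing the
pipe's read end stands in for "head" exiting:

```c
/* sketch: SIGPIPE is delivered to the writer at write() time,
   not at the moment the reader goes away */
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/* returns the errno from writing one byte into a pipe
   whose read end has already been closed */
int write_after_reader_gone(void)
{
    int fd[2];
    char c = 'x';

    signal(SIGPIPE, SIG_IGN);   /* turn the signal into an EPIPE error */
    if (pipe(fd) < 0)
        return -1;
    close(fd[0]);               /* the reader ("head") has exited... */
    /* ...but the writer notices nothing until it actually writes */
    if (write(fd[1], &c, 1) < 0) {
        int e = errno;
        close(fd[1]);
        return e;               /* EPIPE */
    }
    close(fd[1]);
    return 0;
}
```

So a grep that has written its one matching line and then finds no further
matches never writes again, never gets the signal, and reads to end of file.
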
-- 
Brandon S. Allbery			  | "Given its constituency, the only
{uunet!marque,sun!mandrill}!ncoast!allbery | thing I expect to be "open" about
Delphi: ALLBERY	       MCI Mail: BALLBERY | [the Open Software Foundation] is
comp.sources.misc: ncoast!sources-misc    | its mouth."  --John Gilmore

andrew@frip.gwd.tek.com (Andrew Klossner) (06/14/88)

[]

	"so far, gre will have only one limit, a line length of 64K.
	(NO, i am not supporting arbitrary length lines (yet)!)"

Why not a flag to let the user specify the max line length?  Just the
thing for that database hacker, and diminishes the demand for arbitrary
length.

	"there will be a -G flag to take patterns a la old grep and a
	-F to take patterns a la fgrep"

I hope that -F is a permanent, not temporary, flag.  I don't see it in
the summary list of supported flags, shudder.

	"a unix without /dev/stdin is largely bogus but as a sop to the
	poor bastards having to work on BSD, gre will support - as
	stdin (at least for a while)."

It's not just BSD; I haven't seen /dev/stdin in any released edition.
I just looked over the sVr3.1 tape and didn't turn up anything.

  -=- Andrew Klossner   (decvax!tektronix!tekecs!andrew)       [UUCP]
                        (andrew%tekecs.tek.com@relay.cs.net)   [ARPA]

chris@mimsy.UUCP (Chris Torek) (06/14/88)

In article <44370@beno.seismo.CSS.GOV> keith@seismo.CSS.GOV
[at seismo?!?] (Keith Bostic) writes:
>    -- The next full release of BSD will contain "/dev/stdin" and friends.
>	It is not part of the 4.3-tahoe release because it requires changes
>	to stdio.

Well, only because

	freopen("/dev/stdin", "r", stdin)

unexpectedly fails: it closes fd 0 before attempting to open /dev/stdin,
which means that stdin is gone before it can grab it again.  When I
`fixed' this here it broke /usr/ucb/head and I had to fix the fix!

The sequence needed is messy:

	old = fileno(fp);
	new = open(...);
	if (new < 0) {
		close(old);	/* maybe it was EMFILE */
		new = open(...);/* (could test errno too) */
		if (new < 0)
			return error;
	}
	if (new != old) {
		if (dup2(new, old) >= 0)	/* move it back */
			close(new);
		else {
			close(old);
			fileno(fp) = new;
		}
	}

Not using dup2 means that freopen(stderr) might make fileno(stderr)
something other than 2, which breaks at least perror().
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain:	chris@mimsy.umd.edu	Path:	uunet!mimsy!chris

ok@quintus.uucp (Richard A. O'Keefe) (06/14/88)

In article <8080@brl-smoke.ARPA> gwyn@brl.arpa (Doug Gwyn (VLD/VMB) <gwyn>) writes:
>Removing the ability to get context diffs when they are wanted WOULD
>be a bad idea.  Removing this feature from "diff" itself is not a
>bad idea; I hate for "diff" to do extra work every time I run it when
>I virtually never use the context feature.  Consider
>	diff a b | diffc a b
>where "diffc" reads the "diff" information in parallel with the two
>files "a" and "b" to produce the context-diff output.

About half of my calls to "diff" feed it with a pipe, e.g.
	NewProgramVersion <Data | diff ..Options.. - ExpectedOutput
I don't know how diff handles this, and I don't care; that's diff's job.
If you split it into two programs, someone who wants a context difference
for regression testing has to figure out how to handle pipes (him|her)self.
There is not the least fragment of a shadow of a reason for the -c option
to slow "diff" down in the cases when it is not used.  As one method of
implementation, consider a stripped down diff which exec()s another program
(/lib/diffc, perhaps) when it sees the -c option.
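
The dispatch itself is only a few lines.  Here is a sketch (wants_context is
a made-up name, and /lib/diffc is O'Keefe's hypothetical path); only the
option scan is shown:

```c
/* sketch of a stripped-down diff that hands -c off to a separate program */
#include <string.h>
#include <unistd.h>

/* does the option list ask for context output? */
int wants_context(int argc, char **argv)
{
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "--") == 0)
            break;                      /* end of options */
        if (argv[i][0] == '-' && strchr(argv[i] + 1, 'c') != NULL)
            return 1;                   /* -c, possibly bundled as -bc etc. */
    }
    return 0;
}

/* in main():  if (wants_context(argc, argv)) execv("/lib/diffc", argv); */
```
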

Berkeley may have gone overboard in adding new flags, but at least adding
new flags doesn't break working code.  If I have to rewrite scripts which
use only features documented in the V.2 SVID because someone decided that
his ideas of elegance were more important than my labour, I will be very
upset.

tbray@watsol.waterloo.edu (Tim Bray) (06/14/88)

>In article <7207@watdragon.waterloo.edu> I wrote:
>}Grep should, where reasonable, not be bound by the notion of a 'line'.
...
>}The source is scattered inconveniently.  The obvious thing to do is: 
>}grep -l _memcpy *.o
>}That this often will not work is irritating.

At least a dozen people have sent me alternate ways of doing this, the 
most obvious using 'nm'.  Look, I KNOW ABOUT NM! But you're missing the 
point - suppose the item in the .o files was another type of string, e.g.
an error message.  

The point is:  There are some files.  One or more may contain a string in
which I am interested.  grep -l is a tool which is supposed to tell me whether
one or more files contain a string.  The fact that it refuses to do so for 
a class of magic files is a gratuitous violation of the unix paradigm.
Tim Bray, New Oxford English Dictionary Project, U of Waterloo

oz@yunexus.UUCP (Ozan Yigit) (06/15/88)

In article <7962@alice.UUCP> andrew@alice.UUCP writes:
>
>21) subroutine versions of the pattern matching stuff.
>	....
>	.... the other two are egrep and back-referencing egrep.
>	lastly, regexp will be reimplemented.
>
>Andrew Hume

Just how do you propose to implement the back-referencing trick in 
a properly constructed (nfa and/or nfa->dfa conversion static or
on-the-fly) egrep ?? I presume that after each match of the
\(reference\) portion, you would have to on-the-fly modify the \n
portion of the fsa. Gack! Do you have a theoretically solid algorithm
[say, within the context of Aho/Sethi/Ullman's Dragon Book chapter on
regular expressions] for this ??  I would be much interested.

oz
-- 
The DeathStar rotated slowly,	      |  Usenet: ...!utzoo!yunexus!oz
towards its target, and sparked       |  ....!uunet!mnetor!yunexus!oz
an intense SUNbeam. The green world   |  Bitnet: oz@[yulibra|yuyetti]
of unics evaporated instantly...      |  Phonet: +1 416 736-5257x3976

barnett@vdsvax.steinmetz.ge.com (Bruce G. Barnett) (06/15/88)

In article <7962@alice.UUCP> andrew@alice.UUCP writes:
|
|	The following is a summary of the somewhat plausible ideas
|suggested for the new grep. 

|4) print one(first matching) line and go onto the next file.
|	most of the justification for this seemed to be scanning
|	mail and/or netnews articles for the subject line; neither
|	of which gets any sympathy from me. but it is easy to do
|	and doesn't add an option; we add a new option (say -1)
|	and remove -s. -1 is just like -s except it prints the matching line.
|	then the old grep -s pattern is now grep -1 pattern > /dev/null
|	and within epsilon of being as efficient.
	                            -----------
Actually this is extremely wrong.

Given the command 
	grep -1 Subject /usr/spool/news/comp/sources/unix/* >/dev/null
and
	grep -s Subject /usr/spool/news/comp/sources/unix/* >/dev/null

I would expect the first one to read *every* file. 

The second case ( -s ) should terminate as soon as it finds the first
match in the first file.

Unless I misunderstand the functionality of the -s command.
-- 
	Bruce G. Barnett 	<barnett@ge-crd.ARPA> <barnett@steinmetz.UUCP>
				uunet!steinmetz!barnett

rbj@ICST-CMR.ARPA (Root Boy Jim) (06/15/88)

? From: Randy Orrison <randy@umn-cs.cs.umn.edu>

? In article <7962@alice.UUCP> andrew@alice.UUCP writes:
? |3) print lines with context.
? |	the second most requested feature but i'm not doing it. this is
? |	just the job for sed. to be consistent, we just took the context
? 							^^^^^^^^^^^^^^^^
? |	crap out of diff too. this is actually reasonable; showing context
? 	^^^^^^^^^^^^^^^^
? |	is the job for a separate tool (pipeline difficulties apart).
? 
? 
? What?!?!?   Ok, i would like context in grep, but i'll live without it.
? Context diffs, however are a different matter.  There isn't an easy way
? to generate them with diff/context (the first character of every line is
? produced as part of the diff).  Context diffs are useful for patches, and
? having a tool to generate them is necessary.  They're a logical improvement
? to diff that is more than just context around the changes.
? 
? If you're fixing grep fine, but don't break diff while you're at it.

Ditto. In this day and age, it is unthinkable to generate diffs by
hand.  It is equally unthinkable to apply diffs (patches) by hand.
With the inclusion of the fudge factor in patch, context diffs
take on new value. Distributing non-context diffs in a source group
should be considered a felony. Context diffs are a feature that has
been proven useful time and time again.

I find it unacceptable to read a file twice to do what I could do in
one pass. Thus, Doug Gwyn's suggestion of a separate diffc program is
unacceptable as well.

I too can live without context greps; perhaps sed is an answer, altho
it currently works only on one file (multiple files are catenated).
Perhaps awk could use a `nextfile' command and we'd all be happy?

You are carrying this `tools' approach too far. Gone are the days of
small sizes; few people run on a PDP-11 anymore. Memory and disk space
are cheap these days; the goal is no longer to reduce each program to
its minimalist set of options and execution size. Composing tools is
as conceptually intimidating to the user as choosing the right option
in the first place.  Often, the tools *don't* compose correctly, and
functions must be accreted into tools that `logically' could be
handled elsewhere, such as ls -C. Provide what the user needs in a
concise form, without having to compose an arcane list of pipelines.
Trade size of executables for execution speed where appropriate.
Unused code is never paged in anyway.
 
? 	-randy
? -- 
? Randy Orrison, Control Data, Arden Hills, MN		randy@ux.acss.umn.edu
? 8-(OSF/Mumblix: Just say NO!)-8	    {ihnp4, seismo!rutgers, sun}!umn-cs!randy
? 	"I consulted all the sages I could find in Yellow Pages,
? 	but there aren't many of them."			-APP

	(Root Boy) Jim Cottrell	<rbj@icst-cmr.arpa>
	National Bureau of Standards
	Flamer's Hotline: (301) 975-5688
	The opinions expressed are solely my own
	and do not reflect NBS policy or agreement
	My name is in /usr/dict/words. Is yours?

rbj@cmr.icst.nbs.gov (Root Boy Jim) (06/15/88)

? From: andrew@alice.uucp
? 
? 4) print one(first matching) line and go onto the next file.
? 	most of the justification for this seemed to be scanning
? 	mail and/or netnews articles for the subject line; neither
? 	of which gets any sympathy from me. but it is easy to do
? 	and doesn't add an option; we add a new option (say -1)
? 	and remove -s. -1 is just like -s except it prints the matching line.
? 	then the old grep -s pattern is now grep -1 pattern > /dev/null
? 	and within epsilon of being as efficient.

I often grep for a host name in /etc/hosts. This is a big file and
would benefit from the execution time saved. Yeah, I know, use sed,
it's only one file. OK, how about this: grep -1 '#include .thing.' *.c?
 
? 5) divert matching lines onto one fd, nonmatching onto another.
? 	sorry, run grep twice.

While I rarely want to do this, the times I have, I have been extremely
annoyed. Why should I have to suffer twice the execution time when it
is trivial to put this in?
 
? Thus, the current proposal is the following flags. it would take a GOOD
? argument to change my mind on this list (unless it is to get rid of a flag).

? -h	do not print filenames in front of matching lines
? -H	always print filenames in front of matching lines

It has already been shown how to do these: for the former, use
cat files | grep, for the latter, grep files /dev/null. Perhaps
you are being a tad inconsistent with the tools philosophy?

? -e expr	use expr as the pattern

What about the magic `--' getopt token? Do we need `-e'?

? Andrew Hume
? research!andrew
 
	(Root Boy) Jim Cottrell	<rbj@icst-cmr.arpa>
	National Bureau of Standards
	Flamer's Hotline: (301) 975-5688
	The opinions expressed are solely my own
	and do not reflect NBS policy or agreement
	My name is in /usr/dict/words. Is yours?

guy@gorodish.Sun.COM (Guy Harris) (06/16/88)

> grep -l is a tool which is supposed to tell me whether one or more files
> contain a string.

No, it isn't.  "grep -l" is a tool that is supposed to tell you whether one or
more *text* files contain a string; if your file doesn't happen to contain
newlines at least every N characters or so, too bad.  If you want to improve
this situation by writing a "grep" that doesn't have this restriction, feel
free.

> The fact that it refuses to do so for a class of magic files is a
> gratuitous violation of the unix paradigm.

"ed is a tool that is supposed to let me modify files.  The fact that it
refuses to do so for a class of magic files is a gratuitous violation of the
unix paradigm."  Sorry, but the fact that you can't normally use "ed" to patch
binaries doesn't bother me one bit.

ljz@fxgrp.UUCP (Lloyd Zusman) (06/16/88)

In article <7962@alice.UUCP> andrew@alice.UUCP writes:
  
  	The following is a summary of the somewhat plausible ideas
  suggested for the new grep.  ...

  ...

  2) matching multi-line patterns (\n as part of pattern)
  	this actually requires a lot of infrastructure support and thought.
  	i prefer to leave that to other more powerful programs such as sam.
                                                                       ^^^
  ...

Since I'm one of the people who suggested the ability to match multi-line
patterns, I'm a bit disappointed about this ... but such is life.  So
where can I find 'sam'?  Is it in the public domain?  Is source code
available?

You can try to reply via email ... it might actually work, but don't
be surprised if your mail bounces, in which case I'd appreciate
replies here.

Thanks in advance.

--
  Lloyd Zusman                          UUCP:   ...!ames!fxgrp!ljz
  Master Byte Software              Internet:   ljz%fx.com@ames.arc.nasa.gov
  Los Gatos, California               or try:   fxgrp!ljz@ames.arc.nasa.gov
  "We take things well in hand."

andrew@alice.UUCP (06/16/88)

In article <515@yunexus.UUCP>, oz@yunexus.UUCP writes:
> Just how do you propose to implement the back-referencing trick in 
> a properly constructed (nfa and/or nfa->dfa conversion static or
> on-the-fly) egrep ?? I presume that after each match of the
> \(reference\) portion, you would have to on-the-fly modify the \n
> portion of the fsa. Gack! Do you have a theoretically solid algorithm
> [say, within the context of Aho/Sethi/Ullman's Dragon Book chapter on
> regular expressions] for this ??  I would be much interested.

theoretically solid is not what i would call it but the algorithm is simple
enough once you have a subroutine for egrep that matches a pattern against
an input with a match of at least n input chars. you just do what you have to
do: an exponential back-tracking algorithm. thus, back-referencing is not done
inside the fsa, but as part of a (complicated) control function. I realise
this sounds vague but i can't give you the details until i do it. al aho has
done it and probably understands this stuff as well as anyone in the world.
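
The flavor of that control function can be shown with a toy (an
illustration only, not Aho's algorithm; matches_doubled is a made-up
name): decide whether some nonempty captured substring immediately
repeats -- the back-reference idea -- by trying every capture and
backtracking:

```c
/* toy back-reference matcher, brute force: does s contain ww
   for some nonempty w?  (i.e. a capture followed by \1) */
#include <string.h>

int matches_doubled(const char *s)
{
    size_t n = strlen(s);

    for (size_t i = 0; i < n; i++)                    /* capture start */
        for (size_t len = 1; i + 2 * len <= n; len++) /* capture length */
            if (memcmp(s + i, s + i + len, len) == 0)
                return 1;                             /* \1 matched */
    return 0;
}
```

With one reference this try-everything search is merely cubic in the worst
case; it is when several back-references nest that the candidate captures
multiply and the exponential behavior Andrew mentions appears.
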

andrew@alice.UUCP (06/16/88)

i am not proposing that the world uses a diff without context;
just our world. it is rarely used in our center and we don't use patch.
and despite large address spaces and huge machines, we still believe
in trying to eliminate crud that is essentially never used. crud that is
not paged in is still crud. just remember, i am not trying to make you use
our (contextless) diff.

the point about contexts is that it is something you can do in many different
places with the output from many commands. this is what suggests that it
be a separate tool. it doesn't have to subsume all context tasks;
perhaps diff output just doesn't fit the mold. and complaints about
greps of standard input indicate that you need to think about whether
the context tool can handle pipe input; if it can't, then
context greps of standard input don't fit the mold either.

gwyn@brl-smoke.ARPA (Doug Gwyn ) (06/16/88)

In article <698@fxgrp.UUCP> ljz%fx.com@ames.arc.nasa.gov (Lloyd Zusman) writes:
>where can I find 'sam'?  Is it in the public domain?  Is source code
>available?

So far as I know, if you aren't part of AT&T and don't have 9th Edition UNIX,
the only way to legally obtain "sam" is to acquire it from the AT&T UNIX
System ToolChest, where it is included in the "dmd-pgmg" package.  This is
definitely not public domain, but it's inexpensively priced and it does
include source code.

"sam" works either with dumb terminals or with a smart one like an AT&T
Teletype 5620 or 630.  I haven't tried installing it without DMD support
but obviously it can be done.

I use "sam" (DMD version) whenever I have serious editing to do.

gwyn@brl-smoke.ARPA (Doug Gwyn ) (06/16/88)

In article <16173@brl-adm.ARPA> rbj@ICST-CMR.ARPA (Root Boy Jim) writes:
>Distributing non-context diffs in a source group
>should be considered a felony. Context diffs are a feature that have
>been proven useful time and time again.

I have to disagree with the sentiment that "diff -c" is extremely
useful.  I find it only slightly useful.

You might have noticed that when I post bug fixes I never do it
via "diff -c".  I prefer to give enough information to RELIABLY
patch the code.  In any context where I would trust "patch", I
would also trust "ed" using the output of "diff -e", which is
generally much less output.  (By the way, this could also be
done with a separate filter applied to normal "diff" output.)

I recently generated a "diff -c -b" comparison between SVR2 sh
sources and the BRL version of sh.  The output was larger than
the concatenation of all the sources.  It was useful for the
intended purpose (browsing), but would be ludicrous for "patch"ing.

fmr@cwi.nl (Frank Rahmani) (06/16/88)

> Xref: mcvax comp.unix.wizards:8598 comp.unix.questions:6792
> Posted: Fri Jun 10 05:29:43 1988
> 
> In article <8012@brl-smoke.ARPA>, gwyn@brl-smoke.ARPA (Doug Gwyn ) writes:
> A real useful `tool', this, that works only on files.  And only when
> you grep more than one file, so you get filenames (or happen to be able
> to remember which flag it is to make grep print filenames always,
> assuming of course that your grep has it).
...
...
that's the smallest of all problems: just include /dev/null as the
first file to be searched, like
	grep [options] pattern /dev/null one_or_more_filenames
by the way, I like the sed one-liner that was posted as an answer
to the grep replacement question. Why couldn't I think of it? :-)
fmr@cwi.nl
-- 
It is better never to have been born. But who among us has such luck?
--------------------------------------------------------------------------
These opinions are solely mine and in no way reflect those of my employer.  

rbj@cmr.icst.nbs.gov (Root Boy Jim) (06/16/88)

? From: andrew@alice.uucp

? 5) divert matching lines onto one fd, nonmatching onto another.
? 	sorry, run grep twice.

I can imagine Dennis Ritchie, designing the C language, saying:

5) <lvalue> <op>= <expression>
        sorry, type lvalue twice.

? Andrew Hume

Sorry, I just couldn't resist taking another swipe.

	(Root Boy) Jim Cottrell	<rbj@icst-cmr.arpa>
	National Bureau of Standards
	Flamer's Hotline: (301) 975-5688
	The opinions expressed are solely my own
	and do not reflect NBS policy or agreement
	Careful with that VAX Eugene!

guy@gorodish.Sun.COM (Guy Harris) (06/17/88)

> In any context where I would trust "patch", I would also trust "ed" using
> the output of "diff -e", which is generally much less output.

In many contexts where I would trust "patch" with a context "diff", I would
*NOT* trust "ed" with a "diff -e" any further than I could throw it.

"diff -e" scripts contain line numbers that *must* match the lines in the file
being patched, at least if you're using "ed" - "patch" may be able to figure
out the right line numbers if you're not patching the exact same version of the
source, although I would not be surprised if it didn't, since "diff -e" scripts
don't have the context that makes this easier.

"diff -c" scripts contain the aforementioned context, so that they can be used
to apply patches to source that is *not* identical to the source from which the
"diff"s were made.  This is quite important in many cases.  (I use "diff -c"
and "patch" to merge different streams of changes to a source file, for
example.)

> I recently generated a "diff -c -b" comparison between SVR2 sh
> sources and the BRL version of sh.  The output was larger than
> the concatenation of all the sources.  It was useful for the
> intended purpose (browsing), but would be ludicrous for "patch"ing.

Yes, you can construct examples where "diff -c" output is too big to be
practical.  However, the vast majority of the "diff -c" patches I've seen
distributed are not that big; the context is a big win.

andrew@alice.UUCP (06/17/88)

sam is described in software practice and experience, nov 1987.
it is available from the at&t toolchest as part of the package
'dmdprograms' (or similar) for something like $125. this includes
source. to run on something other than a teletype 5620, say a sun,
you have to rewrite a little but it is worth it.

daveb@geac.UUCP (David Collier-Brown) (06/17/88)

In article <10078@tekecs.TEK.COM> andrew@frip.gwd.tek.com (Andrew
Klossner) quotes someone to say:
>[]
>
>	"so far, gre will have only one limit, a line length of 64K.
>	(NO, i am not supporting arbitrary length lines (yet)!)"

   Well, arbitrary line lengths are easy.

  Initially
	allocate a cache
  When reading
	fgets a cache-full
	if the last character is not a \n
		increase the cache with realloc
		read some more


  A function to do this, called getline, was published recently in
the source groups.
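
For concreteness, a minimal sketch of that loop in C (getline_any is a
made-up name; published versions differ in details):

```c
/* sketch: read a line of arbitrary length, growing the buffer
   with realloc when fgets stops short of a newline */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* returns a malloc'd line without its trailing newline, or NULL at EOF */
char *getline_any(FILE *fp)
{
    size_t cap = 16, len = 0;
    char *buf = malloc(cap);

    if (buf == NULL)
        return NULL;
    for (;;) {
        if (fgets(buf + len, (int)(cap - len), fp) == NULL) {
            if (len == 0) { free(buf); return NULL; }  /* clean EOF */
            return buf;             /* last line had no newline */
        }
        len += strlen(buf + len);
        if (len > 0 && buf[len - 1] == '\n') {
            buf[len - 1] = '\0';    /* strip the newline */
            return buf;
        }
        cap *= 2;                   /* no newline yet: grow, read more */
        buf = realloc(buf, cap);
        if (buf == NULL)
            return NULL;
    }
}
```
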

--dave (remember my old .signature?) c-b
-- 
 David Collier-Brown.  {mnetor yunexus utgpu}!geac!daveb
 Geac Computers Ltd.,  | "His Majesty made you a major 
 350 Steelcase Road,   |  because he believed you would 
 Markham, Ontario.     |  know when not to obey his orders"

wolfe@pdnbah.uucp (Mike Wolfe) (06/17/88)

In article <540@sering.cwi.nl> fmr@cwi.nl (Frank Rahmani) writes:
>> Xref: mcvax comp.unix.wizards:8598 comp.unix.questions:6792
>> Posted: Fri Jun 10 05:29:43 1988
>> 
>> In article <8012@brl-smoke.ARPA>, gwyn@brl-smoke.ARPA (Doug Gwyn ) writes:
>> A real useful `tool', this, that works only on files.  And only when
>> you grep more than one file, so you get filenames (or happen to be able
>> to remember which flag it is to make grep print filenames always,
>> assuming of course that your grep has it).
>...
>...
>that's the smallest of all problems, just include /dev/null as first
>file to be searched
>into your script like
>grep [options] pattern /dev/null one_or_more_filenames

Smallest of all problems? One of my pet peeves is the fact that certain
commands will only print filenames if you give them more than one file. While
the /dev/null ugliness is a suitable kludge for the grep case, what about
a case where you want to run something using xargs, something like sum? You
don't want /dev/null repeated for each call. I know I can sed it out, but
that's just a kludge for a kludge, and to me that's a red flag.

I think that all commands of that type should allow you to force the filenames
in output. I don't want to go back and change all the commands (UNIX++ a
modest proposal ;-). I just wish people would keep this in mind when writing
things in the future.

----
Mike Wolfe
Paradyne Corporation,  Mail stop LF-207   DOMAIN   wolfe@pdn.UUCP
PO Box 2826, 8550 Ulmerton Road           UUCP     ...!uunet!pdn!wolfe
Largo, FL  34649-2826                     PHONE    (813) 530-8361

karish@denali.stanford.edu (Chuck Karish) (06/18/88)

In article <7990@alice.UUCP> andrew@alice.UUCP writes:

>i am not proposing that the world uses a diff without context;
>just our world. it is rarely used in our center and we don't use patch.
>and despite large address spaces and huge machines, we still believe
>in trying to eliminate crud that is essentially never used. crud that is
>not paged in is still crud. just remember, i am not trying to make you use
>our (contextless) diff.

Oh.

From the tone of your previous postings, and because of some of the names
you dropped in them, I was under the impression that you were writing
a full-function replacement for grep.  That assumption seems to have carried
over to the discussion of diff.
If you want to hack at important utilities, go ahead.  Just keep them
at your site, or call them by a distinctive name.  You've seen a preview
of what the response will be if diff is shipped out in the form you describe.


Chuck Karish	ARPA:	karish@denali.stanford.edu
		BITNET:	karish%denali@forsythe.stanford.edu
		UUCP:	{decvax,hplabs!hpda}!mindcrf!karish
		USPS:	1825 California St. #5   Mountain View, CA 94041

davidsen@steinmetz.ge.com (William E. Davidsen Jr) (06/18/88)

If you are able to use "diff -e" you must have a different version than
I do... The one I have generates refs to absolute line numbers, and is
useless for applying any patches if the source has been modified, even
in other parts of the program.
-- 
	bill davidsen		(wedu@ge-crd.arpa)
  {uunet | philabs | seismo}!steinmetz!crdos1!davidsen
"Stupidity, like virtue, is its own reward" -me

gwyn@brl-smoke.ARPA (Doug Gwyn ) (06/18/88)

In article <540@sering.cwi.nl> fmr@cwi.nl (Frank Rahmani) writes:
>> In article <8012@brl-smoke.ARPA>, gwyn@brl-smoke.ARPA (Doug Gwyn ) writes:

But I didn't.  (I think it was BZS.)  PLEASE, check your attributions!

maart@cs.vu.nl (Maarten Litmaath) (06/18/88)

In article <7962@alice.UUCP> andrew@alice.UUCP writes:
\...
\5) divert matching lines onto one fd, nonmatching onto another.
\	sorry, run grep twice.

Come on! The diversion is no problem at all to implement, and it can be very
useful (you cannot run grep twice on stdin without the use of temporary files).
Regards.
-- 
South-Africa:                         |Maarten Litmaath @ Free U Amsterdam:
           revival of the Third Reich |maart@cs.vu.nl, mcvax!botter!ark!maart

ado@elsie.UUCP (Arthur David Olson) (06/18/88)

The "new" grep's -b option provides everything necessary to do "efficient"
post-processor-based context display (file offsets rather than line numbers).

Since there's a proposed change in the semantics of "-b", I've suggested
changing its name (to "-B" or "-z" or whatever) to avoid quiet surprises from
existing scripts.
-- 
		Grocery swaps ends for Chinese native.  (5)
	ado@ncifcrf.gov			ADO is a trademark of Ampex.

henry@utzoo.uucp (Henry Spencer) (06/19/88)

> ... few people run on a PDP-11 anymore. Memory and disk space
> are cheap these days; the goal is no longer to reduce each program to
> its minimalist set of options and execution size...

You'd be surprised how much of a difference in performance you can get,
even on modern systems, by applying the minimalist philosophy vigorously.

> Trade size of executables for execution speed where appropriate.
> Unused code is never paged in anyway.

Spoken like a true disk salesman. :-) :-(
-- 
Man is the best computer we can      |  Henry Spencer @ U of Toronto Zoology
put aboard a spacecraft. --Von Braun | {ihnp4,decvax,uunet!mnetor}!utzoo!henry

allbery@ncoast.UUCP (Brandon S. Allbery) (06/19/88)

As quoted from <5826@umn-cs.cs.umn.edu> by randy@umn-cs.cs.umn.edu (Randy Orrison):
+---------------
| In article <7962@alice.UUCP> andrew@alice.UUCP writes:
| |3) print lines with context.
| |	the second most requested feature but i'm not doing it. this is
| |	just the job for sed. to be consistent, we just took the context
| 							^^^^^^^^^^^^^^^^
| |	crap out of diff too. this is actually reasonable; showing context
| 	^^^^^^^^^^^^^^^^
| |	is the job for a separate tool (pipeline difficulties apart).
| 
| 
| What?!?!?   Ok, i would like context in grep, but i'll live without it.
| Context diffs, however are a different matter.  There isn't an easy way
| to generate them with diff/context (the first character of every line is
| produced as part of the diff).  Context diffs are useful for patches, and
+---------------

Yes, there is; change diff's output format slightly and expand "context"
slightly, then other programs can also output in "extended context" format so
as to use "context"'s facilities.  I've already described part of this
change in another posting; the other part would be to recognize a special
indicator (on the line number, perhaps?) which would for generality be the
flag to use on the difference, defaulting to "*" which is what "context"
currently uses, or diff could specify "+", "-", or "!".  The only other
change would be to smarten "context" so that it "collapses" context
"windows" together much like the 4.3BSD diff -c does.

It appears that Bell Labs continues to use tools unrepentantly.  It should
be noted that they *are* into research, so I have no arguments against their
use of /dev/stdin (/dev/fd/0?), their assumption that there's plenty of
space so stash away a copy of a file with "tee" for later use in "context",
etc.  (My /dev/stdin complaint earlier was not aimed at the Bell Labs folks,
it was aimed at the person who informed the entire Usenet that "hey, I
posted a /dev/stdin driver source for 4.2BSD, so not a one of you has any
reason not to be running it".  In other words, the usual 4.xBSD-source
elitism.)
-- 
Brandon S. Allbery			  | "Given its constituency, the only
{uunet!marque,sun!mandrill}!ncoast!allbery | thing I expect to be "open" about
Delphi: ALLBERY	       MCI Mail: BALLBERY | [the Open Software Foundation] is
comp.sources.misc: ncoast!sources-misc    | its mouth."  --John Gilmore

allbery@ncoast.UUCP (Brandon S. Allbery) (06/19/88)

As quoted from <3350@phri.UUCP> by roy@phri.UUCP (Roy Smith):
+---------------
| jad@insyte.UUCP writes:
| > A missing feature in UNIX is the ability to deal with files with very
| > long lines.
| 
| 	Unless I'm misunderstanding jad, he's talking about fixed length
| records.  Can't you just do: "dd conv=unblock cbs=80 (or whatever)" to
| convert the file to standard Unix \n-terminated lines?  Hasn't this been
| part of Unix since at least v6?
+---------------

Apparently not:  neither System III nor System V r3.1 supports it.  (I used
"strings" on both systems, to make sure it wasn't merely undocumented).  I
certainly think it *should* be in "dd"... it's a rather obvious tape-
conversion operation.
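For readers on systems whose dd does support it (V7-derived and POSIX dd do), the idiom quoted above works as follows: conv=unblock treats the input as fixed-length records of cbs bytes each, strips trailing spaces, and appends a newline. A sketch using 8-byte records for brevity (the file name is made up):

```shell
# Convert fixed-length 8-byte records into newline-terminated lines.
# "alpha   " and "beta    " are each padded to exactly 8 bytes.
printf 'alpha   beta    ' > fixed.dat
dd if=fixed.dat conv=unblock cbs=8 2>/dev/null
# prints "alpha" and "beta" on separate lines
```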
-- 
Brandon S. Allbery			  | "Given its constituency, the only
{uunet!marque,sun!mandrill}!ncoast!allbery | thing I expect to be "open" about
Delphi: ALLBERY	       MCI Mail: BALLBERY | [the Open Software Foundation] is
comp.sources.misc: ncoast!sources-misc    | its mouth."  --John Gilmore

frei@rubmez.UUCP (Matthias Frei ) (06/20/88)

In article <7962@alice.UUCP>, andrew@alice.UUCP writes:
> 
> 	The following is a summary of the somewhat plausible ideas

You are disparaging nearly all of the good ideas posted by
many users on the net.
So why did you post your questionable request, if you only
want to make some minor changes to grep???
Please don't waste our time with things like that.

    Matthias Frei
--------------------------------------------------------------------
Snail-mail:                    |  E-Mail address:
Microelectronics Center        |                 UUCP  frei@rubmez.uucp        
University of Bochum           |                (...uunet!unido!rubmez!frei)
4630 Bochum 1, P.O.-Box 102143 |
West Germany                   |

leo@philmds.UUCP (Leo de Wit) (06/21/88)

In article <16174@brl-adm.ARPA> rbj@cmr.icst.nbs.gov (Root Boy Jim) writes:
== From: andrew@alice.uucp
== 
== 4) print one(first matching) line and go onto the next file.
== 	most of the justification for this seemed to be scanning
== 	mail and/or netnews articles for the subject line; neither
== 	of which gets any sympathy from me. but it is easy to do
== 	and doesn't add an option; we add a new option (say -1)
== 	and remove -s. -1 is just like -s except it prints the matching line.
== 	then the old grep -s pattern is now grep -1 pattern > /dev/null
== 	and within epsilon of being as efficent.
==
=I often grep for a host name in /etc/hosts. This is a big file and
=would benefit from the execution time saved. Yeah, I know, use sed,
=it's only one file. OK, how about this: grep -1 '#include .thing.' *.c?

I think sed could do the trick if we would allow a new command for it:
S: skip to the next file. It should be very easy to implement and
obviously satisfies a need (looking at the response of the net).
Somewhat for the next net pollution: sed replacement ;-)
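The "first matching line per file, then skip to the next file" behaviour discussed in this subthread (grep -1, or a hypothetical sed "S" command) can be sketched with awk's nextfile statement, assuming a gawk or POSIX.1-2008 awk that provides it (the .c file names are made up):

```shell
# Print only the first matching line of each file, then move on to
# the next file -- an approximation of the proposed "grep -1".
printf '#include <a.h>\nint x;\n' > one.c
printf 'int y;\n#include <b.h>\n' > two.c
awk '/#include/ { print FILENAME ": " $0; nextfile }' one.c two.c
```

Like the proposed flag, this avoids scanning the remainder of a large file once a hit is found.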

    Leo.


    Sed fugit interea, fugit     | But in the meantime flies, flies the
    irreparabile tempus.         | irreparable time.
                                 |               VERGILIUS, Georgica 3. 284

greywolf@unicom.UUCP (greywolf) (06/25/88)

In article <1304@ark.cs.vu.nl> maart@cs.vu.nl (Maarten Litmaath) writes:
# In article <7962@alice.UUCP> andrew@alice.UUCP writes:
# \...
# \5) divert matching lines onto one fd, nonmatching onto another.
# \	sorry, run grep twice.
# 
# Come on! The diversion is no problem at all to implement, and it can be very
# useful (you cannot run grep twice on stdin, without use of temporary files).
# Regards.

Essentially, I think that with respect to the tool/flag concept, their
attitude there is "See figure 1."  This is ESPECIALLY true when they have
the opportunity to say "NIH"! (Sounds like the knights who say "Ni!" in
Monty Python and the Holy Grail).
	For those of you who do not understand "See figure 1.", I am sure
that there are some people inside AT&T who would be happy to tell you.
They tell me every month on my phone bill.

# -- 
# South-Africa:                         |Maarten Litmaath @ Free U Amsterdam:
#          revival of the Third Reich |maart@cs.vu.nl, mcvax!botter!ark!maart
--

mouse@mcgill-vision.UUCP (der Mouse) (06/26/88)

In article <8167@ncoast.UUCP>, allbery@ncoast.UUCP (Brandon S. Allbery) writes:
> As quoted from <5826@umn-cs.cs.umn.edu> by randy@umn-cs.cs.umn.edu (Randy Orrison):
>> In article <7962@alice.UUCP> andrew@alice.UUCP writes:
[ various things, originally about context grep.  But when replying,
  ++Brandon (do you still use that name, Brandon?) says.... ]
> (My /dev/stdin complaint earlier was [...] aimed at the person who
> informed the entire Usenet that "hey, I posted a /dev/stdin driver
> source for 4.2BSD, so not a one of you has any reason not to be
> running it".  In other words, the usual 4.xBSD-source elitism.)

That person was Chris Torek, but if he'd been a bit slower it could
well have been me.  At least he didn't say it as abrasively as your
summary does.

However, if you check back and look at the original postings on this
issue, the /dev/stdin point was first brought up as follows:

> 22) support a filename of - to mean standard input.
> 	a unix without /dev/stdin is largely bogus but as a sop to the poor
> 	barstards having to work on BSD, gre will support -
> 	as stdin (at least for a while).

(This over Andrew Hume's signature.  He was explaining what features
gre would support.)

With phrasing like that, I can hardly blame Chris for rising to the
defense of his driver.  And note the "BSD" phrase: that's the context
in which Chris said there was no excuse for complaining about not
having /dev/stdin.  And in that context, I agree with him.

And there's no call for complaints about source elitism.  My /dev/stdin
driver can be added to a binary distribution; surely Chris' can too.

(Andrew, there's no call to be so insulting.  Lots (most?) of us who
use BSD don't think of ourselves as "poor barstards[sic]" who "have" to
work on BSD.  There are two SysV-based machines here I can use whenever
I feel like it; I find it extremely painful to try to do anything on
them.  But you generally don't find me talking about "poor bastards who
have to work on SysV", and you most certainly won't find me saying so
in my postings to the entire net.)

					der Mouse

			uucp: mouse@mcgill-vision.uucp
			arpa: mouse@larry.mcrcim.mcgill.edu

allbery@ncoast.UUCP (Brandon S. Allbery) (07/04/88)

As quoted from <1186@mcgill-vision.UUCP> by mouse@mcgill-vision.UUCP (der Mouse):
+---------------
| In article <8167@ncoast.UUCP>, allbery@ncoast.UUCP (Brandon S. Allbery) writes:
| > As quoted from <5826@umn-cs.cs.umn.edu> by randy@umn-cs.cs.umn.edu (Randy Orrison):
| >> In article <7962@alice.UUCP> andrew@alice.UUCP writes:
| [ various things, originally about context grep.  But when replying,
|   ++Brandon (do you still use that name, Brandon?) says.... ]
| > (My /dev/stdin complaint earlier was [...] aimed at the person who
| > informed the entire Usenet that "hey, I posted a /dev/stdin driver
| > source for 4.2BSD, so not a one of you has any reason not to be
| > running it".  In other words, the usual 4.xBSD-source elitism.)
| 
| However, if you check back and look at the original postings on this
| issue, the /dev/stdin point was first brought up as follows:
| 
| > 22) support a filename of - to mean standard input.
| > 	a unix without /dev/stdin is largely bogus but as a sop to the poor
| > 	barstards having to work on BSD, gre will support -
| > 	as stdin (at least for a while).
| 
| (This over Andrew Hume's signature.  He was explaining what features
| gre would support.)
| 
| With phrasing like that, I can hardly blame Chris for rising to the
| defense of his driver.  And note the "BSD" phrase: that's the context
| in which Chris said there was no excuse for complaining about not
| having /dev/stdin.  And in that context, I agree with him.
| 
| And there's no call for complaints about source elitism.  My /dev/stdin
| driver can be added to a binary distribution; surely Chris' can too.
+---------------

At least part of the confusion here comes from the fact that Andrew's nasty
comment (above) got here after Chris's comment; I thought he was just
exhibiting the rather degrading attitude toward binary System V sites that
I've had to put up with ever since I started reading this net.  (C'mon,
guys, if it didn't work I wouldn't be here!)

+---------------
| work on BSD.  There are two SysV-based machines here I can use whenever
| I feel like it; I find it extremely painful to try to do anything on
| them.  But you generally don't find me talking about "poor bastards who
| have to work on SysV", and you most certainly won't find me saying so
| in my postings to the entire net.)
+---------------

You would appear to be in the minority.
-- 
Brandon S. Allbery, uunet!marque!ncoast!allbery			DELPHI: ALLBERY
	    For comp.sources.misc send mail to ncoast!sources-misc