andrew@alice.UUCP (05/23/88)
Al Aho and I are designing a replacement for grep, egrep and fgrep. The question is what flags should it support and what kind of patterns should it handle? (Assume the existence of flags to make it compatible with grep, egrep and fgrep.) The proposed flags are the V9 flags:
    -f file   pattern is (`cat file`)
    -v        print nonmatching
    -i        ignore alphabetic case
    -n        print line number
    -x        the pattern used is ^pattern$
    -c        print count only
    -l        print filenames only
    -b        print block numbers
    -h        do not print filenames in front of matching lines
    -H        always print filenames in front of matching lines
    -s        no output; just status
    -e expr   use expr as the pattern
The patterns are as for egrep, supplemented by back-referencing as in \{pattern\}\1. please send your comments about flags or patterns to research!andrew
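[For readers who have not met back-referencing: with the ed-style \(...\)\1 notation (ed, sed and most greps built on the editors' regexp code accept it), a piece of the pattern is captured and required to occur again. A small illustration; the doubled-word pattern is only an example:]

    # find lines in which a word is immediately repeated, e.g. "the the"
    grep '\([a-zA-Z][a-zA-Z]*\) \1' file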
papowell@attila.uucp (Patrick Powell) (05/25/88)
In article <7882@alice.UUCP> andrew@alice.UUCP writes: > > Al Aho and I are designing a replacement for grep, egrep and fgrep. >The question is what flags should it support and what kind of patterns >should it handle? (Assume the existence of flags to make it compatible >with grep, egrep and fgrep.) > >please send your comments about flags or patterns to research!andrew

The one thing I miss in the grep family is the ability to have a named search pattern. For example:
    DIGIT=\{[0-9]\}
    ALPHA=\{[a-zA-Z]\}
    \${ALPHA}\${PATTERN}
This would sort of make sense. The other facility is to find multiple-line patterns, as in: find the pair of lines that have pattern1 in the first line, pattern2 in the second, etc. This I have needed sooo many times; I have ended up using AWK and a clumsy set of searches. For example:
    \#{1 p}Pattern
    \#{2}Pattern
This could print out lines that match, or only the first line (1p->print this one only).

Patrick Powell Prof. Patrick Powell, Dept. Computer Science, 136 Lind Hall, 207 Church St. SE, University of Minnesota, Minneapolis, MN 55455 (612)625-3543/625-4002
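[A sketch of how far today's tools go toward named sub-patterns: build the pieces as shell variables and hand the composed result to egrep. DIGIT, ALPHA and IDENT below are illustrations only, not proposed syntax:]

    #!/bin/sh
    # compose named pieces into one extended regular expression for egrep
    DIGIT='[0-9]'
    ALPHA='[a-zA-Z]'
    IDENT="${ALPHA}(${ALPHA}|${DIGIT})*"    # a letter, then letters or digits
    egrep "$IDENT" ${1+"$@"}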
ljz@fxgrp.UUCP (Lloyd Zusman) (05/26/88)
In article <5630@umn-cs.cs.umn.edu> papowell@attila.UUCP (Patrick Powell) writes: In article <7882@alice.UUCP> andrew@alice.UUCP writes: > > Al Aho and I are designing a replacement for grep, egrep and fgrep. >The question is what flags should it support and what kind of patterns >should it handle? ... ... The other facility is to find multiple line patterns, as in: find the pair of lines that have pattern1 in the first line pattern2 in the second, etc. This I have needed sooo many times; I have ended up using AWK and a clumsy set of searches. For example: \#{1 p}Pattern \#{2}Pattern This could print out lines that match, or only the first line (1p->print this one only). ...

Or another way to get this functionality would be for this new greplike thing to allow matches on the newline character. For example:
    ^.*foo\nbar.*$
          ^^
          newline
-- Lloyd Zusman UUCP: ...!ames!fxgrp!ljz Master Byte Software Internet: ljz%fx.com@ames.arc.nasa.gov Los Gatos, California or try: fxgrp!ljz@ames.arc.nasa.gov "We take things well in hand."
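[For what it's worth, that particular two-line match can already be had from sed's sliding window; a sketch, with foo and bar standing in for real patterns:]

    # slide a two-line window through the input and print every window
    # in which "foo" ends one line and "bar" begins the next
    sed -n -e 'N' -e '/foo\nbar/p' -e 'D' file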
alan@cogswell.Jpl.Nasa.Gov (Alan S. Mazer) (05/26/88)
One thing I would _love_ is to be able to find the context of what I've found, for example, to find the two (n?) surrounding lines. I have wanted to do this many times and there is no good way. -- Alan ..!cit-vax!elroy!alan * "But seriously, what elroy!alan@csvax.caltech.edu could go wrong?"
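[Until a context flag exists, a throwaway script can fake it. A rough sketch only: it buffers the whole file in memory, so it is no good for anything huge, and the two lines of context are hard-wired:]

    #!/bin/sh
    # context: print every line of file $2 (or stdin) matching pattern $1,
    # together with the 2 lines before and after it
    awk '
            { line[NR] = $0 }
    /'"$1"'/ {
            for (i = NR - 2; i <= NR + 2; i++)
                    want[i] = 1
    }
    END     {
            for (i = 1; i <= NR; i++)
                    if (want[i])
                            print line[i]
    }' $2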
ben@idsnh.UUCP (Ben Smith) (05/26/88)
I also would like to see more of the lex capabilities in grep. -- Integrated Decision Systems, Inc. | Benjamin Smith - East Coast Tech. Office The fitting solution in professional | Peterborough, NH portfolio management software. | UUCP: uunet!idsnh!ben
kutz@bgsuvax.UUCP (Kenneth Kutz) (05/26/88)
In article <6866@elroy.Jpl.Nasa.Gov>, alan@cogswell.Jpl.Nasa.Gov (Alan S. Mazer) writes: > One thing I would _love_ is to be able to find the context of what I've > found, for example, to find the two (n?) surrounding lines. I have wanted > to do this many times and there is no good way. There is a program on the Usenix tape under .../Utilities/Telephone called 'tele'. If you call the program using the name 'g', it supports displaying of context. E-mail me if you want more info. -- -------------------------------------------------------------------- Kenneth J. Kutz CSNET kutz@bgsu.edu UUCP ...!osu-cis!bgsuvax!kutz Disclaimer: Opinions expressed are my own and not of my employer's --------------------------------------------------------------------
dcon@ihlpe.ATT.COM (452is-Connet) (05/26/88)
In article <6866@elroy.Jpl.Nasa.Gov> alan@cogswell.Jpl.Nasa.Gov (Alan S. Mazer) writes: > >One thing I would _love_ is to be able to find the context of what I've >found, for example, to find the two (n?) surrounding lines. I have wanted >to do this many times and there is no good way. Also, what line number it was found on. David Connet ihnp4!ihlpe!dcon
david@elroy.Jpl.Nasa.Gov (David Robinson) (05/27/88)
In article <2978@ihlpe.ATT.COM>, dcon@ihlpe.ATT.COM (452is-Connet) writes: > In article <6866@elroy.Jpl.Nasa.Gov> alan@cogswell.Jpl.Nasa.Gov (Alan S. Mazer) writes: > >One thing I would _love_ is to be able to find the context of what I've > >found, for example, to find the two (n?) surrounding lines. I have wanted > >to do this many times and there is no good way. > Also, what line number it was found on. How about "grep -n"? -- David Robinson elroy!david@csvax.caltech.edu ARPA david@elroy.jpl.nasa.gov ARPA {cit-vax,ames}!elroy!david UUCP Disclaimer: No one listens to me anyway!
daveb@laidbak.UUCP (Dave Burton) (05/27/88)
In article <2978@ihlpe.ATT.COM> dcon@ihlpe.UUCP (David Connet) writes: |Also, what line number it was found on. Already there: grep -n. In article <6866@elroy.Jpl.Nasa.Gov> alan@cogswell.Jpl.Nasa.Gov (Alan S. Mazer) writes: |One thing I would _love_ is to be able to find the context of what I've |found, for example, to find the two (n?) surrounding lines. I have wanted |to do this many times and there is no good way. Please. Maybe "grep -k" where k is any integer giving the number of lines of context on each side of the match, default is 0. Oh, but hey, _you're_ designing it! :-) -- --------------------"Well, it looked good when I wrote it"--------------------- Verbal: Dave Burton Net: ...!ihnp4!laidbak!daveb V-MAIL: (312) 505-9100 x325 USSnail: 1901 N. Naper Blvd. #include <disclaimer.h> Naperville, IL 60540
dcon@ihlpe.ATT.COM (452is-Connet) (05/27/88)
In article <6877@elroy.Jpl.Nasa.Gov> david@elroy.Jpl.Nasa.Gov (David Robinson) writes: >In article <2978@ihlpe.ATT.COM>, dcon@ihlpe.ATT.COM (452is-Connet) writes: >> Also, what line number it was found on. >How about "grep -n"? > Embarrassed and red-faced he goes away to read the man-page...
stan@sdba.UUCP (Stan Brown) (05/27/88)
> > One thing I would _love_ is to be able to find the context of what I've > found, for example, to find the two (n?) surrounding lines. I have wanted > to do this many times and there is no good way. > > -- Alan ..!cit-vax!elroy!alan * "But seriously, what > elroy!alan@csvax.caltech.edu could go wrong?" Along this same general line it would be nice to be able to look for patterns that span lines. But perhaps this would be too complete a change in the philosophy of grep? stan -- Stan Brown S. D. Brown & Associates 404-292-9497 (uunet gatech)!sdba!stan "vi forever"
jas@rain.rtech.UUCP (Jim Shankland) (05/27/88)
In article <2978@ihlpe.ATT.COM> dcon@ihlpe.UUCP (David Connet) writes: >In article <6866@elroy.Jpl.Nasa.Gov> alan@cogswell.Jpl.Nasa.Gov (Alan S. Mazer) writes: >>One thing I would _love_ is to be able to find the context of what I've >>found, for example, to find the two (n?) surrounding lines.... > >Also, what line number it was found on. You've already got the line number with the "-n" option. Note that that makes it easy to write a little wrapper script that gives you context grep. Whether that's preferable to adding the context option to grep is, I suppose, debatable; but I can already see the USENIX paper: "newgrep -[whatever] Considered Harmful" Jim Shankland ..!ihnp4!cpsc6a!\ sun!rtech!jas ..!ucbvax!mtxinu!/
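[One shape such a wrapper might take: let grep -n find the line numbers, turn each into a sed print range, and run sed over the same file. A sketch only; it wants a real, seekable file rather than a pipe, and overlapping windows come out twice:]

    #!/bin/sh
    # cgrep pattern file: show each match with 3 lines of context
    grep -n "$1" "$2" | sed 's/:.*//' |
    awk '{ lo = $1 - 3; if (lo < 1) lo = 1; print lo "," ($1 + 3) "p" }' > /tmp/cg$$
    sed -n -f /tmp/cg$$ "$2"
    rm -f /tmp/cg$$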
aperez@cvbnet2.UUCP (Arturo Perez Ext.) (05/27/88)
From article <662@fxgrp.UUCP>, by ljz@fxgrp.UUCP (Lloyd Zusman): > In article <5630@umn-cs.cs.umn.edu> papowell@attila.UUCP (Patrick Powell) writes: > In article <7882@alice.UUCP> andrew@alice.UUCP writes: > > > > Al Aho and I are designing a replacement for grep, egrep and fgrep. > >The question is what flags should it support and what kind of patterns > >should it handle? ...

Actually, I agree with the guy who posted a request shortly before this came out. The most useful feature that is currently lacking is the ability to do context greps, i.e. greps with a window. There are two ways this could be handled. One is to allow awk-like constructs specifying beginning and ending points for a window, sort of like
    grep -w '/:/,/^$/' file
which would find the lines between each pair of a ':'-containing line and the next following blank line. The other way would be to have a simple "number of lines around match" parameter, possibly with collapse of overlapping windows. Then you could say
    grep -w 5 foo file
which would print 2 lines above and below the matching line. Either way it's done, it would be nice. I have made one attempt to implement this with a script and it wasn't too much fun...

Arturo Perez ComputerVision, a division of Prime
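[The first form is essentially what sed and awk range addresses already do; for the ':'-to-blank-line example:]

    sed -n '/:/,/^$/p' file
    awk '/:/,/^$/' file       # same thing; awk's default action is to print the line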
morrell@hpsal2.HP.COM (Michael Morrell) (05/28/88)
/ hpsal2:comp.unix.questions / dcon@ihlpe.ATT.COM (452is-Connet) / 6:36 am May 26, 1988 / Also, what line number it was found on. David Connet ihnp4!ihlpe!dcon ---------- grep -n does this, but I'd like to see an option which ONLY prints the line numbers where the pattern was found. Michael Morrell hpda!morrell
bzs@bu-cs.BU.EDU (Barry Shein) (05/28/88)
Re: grep with N context lines shown... Interesting, that's very close to a concept of a multi-line record grep where I treat N lines as one and any occurrence results in a listing. The difference is the line chosen to count from (in a context the match would probably be middle and +-N, in a record you'd just list the record.) Just wondering if a generalization is being missed here somewhere, also consider grepping something like a termcap file, maybe what I really want is a generalized method to supply pattern matchers for what to list on a hit:
    grep -P .+3,.-3 pattern          # print +-3 lines centered on match
    grep -P ?^[^ \t]?,.+1 pattern    # print from previous line not
                                     # beginning with white space to
                                     # one past current line
Of course, that destroys the stream nature of grep, it has to be able to arbitrarily back up, ugh, although "last candidate for a start" could be saved on the fly. The nice thing is that it can use (essentially) the same pattern machinery for choosing printing (I know, have to add in the notion of dot etc.) I dunno, food for thought, like I said, maybe there's a generalization here somewhere. Or maybe grep should just emit line numbers in a form which could be post-processed by sed for fancier output (grep in backquotes on sed line.) Therefore none of this is necessary :-) -Barry Shein, Boston University
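[A crude cut at the termcap case with today's awk: remember the most recent line that does not start with white space and print it as a header for each hit. Only a sketch -- it shows the entry header plus the matching line, not the whole region described above, and /am/ is just an example pattern:]

    awk '
    { c = substr($0, 1, 1) }
    c != " " && c != "\t" { header = $0 }
    /am/ {
            if (header != last) { print header; last = header }
            if ($0 != header)   print "    " $0
    }' /etc/termcap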
barnett@vdsvax.steinmetz.ge.com (Bruce G. Barnett) (05/28/88)
[mail bounced] There have been times when I wanted a grep that would print out the first occurrence and then stop. -- Bruce G. Barnett <barnett@ge-crd.ARPA> <barnett@steinmetz.UUCP> uunet!steinmetz!barnett
wyatt@cfa.harvard.EDU (Bill Wyatt) (05/28/88)
> There have been times when I wanted a grep that would print out the > first occurrence and then stop. grep '(your_pattern_here)' | head -1 -- Bill UUCP: {husc6,ihnp4,cmcl2,mit-eddie}!harvard!cfa!wyatt Wyatt ARPA: wyatt@cfa.harvard.edu (or) wyatt%cfa@harvard.harvard.edu BITNET: wyatt@cfa2 SPAN: cfairt::wyatt
ado@elsie.UUCP (Arthur David Olson) (05/28/88)
> > There have been times when I wanted a grep that would print out the > > first occurrence and then stop. > > grep '(your_pattern_here)' | head -1 Doesn't cut it for grep '(your_pattern_here)' firstfile secondfile thirdfile ... -- ado@ncifcrf.gov ADO is a trademark of Ampex.
roy@phri.UUCP (Roy Smith) (05/28/88)
wyatt@cfa.harvard.EDU (Bill Wyatt) writes: [as a way to get just the first occurrence of pattern] > grep '(your_pattern_here)' | head -1 Yes, it'll certainly work, but I think it bypasses the original intention: to save CPU time. If I had a 1000 line file with pattern on line 7, I want grep to read the first 7 lines, print out line 7, and exit. grep|head, on the other hand, will read and search all 1000 lines of the file; it won't exit (with an EPIPE) until it writes another line to stdout and finds that head has already exited. In fact, if grep block-buffers its output, it may never do more than a single write(2) and never notice that head has exited. Anyway, I agree with the "find first match" flag being a good idea. It would certainly speed up things like grep "^Subject: " /usr/spool/news/comp/sources/unix/* where I know that the pattern is going to be matched in the first few lines and don't want to bother searching the rest of the multi-kiloline file. -- Roy Smith, System Administrator Public Health Research Institute 455 First Avenue, New York, NY 10016 {allegra,philabs,cmcl2,rutgers}!phri!roy -or- phri!roy@uunet.uu.net
chip@vector.UUCP (Chip Rosenthal) (05/29/88)
In article <8077@elsie.UUCP> ado@elsie.UUCP (Arthur David Olson) writes: >> grep '(your_pattern_here)' | head -1 >Doesn't cut it for > grep '(your_pattern_here)' firstfile secondfile thirdfile ... nor if you want to see if a match was found by testing the exit status -- Chip Rosenthal /// chip@vector.UUCP /// Dallas Semiconductor /// 214-450-0400 {uunet!warble,sun!texsun!rpp386,killer}!vector!chip I won't sing for politicians. Ain't singing for Spuds. This note's for you.
guy@gorodish.Sun.COM (Guy Harris) (05/29/88)
> grep -n does this, but I'd like to see an option which ONLY prints the line > numbers where the pattern was found. I wouldn't - if you're only grepping one file, you can do it without such an option: grep -n <pattern> <file> | sed -n 's/\([0-9]*\):.*/\1/p' If you're grepping more than one file, you obviously have to decide what you want to do with the file name and the line number; once you do, just change the "sed" pattern appropriately (and note that if the list of files is variable, you either have to stick "/dev/null" in there to make sure the names are generated even if there's only one file or have the script distinguish between the one-file and >1-file cases; I seem to remember some indication that the new BTL research "grep" would have a flag to tell it always to give the file name).
russ@groucho.ucar.edu (Russ Rew) (05/30/88)
I also recently had a need for printing multi-line "records" in which a specified pattern appeared somewhere in the record. The following short csh script uses the awk capability to treat whole lines as fields and empty lines as record separators to print all the records from standard input that contain a line matching a regular expression specified as an argument:

    #!/bin/csh -f
    awk 'BEGIN {RS = ""; FS = "\n"; OFS = "\n"; ORS = "\n\n"} /'"$1"'/ {print} '

Russ Rew * UCAR (University Corp. for Atmospheric Research) PO Box 3000 * Boulder, CO 80307-3000 * 303-497-8845 russ@unidata.ucar.edu * ...!hao!unidata!russ
joey@tessi.UUCP (Joe Pruett) (05/31/88)
> >> There have been times when I wanted a grep that would print out the >> first occurrence and then stop. > >grep '(your_pattern_here)' | head -1 This works, but is quite slow if the input to grep is large. A hack I've made to egrep is a switch of the form -<number>. This causes only the first <number> matches to be printed, and then the next file is searched. This is great for: egrep -1 ^Subject * in a news directory to get a list of Subject lines.
jqj@uoregon.uoregon.edu (JQ Johnson) (05/31/88)
In article <1036@cfa.cfa.harvard.EDU> wyatt@cfa.harvard.EDU (Bill Wyatt) writes: > >> There have been times when I wanted a grep that would print out the >> first occurrence and then stop. >grep '(your_pattern_here)' | head -1 This is, of course, unacceptable if you are searching a very long file (say, a census database) and have LOTS of pipe buffering. Too bad it isn't feasible to have a shell that can optimize pipelines.
dan@maccs.UUCP (Dan Trottier) (05/31/88)
In article <8077@elsie.UUCP> ado@elsie.UUCP (Arthur David Olson) writes: >> > There have been times when I wanted a grep that would print out the >> > first occurrence and then stop. >> >> grep '(your_pattern_here)' | head -1 > >Doesn't cut it for > > grep '(your_pattern_here)' firstfile secondfile thirdfile ... This is getting ridiculous and can be taken to just about any level...
    foreach i (file1 file2 ...)
        grep 'pattern' $i | head -1
    end
-- A.I. - is a three toed sloth! | ...!uunet!mnetor!maccs!dan -- Official scrabble players dictionary -- | dan@mcmaster.BITNET
leo@philmds.UUCP (Leo de Wit) (05/31/88)
In article <292@ncar.ucar.edu> russ@groucho.UCAR.EDU (Russ Rew) writes: >I also recently had a need for printing multi-line "records" in which a >specified pattern appeared somewhere in the record. The following >short csh script uses the awk capability to treat whole lines as fields >and empty lines as record separators to print all the records from >standard input that contain a line matching a regular expression specified as an >argument: > >#!/bin/csh -f >awk 'BEGIN {RS = ""; FS = "\n"; OFS = "\n"; ORS = "\n\n"} /'"$1"'/ {print} ' > >

Awk is a nice solution, but sed is a much faster one. I've been following the 'grep' discussion for some time now, and have seen much demand for features that are simply within sed. Here are some; I have left the discussion about the function of this or that sed-command out: there is a sed article and a man page...

Patrick Powell writes: >The other facility is to find multiple line patterns, as in: >find the pair of lines that have pattern1 in the first line >pattern2 in the second, etc.

Try this one:
    sed -n -e '/PATTERN1/,/PATTERN2/p' file
It prints all lines between PATTERN1 and PATTERN2 matches. Of course you can have subcommands to do special things (with '{' I mean).

Alan (..!cit-vax!elroy!alan) writes: >One thing I would _love_ is to be able to find the context of what I've >found, for example, to find the two (n?) surrounding lines. I have wanted >to do this many times and there is no good way.

There is. Try this one:
    sed -n -e '
    /PATTERN/{
    x
    p
    x
    p
    n
    p
    }
    h' file
It prints the line before, the line containing the PATTERN, and the line after. Of course you can make the output fancier and the number of lines printed larger.

David Connet writes: >> >>One thing I would _love_ is to be able to find the context of what I've >>found, for example, to find the two (n?) surrounding lines. I have wanted >>to do this many times and there is no good way. >Also, what line number it was found on.

Sed can also handle this one:
    sed -n -e '/PATTERN/=' file

Lloyd Zusman writes: >Or another way to get this functionality would be for this new greplike >thing to allow matches on the newline character. For example: > ^.*foo\nbar.*$ > ^^ > newline

Sed can match on embedded newline characters in the substitute command (it is indeed \n here!). The trailing newline is matched by $.

Barry Shein writes [story about relative addressing]: >I dunno, food for thought, like I said, maybe there's a generalization >here somewhere. Or maybe grep should just emit line numbers in a form >which could be post-processed by sed for fancier output (grep in >backquotes on sed line.) Therefore none of this is necessary :-)

Quite right. I think most times you want to see the context it is in interactive use. In that case you can write a simple sed-script that does just what is needed, i.e. display the [/pattern/-N] through [/pattern/+N] lines, where N is a constant. The example I gave for N == 1 can be extended for larger N, with fancy output etc.

Bill Wyatt writes: >> There have been times when I wanted a grep that would print out the >> first occurrence and then stop. > >grep '(your_pattern_here)' | head -1

Much simpler, and faster:
    sed -n -e '/PATTERN/{
    p
    q
    }' file
Sed quits immediately after finding the first match. You could even create an alias for something like that.

Michael Morrell writes: >>Also, what line number it was found on. >grep -n does this, but I'd like to see an option which ONLY prints the line >numbers where the pattern was found.

The sed trick does this:
    sed -n -e '/PATTERN/=' file
Or you could even:
    sed -n -e '/PATTERN/{
    =
    q
    }' file
which prints the first matched line number and exits.

Roy Smith writes: >wyatt@cfa.harvard.EDU (Bill Wyatt) writes: >[as a way to get just the first occurrence of pattern] >> grep '(your_pattern_here)' | head -1 > Yes, it'll certainly work, but I think it bypasses the original >intention: to save CPU time. If I had a 1000 line file with pattern on >line 7, I want grep to read the first 7 lines, print out line 7, and exit. >grep|head, on the other hand, will read and search all 1000 lines of the >file; it won't exit (with an EPIPE) until it writes another line to stdout >and finds that head has already exited. In fact, if grep block-buffers its >output, it may never do more than a single write(2) and never notice that >head has exited.

Quite right. The sed-solution I mentioned before is fast and neat. In fact, who needs head:
    sed 10q
does the job, as you can find in a book by Kernighan and Pike; I thought the title was 'The Unix Programming Environment'.

Stan Brown writes: > Along this same general line it would be nice to be able to > look for patterns that span lines. But perhaps this would be > too complete a change in the philosophy of grep?

As I mentioned before, embedded newlines can be matched by sed in the substitute command. What I also see often is things like
    grep 'pattern' file | sed 'expression'
A pity a lot of people don't know that sed can do the pattern matching itself. S. E. D. (Sic Erat Demonstrandum)

As far as options for a new grep are concerned, I suggest using the options proposed (and no more). Let other tools handle other problems - that's in the Un*x spirit. What I would appreciate most in a new grep is: no more grep, egrep, fgrep, just one tool that can be both fast (for fixed strings) and elaborate (for pattern matching like egrep). The 'bm' tool that was on the net (author Peter Bain) is very fast for fixed strings, using the Boyer-Moore algorithm. Maybe this knowledge could be 'joined in'...?

Leo.
barnett@vdsvax.steinmetz.ge.com (Bruce G. Barnett) (05/31/88)
In article <1036@cfa.cfa.harvard.EDU> wyatt@cfa.harvard.EDU (Bill Wyatt) writes: | |> There have been times when I wanted a grep that would print out the |> first occurrence and then stop. | |grep '(your_pattern_here)' | head -1 Yes I have tried that. You are missing the point. Have you ever waited for a computer? There are times when I want the first occurrence of a pattern without reading the entire (i.e. HUGE) file. Or there are times when I want the first occurrence of a pattern from hundreds of files, but I don't want to see the pattern more than once. And yes I know how to write a shell script that does this. IMHO (sarcasm mode on), it is more efficient to call grep once for one hundred files, than to call (grep $* /dev/null|head -1) one hundred times. -- Bruce G. Barnett <barnett@ge-crd.ARPA> <barnett@steinmetz.UUCP> uunet!steinmetz!barnett
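[For the many-files case, one awk pass over all of them avoids starting a grep|head pair per file, on one reading of the request; it still reads every line of every file, since awk has no way to skip ahead to the next file:]

    awk '{
            if ($0 ~ /^Subject: / && FILENAME != done) {
                    print FILENAME ": " $0
                    done = FILENAME
            }
    }' *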
gwc@root.co.uk (Geoff Clare) (05/31/88)
Most of the useful things people have been saying they would like to be able to do with 'grep' can already be done very simply with 'sed'. For example: Stop after first match: sed -n '/pattern/{p;q;}' Match over two lines: sed -n 'N;/pat1\npat2/p;D' It should also be possible to get a small number of context lines by judicious use of the 'hold space' commands (g, G, h, H, x), but I haven't tried it. Anyway, this can be done with a normal line editor (if the data to be searched aren't coming from a pipe) with 'g/pattern/-,+p'. I was rather alarmed to see the proposal for 'pattern repeat' in the original article was '\{pattern\}\1' rather than '\(pattern\)\1', as the latter is already used for this purpose in the standard editors (ed, ex/vi, sed). Or was it a typo? By the way, does anyone know why the ';' command terminator in 'sed' is not documented? It works on all the systems I've tried it on, but I have never found it in any manuals. It's so much nicer than putting the commands on separate lines, or using multiple '-e' options. -- Geoff Clare UniSoft Limited, Saunderson House, Hayne Street, London EC1A 9HH gwc@root.co.uk ...!mcvax!ukc!root44!gwc +44-1-606-7799 FAX: +44-1-726-2750
andyc@omepd (T. Andrew Crump) (05/31/88)
In article <1036@cfa.cfa.harvard.EDU> wyatt@cfa.harvard.EDU (Bill Wyatt) writes: >> There have been times when I wanted a grep that would print out the >> first occurrence and then stop. > >grep '(your_pattern_here)' | head -1 Yes, but it forces grep to search a whole file, when what you may have wanted was at the beginning. This is inefficient if the "file" is large. A more general version of this request would be a parameter that would restrict grep to n or fewer occurrences, maybe 'grep -N #'. -- Andy Crump
trb@ima.ISC.COM (Andrew Tannenbaum) (05/31/88)
> I seem to remember some indication that the new BTL research "grep" > would have a flag to tell it always to give the file name). I have always wanted to be able to tell grep to NOT print the file names on a multi-file grep. Let's say I want a phone number script - usually a simple grep - but if I want to store the numbers in multiple files (e.g. mine and my department's), then the output contains unsightly filenames. This has always struck me as opposite to the UNIX philosophy of having a filter provide output that is useful as data. I would like the option to go to the next file after the first match (regardless of which other options are present). Also, I would like to print a region other than the matching line on a match. It would be nice to delimit the patterns using regexps, as "-n,+n" and "?^$?,/^$/" (among others) would be useful. Andrew Tannenbaum Interactive Boston, MA +1 617 247 1155
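[A sketch of the phone-list case with the pieces at hand today; the path is made up, /dev/null forces the filename prefix on even when only one file matches, and sed then strips the prefix off uniformly:]

    #!/bin/sh
    # phone name: look a name up across several phone lists,
    # without the "file:" prefixes in the output
    grep "$1" $HOME/lib/phones.* /dev/null | sed 's/^[^:]*://'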
avr@mtgzz.UUCP (XMRP50000[jcm]-a.v.reed) (05/31/88)
In article <2450011@hpsal2.HP.COM>, morrell@hpsal2.HP.COM (Michael Morrell) writes: > Also, what line number it was found on. > grep -n does this, but I'd like to see an option which ONLY prints the line > numbers where the pattern was found.

    # In ksh's $ENV - otherwise use a shell script:
    function lngrep
    {
        if [ "$#" = '1' ]
        then
            grep -n $@ | cut -f1 -d:
        else
            grep -n $@ | cut -f1,2 -d:
        fi
    }

Adam Reed (mtgzz!avr)
glennr@holin.ATT.COM (Glenn Robitaille) (06/01/88)
> > > There have been times when I wanted a grep that would print out the > > > first occurrence and then stop. > > > > grep '(your_pattern_here)' | head -1 > > Doesn't cut it for > > grep '(your_pattern_here)' firstfile secondfile thirdfile ... Well, if you have a shell command like

    #
    # save the search pattern
    #
    pattern=$1
    #
    # remove search pattern from $*
    #
    shift
    for i in $*
    do
        #
        # grep for search pattern
        #
        line=`grep ${pattern} ${i}|head -1`
        #
        # if found, print file name and string
        #
        test -n "$line" && echo "${i}:\t${line}"
    done

It'll work fine. If you want to use other options, have them in quotes as part of the first argument. Glenn Robitaille AT&T, HO 2J-207 ihnp4!holin!glennr Phone (201) 949-7811
aeb@cwi.nl (Andries Brouwer) (06/01/88)
In article <1036@cfa.cfa.harvard.EDU> wyatt@cfa.harvard.EDU (Bill Wyatt) writes: > >> There have been times when I wanted a grep that would print out the >> first occurrence and then stop. > >grep '(your_pattern_here)' | head -1 A fast way of searching for the first occurrence is really useful. I have a version of grep called `contains', and a shell script for formatting that says: if the input contains .[ then use refer; if it contains .IS then ideal; if .PS then pic; if .TS then tbl, etc. -- Andries Brouwer -- CWI, Amsterdam -- uunet!mcvax!aeb -- aeb@cwi.nl
jin@hplabsz.HPL.HP.COM (Tai Jin) (06/01/88)
In article <1039@ima.ISC.COM> trb@ima.UUCP (Andrew Tannenbaum) writes: >I have always wanted to be able to tell grep to NOT print the file >names on a multi-file grep. Let's say I want a phone number script - >usually a simple grep - but if I want to store the numbers in multiple >files (e.g. mine and my departments), then the output contains >unsightly filenames. This has always struck me as opposite to the UNIX >philosophy of having a filter provide output that is useful as data. Actually, I think the Unix philosophy is to have simple filters and use pipes to construct more complex filters. Unfortunately, you can't do everything with pipes. >I would like the option to go to the next file after first match >(regardless of which other options are present). Also, I would like to >print a region other than line on a match. It would be nice to delimit >the patterns using regexps, as "-n,+n" and "?^$?,/^$/" (among others) >would be useful. I also would like a context grep that greps for records with arbitrary delimiters. I started working on this, but I've had no time to finish it. ...tai
hasch@gypsy.siemens-rtl (Harald Schaefer) (06/01/88)
If you are only interested in the first occurrence of a pattern, you can use
something like
sed -n '/<pattern>/ {
p
q
}' file
Harald Schaefer
Siemens Corp. - RTL
Bus. Phone (609) 734 3389
Home Phone (609) 275 1356
uucp: ...!princeton!gypsy!hasch
hasch@gypsy.uucp
ARPA: hasch@siemems.com
hasch%siemens@princeton.EDU
aburt@isis.UUCP (Andrew Burt) (06/01/88)
I'd like to see the following enhancements in a grepper:
- \< and \> to match word start/end as in vi, with -w option as in BSD grep to match pattern as a word.
- \w in pattern to match whitespace (generalization: define \unused-letter as a pattern; or allow full lex capability).
- way to invert piece of pattern such as: grep foo.*\^bar\^xyzzy with meaning as in: grep foo | grep -v bar | grep -v xyzzy (or could be written grep foo.*\^(bar|xyzzy) of course).
- Select Nth occurrence of match (generalization: list of matches to show: grep -N -2,5-7,10- ... to grab up to the 2nd, 5th through 7th, and from the 10th onward).
- option to show lines between matches (not just matching lines) as in: grep -from foo -to bar ... meaning akin to sed/ed's /foo/,/bar/p. (But much more useful with other extensions).
- Allow matching newlines in a "binary" (or non-text) sort of mode: grep -B 'foo.*bar' finds foo...bar even if they are not on the same line. (But printing the "line" that matches wouldn't be useful anymore, so just printing the matched text would be better. Someone wanting lines could look for \n[^\n]*foo.*bar[^\n]*\n, though a syntax to make this easier might be in order. Perhaps this wouldn't be an example of a binary case -- but a new character with meaning like '.' but matching ANY character would work: if @ is such a character then "grep foo@*bar". Perhaps a better example, assuming the \^ for inversion syntax above, would be "grep foo@*(\^bar)bar" -- otherwise it would match from first foo to last bar, while I might want from first foo to first bar.)
- provide byte offset of start of match (like block number or line number) useful for searching non-text files.
- Provide a lib func that has the RE code in it.
- Install RE code in other programs: awk/sed/ed/vi etc.
Oh for a standardized RE algorithm! -- Andrew Burt isis!aburt Fight Denver's pollution: Don't Breathe and Drive.
jjg@linus.UUCP (Jeff Glass) (06/01/88)
In article <470@q7.tessi.UUCP> joey@tessi.UUCP (Joe Pruett) writes: > >grep '(your_pattern_here)' | head -1 > > This works, but is quite slow if the input to grep is large. A hack > I've made to egrep is a switch of the form -<number>. This causes only > the first <number> matches to be printed, and then the next file is > searched. This is great for: > > egrep -1 ^Subject * > > in a news directory to get a list of Subject lines. Try: sed -n -e '/pattern/{' -e p -e q -e '}' filename This prints the first occurrence of the pattern and then stops searching the file. The generalizations for printing the first <n> matches and searching <m> files (where n,m > 1) are more awkward (no pun intended) but are possible. /jeff
brianm@sco.COM (Brian Moffet) (06/01/88)
In article <4537@vdsvax.steinmetz.ge.com> barnett@vdsvax.steinmetz.ge.com (Bruce G. Barnett) writes: >In article <1036@cfa.cfa.harvard.EDU> wyatt@cfa.harvard.EDU (Bill Wyatt) writes: >|grep '(your_pattern_here)' | head -1 > >Or there are times when I want the first occurrence of a pattern from >hundreds of files, but I don't want to see the pattern more than once. > Have you tried sed? How about $ sed -n '/pattern/p;/pattern/q' file ??? -- Brian Moffet brianm@sco.com {uunet,decvax!microsof}!sco!brianm The opinions expressed are not quite clear and have no relation to my employer. 'Evil Geniuses for a Better Tommorrow!'
anw@nott-cs.UUCP (06/01/88)
In article <6866@elroy.Jpl.Nasa.Gov> alan@cogswell.Jpl.Nasa.Gov (Alan S. Mazer) writes: > One thing I would _love_ is to be able to find the context of what I've > found, for example, to find the two (n?) surrounding lines. I have wanted > to do this many times and there is no good way. See below. Does n == 4, but easily changed. In article <590@root44.co.uk> gwc@root.co.uk (Geoff Clare) writes: > > Most of the useful things people have been saying they would like to be > able to do with 'grep' can already be done very simply with 'sed'. Which is not to say that they shouldn't also be in "*grep"! > [ good examples omitted ] > > It should also be possible to get a small number of context lines by > judicious use of the 'hold space' commands (g, G, h, H, x), but I haven't > tried it. [ ... ] The following is "/usr/bin/kwic" on this machine (PDP 11/44 running V7). I wrote it about three years ago in response to a challenge from some AWK zealots; it runs *much* faster than the equivalent AWK script. That is, it is sloooww rather than ssllloooooowwww. I have a manual entry for it which is too trivial to send. Bourne shell, of course. Use at whim and discretion. Several minor bugs, mainly (I hope!) limitations of or between "sh" and "sed". (Note that the various occurrences of multiple spaces in "s..." commands are all TABs, in case mailers/editors/typists have mangled things.) > By the way, does anyone know why the ';' command terminator in 'sed' is > not documented? It works on all the systems I've tried it on, but I > have never found it in any manuals. It's so much nicer than putting > the commands on separate lines, or using multiple '-e' options. No, I don't know why, but it isn't the only example in Unix of a facility most easily discovered by looking in the source. I've occasionally used it, but I tried re-writing the following that way, and it *didn't* look so much nicer; in fact it looked 'orrible.

--------------------------------- [cut here] -----------------------------
[ $# -eq 0 ] && { echo "Usage: $0 pattern [file] ..." 1>&2; exit 1; }
l='[^\n]*\n'
pat="$1"
shift
exec sed -n "/$pat"'/ b found
s/^/ /
H
g
/^'"$l$l$l$l$l"'/ s/\n[^\n]*//
h
b
: found
s/^/++ /
H
g
s/.//p
s/.*//
h
: loop
$ b out
n
/'"$pat"'/ b found
s/^/ /
H
g
/^'"$l$l$l$l"'/ !b loop
: out
s/.//p
s/.*/-----------------/
h
' ${1+"$@"}
-- Andy Walker, Maths Dept., Nott'm Univ., UK. anw@maths.nott.ac.uk
andrew@alice.UUCP (06/01/88)
in my naivete, i had not been following netnews closely after i posted the original ``grep replacement'' article. I assumed that people would reply to me, not the net. That is the reason i have not been participating in the discussion. i will be posting my resolution of the suggestions shortly. many people have written about patterns matching multiple lines. grep will not do this. if you really need this, use sam by rob pike as described in the nov 1987 software practice and experience. the code is available for a plausible fee from the at&t toolchest.
sef@csun.UUCP (Sean Fagan) (06/02/88)
Something I'd like to see is this: grep '^<somepattern>$^<morepatterns>$...'. While this would, of course, not be trivial, I think it would probably be more general (and therefore more in the "spirit" of Unix(tm)) than showing n lines around a matched pattern. But that's just my opinion. -- Sean Fagan (818) 885-2790 uucp: {ihnp4,hplabs,psivax}!csun!sef CSUN Computer Center BITNET: 1GTLSEF@CALSTATE Northridge, CA 91330 DOMAIN: sef@CSUN.EDU "I just build fast machines." -- S. Cray
jfh@rpp386.UUCP (John F. Haugh II) (06/02/88)
In article <2117@uoregon.uoregon.edu> jqj@drizzle.UUCP (JQ Johnson) writes: >In article <1036@cfa.cfa.harvard.EDU> wyatt@cfa.harvard.EDU (Bill Wyatt) writes: >>> There have been times when I wanted a grep that would print out the >>> first occurrence and then stop. >>grep '(your_pattern_here)' | head -1 >This is, of course, unacceptable if you are searching a very long file >(say, a census database) and have LOTS of pipe buffering. > >Too bad it isn't feasible to have a shell that can optimize pipelines. there is a boyer/moore based fast grep in the archives. adding an additional option (say '-f' for first in each file?) should be quite simple. perhaps i'll post the diff's if i remember to go hack on the sucker any time soon. - joh. -- John F. Haugh II | "If you aren't part of the solution, River Parishes Programming | you are part of the precipitate." UUCP: ihnp4!killer!rpp386!jfh | -- long since forgot who DOMAIN: jfh@rpp386.uucp |
john@frog.UUCP (John Woods) (06/02/88)
In article <590@root44.co.uk>, gwc@root.co.uk (Geoff Clare) writes: > Most of the useful things people have been saying they would like to be > able to do with 'grep' can already be done very simply with 'sed'. > For example: > Stop after first match: sed -n '/pattern/{p;q;}' Close, but no cigar. It does not work for multiple input files. (And, of course, spawning off a new sed for each file defeats the basic desire of most of the people who've asked for it: speed) However, awk '/^Subject: / { print FILENAME ":" $0; next }' * does (just about) work. And it's probably not _obscenely_ slow. (it doesn't behave for no input files, and you might prefer no FILENAME: for just a single input file) -- John Woods, Charles River Data Systems, Framingham MA, (617) 626-1101 ...!decvax!frog!john, john@frog.UUCP, ...!mit-eddie!jfw, jfw@eddie.mit.edu No amount of "Scotch-Guard" can repel the ugly stains left by REALITY... - Griffy
les@chinet.UUCP (Leslie Mikesell) (06/02/88)
In article <2018@hplabsz.HPL.HP.COM> jin@hplabsz.UUCP (Tai Jin) writes: >>I have always wanted to be able to tell grep to NOT print the file >>names on a multi-file grep. > >Actually, I think the Unix philosophy is to have simple filters and use >pipes to construct more complex filters. Unfortunately, you can't do >everything with pipes. In this case it can be done with pipes: cat file.. |grep pattern Les Mikesell
mdorion@cmtl01.UUCP (Mario Dorion) (06/03/88)
In article <2978@ihlpe.ATT.COM>, dcon@ihlpe.ATT.COM (452is-Connet) writes: > In article <6866@elroy.Jpl.Nasa.Gov> alan@cogswell.Jpl.Nasa.Gov (Alan S. Mazer) writes: > > > >One thing I would _love_ is to be able to find the context of what I've > >found, for example, to find the two (n?) surrounding lines. I have wanted > >to do this many times and there is no good way. > > Also, what line number it was found on. > > David Connet > ihnp4!ihlpe!dcon Ever tried grep -n ?????

There are three features I would like to see in a grep-like program:
1- Be able to use a newline character in the regular expression:
    grep 'this\nthat' file
2- Be able to grep more than one regular expression with one call. This would be faster than issuing many calls since the file would be read only once.
3- To have an option to search only for the first occurrence of the pattern. Sometimes you KNOW that the pattern is there only once (for example if you grep '^Subject:' on news files) and there's just no need to scan the rest of the file. When 'grepping' into many files it would return the first occurrence for each file.
-- Mario Dorion | ...!{rutgers,uunet,ihnp4}! Frisco Bay Industries | philabs!micomvax!cmtl01!mdorion Montreal, Canada | 1 (514) 738-7300 | I thought this planet was in public domain!
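[Wish 2 is largely available already: hand egrep one alternation, or hand it a file of patterns with -f, and the input is read only once. For example:]

    egrep 'pattern1|pattern2|pattern3' file
    egrep -f patfile file      # patfile holds one pattern per line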
andrew@alice.UUCP (06/03/88)
In article <449@happym.UUCP>, kent@happym.UUCP writes: > From alice.UUCP?? Ha ha! That's Bell Labs! It will be in V10 > Unix, and none of us humans will see it until sysVr6, and only then > if we are lucky!! Context: the right thing to do is to write a context program that takes input looking like "filename:linenumber:goo" and prints whatever context you like. we can then take this crap out of grep and diff and make it generally available for use with programs like the C compiler and eqn and so on. It can also do the right thing with folding together nearby lines. At least one good first cut has been put on the net but a C program sounds easy enough to do. Source: the software i write is publicly available because it matters to me. it was a hassle but mk and fio are available to everybody for reasonable cost (< $125 commercial, nearly free educational). i am trying hard to do the same for the new grep. it will be in V10, it will be in plan9, and should be in SVR4 (the joint sun-at&t release).
lyndon@ncc.Nexus.CA (Lyndon Nerenberg) (06/04/88)
In article <1039@ima.ISC.COM> trb@ima.UUCP (Andrew Tannenbaum) writes: >I have always wanted to be able to tell grep to NOT print the file >names on a multi-file grep. That's easy :-) Just pipe it through cut(1). Works great unless you have a ':' as part of the file name... -- {alberta,utzoo,uunet}!ncc!lyndon lyndon@Nexus.CA
mdorion@cmtl01.UUCP (Mario Dorion) (06/04/88)
In article <2450011@hpsal2.HP.COM>, morrell@hpsal2.HP.COM (Michael Morrell) writes: > > (...) I'd like to see an option which ONLY prints the line > numbers where the pattern was found. > > Michael Morrell > hpda!morrell You could use the following: grep -n 'foo' bar | cut -d: -f1 -- Mario Dorion | ...!{rutgers,uunet,ihnp4}! Frisco Bay Industries | philabs!micomvax!cmtl01!mdorion Montreal, Canada | 1 (514) 738-7300 | I thought this planet was in public domain!
allbery@ncoast.UUCP (Brandon S. Allbery) (06/05/88)
As quoted from <2312@bgsuvax.UUCP> by kutz@bgsuvax.UUCP (Kenneth Kutz): +--------------- | In article <6866@elroy.Jpl.Nasa.Gov>, alan@cogswell.Jpl.Nasa.Gov (Alan S. Mazer) writes: | > One thing I would _love_ is to be able to find the context of what I've | > found, for example, to find the two (n?) surrounding lines. I have wanted | > to do this many times and there is no good way. +--------------- grep -n foo ./bar | context 2 I posted context to net.sources back when it existed; someone may still have archives from that time, if not I'll retrieve my sources and repost it. It takes lines of the basic form filename ... linenumber : ... and displays context around the specified lines. I use this with grep quite often; it also works with cc (pcc, not Xenix cc) error messages. -- Brandon S. Allbery | "Given its constituency, the only uunet!marque,sun!mandrill}!ncoast!allbery | thing I expect to be "open" about Delphi: ALLBERY MCI Mail: BALLBERY | [the Open Software Foundation] is comp.sources.misc: ncoast!sources-misc | its mouth." --John Gilmore
gwyn@brl-smoke.UUCP (06/05/88)
In article <7944@alice.UUCP> andrew@alice.UUCP writes: > the right thing to do is to write a context program that takes >input looking like "filename:linenumber:goo" and prints whatever context ... Heavens -- a tool user. I thought that only Neanderthals were still alive. I guess Bell Labs escaped the plague.
hutch@net1.ucsd.edu (Jim Hutchison) (06/05/88)
4537@vdsvax.steinmetz.ge.com, barnett@vdsvax.steinmetz.ge.com (Bruce G. Barnett) >In <1036@cfa.cfa.harvard.EDU> wyatt@cfa.harvard.EDU (Bill Wyatt) writes: >| >|> There have been times when I wanted a grep that would print out the >|> first occurrence and then stop. >| >|grep '(your_pattern_here)' | head -1 > [...] > >Have you ever waited for a computer? No, never. :-) >There are times when I want the first occurrence of a pattern without >reading the entire (i.e. HUGE) file. I realize this is dependent on the way in which processes sharing a pipe act, but this is a point worth considering before we get yet another annoying burst of "cat -v" type programs. grep pattern file1 ... fileN | head -1 This should send grep a SIGPIPE as soon as the first line of output trickles through the pipe. This would result in relatively little of the file actually being read under most Unix implementations. I would agree that it is a bad thing to rely on the granularity of a pipe. Here is a sample program which can be used to show you what I mean. Name it grep, and use it thus wise: % ./grep pattern * | head -1

/* ------------- Cut here --------------- */
#include <stdio.h>
#include <signal.h>

sighandler(sig)
int sig;
{
        if (sig == SIGPIPE)
                fprintf(stderr,"Died from a SIGPIPE\n");
        else
                fprintf(stderr,"Died from signal #%d\n", sig);
        exit(0);
}

main()
{
        signal(SIGPIPE,sighandler);
        for (;;)
                printf("pattern\n");
}

/* Jim Hutchison UUCP: {dcdwest,ucbvax}!cs!net1!hutch ARPA: Hutch@net1.ucsd.edu Disclaimer: The cat agreed that it would be o.k. to say these things. */
hutch@net1.ucsd.edu (Jim Hutchison) (06/05/88)
I can think of a few nasty ways to do this one; I am hoping to get a better answer. A grep with a window of context around it: a few lines preceding and following the pattern I am looking for. The VMS search command sported this as an option/qualifier. I miss it sometimes (not VMS, just a few of the more wacky utilities, like the editor option for creation of multi-key data base files :-). /* Jim Hutchison UUCP: {dcdwest,ucbvax}!cs!net1!hutch ARPA: Hutch@net1.ucsd.edu Disclaimer: The cat agreed that it would be o.k. to say these things. */
tbray@watsol.waterloo.edu (Tim Bray) (06/05/88)
Grep should, where reasonable, not be bound by the notion of a 'line'. As a concrete expression of this, the useful grep -l (prints the names of the files that contain the string) should work on any kind of file. More than one existing 'grep -l' will fail, for example, to tell you which of a bunch of .o files contain a given string. Scenario - you're trying to link 55 .o's together to build a program you don't know that well. You're on berklix. ld sez: "undefined: _memcpy". You say: "who's doing that?". The source is scattered inconveniently. The obvious thing to do is: grep -l _memcpy *.o That this often will not work is irritating. Tim Bray, New Oxford English Dictionary Project, U of Waterloo
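[There is at least a workaround for the .o case with today's tools: ask nm rather than grep, since nm lists the symbols (including undefined ones) each object mentions. A sketch only; nm output formats differ between systems:]

    for f in *.o
    do
            if nm $f | grep memcpy > /dev/null
            then
                    echo $f
            fi
    done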
bzs@bu-cs.BU.EDU (Barry Shein) (06/05/88)
From: gwyn@brl-smoke.ARPA (Doug Gwyn ) >In article <7944@alice.UUCP> andrew@alice.UUCP writes: >> the right thing to do is to write a context program that takes >>input looking like "filename:linenumber:goo" and prints whatever context ... > >Heavens -- a tool user. I thought that only Neanderthals were still alive. >I guess Bell Labs escaped the plague. Almost, unless the original input was produced by a pipeline, in which case this (putative) post-processor can't help unless you tee the mess to a temp file, yup, mess is the right word. Or maybe only us Neanderthals are interested in tools which work on pipes? Have they gone out of style? -Barry "Ulak of Org" Shein, Boston University
gwyn@brl-smoke.ARPA (Doug Gwyn ) (06/05/88)
In article <23133@bu-cs.BU.EDU> bzs@bu-cs.BU.EDU (Barry Shein) writes: >Almost, unless the original input was produced by a pipeline, in which >case this (putative) post-processor can't help unless you tee the mess >to a temp file, yup, mess is the right word. The proposed tool would be very handy on ordinary text files, but it is hard to see a use for it on pipes. Or, getting back to context-grep, what good would it do to show context from a pipe? To do anything with the information (other than stare at it), you'd need to produce it again. There might be some use for context-{grep,diff,...} on a stream, but if a separate context tool will satisfy 99% of the need, as I think it would, as well as provide this capability for other commands "for free", it would be a better approach than hacking context into other commands. By the way, I hope the new grep when asked to always produce the filename will use "-" for stdin's name, and the context tool would also follow the same convention. Even though the Research systems have /dev/stdin, other sites may not, and anyway (as we've just seen) stdin isn't really a definite object.
nelson@sun.soe.clarkson.edu (Russ Nelson) (06/05/88)
In article <23133@bu-cs.BU.EDU> bzs@bu-cs.BU.EDU (Barry Shein) writes: >In article <7944@alice.UUCP> andrew@alice.UUCP writes: >> the right thing to do is to write a context program that takes >>input looking like "filename:linenumber:goo" and prints whatever context ... > >Almost, unless the original input was produced by a pipeline, in which >case this (putative) post-processor can't help unless you tee the mess >to a temp file, yup, mess is the right word. How about: alias with_context tee >/tmp/$$ | $* | context -f/tmp/$$ or something like that? Does that offend tool-users sensibilities? *Do* Neanderthals have any sensibilities? -- signed char *reply-to-russ(int network) { /* Why can't BITNET go */ if(network == BITNET) return "NELSON@CLUTX"; /* domainish? */ else return "nelson@clutx.clarkson.edu"; }
bzs@bu-cs.BU.EDU (Barry Shein) (06/05/88)
From: gwyn@brl-smoke.ARPA (Doug Gwyn ) >In article <23133@bu-cs.BU.EDU> bzs@bu-cs.BU.EDU (Barry Shein) writes: >>Almost, unless the original input was produced by a pipeline, in which >>case this (putative) post-processor can't help unless you tee the mess >>to a temp file, yup, mess is the right word. > >The proposed tool would be very handy on ordinary text files, >but it is hard to see a use for it on pipes. Or, getting back >to context-grep, what good would it do to show context from a >pipe? To do anything with the information (other than stare >at it), you'd need to produce it again. What else are context displays for except to stare at (or save in a file for later staring)? Are the resultant contexts often the input to other programs? (I know that 'patch' can take a context input but that's irrelevant, it neither needs nor prefers a context diff to my knowledge, it's just being accommodating so humans can look at the context diff if something botches.) Actually, I can answer that in the context of the original suggestion. The motivation for a context comes in two major flavors: A) To stare at (the surrounding context gives a human some hint of the context in which the text appeared) B) Because the context really represents a multi-line (eg) record, such as pulling out every termcap or terminfo entry which contains some property but desiring the result to contain the entire multiline entry so it could be re-used to create a new file. In either case it's independent of whether the data is coming from a pipe (as it should be.) Its pipeness may be caused by something as simple as the data being grabbed across the network (rsh HOST cat foo | ...). Anyhow, I think it's bad in general to demand the reasoning of why a selection operator should work in a pipe, it just should (although I have presented a reasonable argument.) That's what tools are all about. >There might be some >use for context-{grep,diff,...} on a stream, but if a separate >context tool will satisfy 99% of the need, as I think it would, >as well as provide this capability for other commands "for free", >it would be a better approach than hacking context into other >commands. I think claiming that 99% of the use won't need pipes is unsound, it should just work with a pipe and any tool which requires passing the file name and then re-positioning the file just won't, it's violating a fundamental design concept by doing this (not that in rare cases this might not be necessary, but I don't see where this is one of them unless you use the circular argument of it "must be a separate program".) The reasoning for adding it to grep would be: a) Grep already has its finger on the context, it's right there (or could be), why re-process the entire stream/file just to get it printed? Grep found the context, why find it again? b) The context suggestions are merely logical generalizations of what grep already does, print the context of a match (it just happens to now limit that to exactly one line.) Nothing new conceptually is being added, only generalized. In fact, if I were to write this context-display tool my first thought would be to just use grep and try to emit unique patterns (a la TAGS files) which grep can then re-scan. But grep doesn't quite cut it w/o this little generalization. I think we're going in circles and this post-processor is nothing more than a special case of grep or perhaps cat or sed the way it was proposed (why not just generate sed commands to list the lines if that's all you want?)
Anyhow, at least we're back to the technical issues and away from calling anyone who disagrees Neanderthals... -Barry Shein, Boston University
bzs@bu-cs.BU.EDU (Barry Shein) (06/05/88)
From: nelson@sun.soe.clarkson.edu (Russ Nelson) [responding to me] >>Almost, unless the original input was produced by a pipeline, in which >>case this (putative) post-processor can't help unless you tee the mess >>to a temp file, yup, mess is the right word. > >How about: > >alias with_context tee >/tmp/$$ | $* | context -f/tmp/$$ > >or something like that? Does that offend tool-users sensibilities? >*Do* Neanderthals have any sensibilities? I don't understand, the way to avoid having to tee it into temp files is to tee it into temp files? Given that sort of solution we can eliminate pipes entirely from unix, was that your point? That pipes are fundamentally useless and can always be eliminated via use of intermediate temp files? It begs the question, burying it in a little syntactic sugar with an alias command doesn't solve the problem. -Barry Shein, Boston University
gwyn@brl-smoke.ARPA (Doug Gwyn ) (06/06/88)
In article <23142@bu-cs.BU.EDU> bzs@bu-cs.BU.EDU (Barry Shein) writes: >Anyhow, at least we're back to the technical issues and away from >calling anyone who disagrees Neanderthals... Oh, but the latter is much more fun! Anyway, the fundamental issue seems to be that there are (at least) two types of external data objects: streams -- transient data, takes special effort to capture files -- permanent data with an attached name UNIX nicely makes these appear much the same, but they do have some inherent differences, and this one-pass versus multi-pass context discussion has brought out one of them. There is nothing particularly wrong with the "tee" approach to turn a stream into a file long enough for whatever work is being done. The converse is often done; for example many of my shell scripts, after parsing arguments, exec a pipeline that starts cat $* | ... in order to ensure a stream input to the rest of the pipeline.
garyo@masscomp.UUCP (Gary Oberbrunner) (06/06/88)
The only change I've ever had to make to the source for grep to make it do what I want was to make it work with arbitrary-length lines. I consider not handling long lines (and not complaining about them either) to be extremely antisocial. All this other stuff is just window-dressing. Not that it's bad; one integrated grep with B-M strings, alternation and inversion operators, and nifty feeping creaturism is great by me. I usually handle the multi-line-record case by tr'ing all the intermediate line ends into some unused character, doing my database hackery (grep, awk, sed, what have you) and then tr'ing back at the end. This is one reason for having grep support very long lines. As always, Gary ---------------------------------------------------------------------------- Remember, Truth is not beauty; (617)692-6200x2445 Information is not knowledge; Beauty is not love; Gary Oberbrunner Knowledge is not wisdom; Love is not music; ...!masscomp!garyo Wisdom is not truth; Music is the best. - FZ ....garyo@masscomp
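[For records that continue across lines with a trailing backslash (termcap style), the same join-then-search idea can be done in one awk pass without the unused-character round trip; a sketch, with the :am: pattern purely illustrative:]

    awk '{
            if (substr($0, length($0), 1) == "\\")
                    rec = rec substr($0, 1, length($0) - 1)
            else {
                    rec = rec $0
                    if (rec ~ /:am:/)
                            print rec
                    rec = ""
            }
    }' /etc/termcap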
bzs@bu-cs.BU.EDU (Barry Shein) (06/06/88)
From: gwyn@brl-smoke.ARPA (Doug Gwyn ) >There is nothing particularly wrong with the "tee" approach to >turn a stream into a file long enough for whatever work is being >done. The converse is often done; for example many of my shell >scripts, after parsing arguments, exec a pipeline that starts > cat $* | ... >in order to ensure a stream input to the rest of the pipeline. Nothing wrong with it unless you happen to be on a parallel machine as I am a lot of the time and pipes can run in parallel nicely. Nyah Nyah, got ya there! PHFZZZZT! I win! I win! You're right, this is getting ridiculous, we made our points... Ok everyone, back to arguing which flags should be maintained in cat and Unix Standardization AKA "West Coast Story" (snap fingers.) -Barry Shein, Boston University
nelson@sun.soe.clarkson.edu (Russ Nelson) (06/06/88)
In article <23143@bu-cs.BU.EDU> bzs@bu-cs.BU.EDU (Barry Shein) writes: >From: nelson@sun.soe.clarkson.edu (Russ Nelson) [responding to me] >>alias with_context tee >/tmp/$$ | $* | context -f/tmp/$$ >I don't understand, the way to avoid having to tee it into temp >files is to tee it into temp files? No. There is no way to avoid teeing it into a temp file. Such is life with pipes. If you want context then you need to save it. My alias is perfectly consistent with the tool-using philosophy. Yes, it's a kludge, but that's the only way to save context in a single-stream pipe philosophy. I remember reading a paper in which multiple streams going hither and yon were proposed, but the syntax was gothic at best. I like being able to say this: bsd: sort | with_context grep rfoo | more sysv: sort | with_context grep foo | more Because sysv doesn't have the r* utilities, of course :-) -- signed char *reply-to-russ(int network) { /* Why can't BITNET go */ if(network == BITNET) return "NELSON@CLUTX"; /* domainish? */ else return "nelson@clutx.clarkson.edu"; }
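For what it's worth, a Bourne-shell rendering of the same kludge as a function (as typed above, the alias would probably need quoting and \!* to behave in csh). This is only a sketch: 'context' is the hypothetical post-processor under discussion, and it inherits the obvious caveats: it needs /tmp space, and 'context' may go back to the temp file before tee has finished writing it.

    # keep a copy of the stream so 'context' can pull lines back out of it
    with_context () {
        tmp=/tmp/wc$$
        tee $tmp | "$@" | context -f$tmp
        rm -f $tmp
    }
    # usage: sort | with_context grep foo | more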
rick@seismo.CSS.GOV (Rick Adams) (06/07/88)
7th Edition grep had a -h flag to not print the filenames on a grep. 4BSD still has a -h flag. System 5 doesn't have a -h flag. (Another example of how System 5 is superior to BSD... and V7...) ---rick
tower@bu-cs.BU.EDU (Leonard H. Tower Jr.) (06/07/88)
In article <6866@elroy.Jpl.Nasa.Gov> alan@cogswell.Jpl.Nasa.Gov (Alan S. Mazer) writes: | |One thing I would _love_ is to be able to find the context of what I've |found, for example, to find the two (n?) surrounding lines. I have wanted |to do this many times and there is no good way. GNU Emacs has a command that will walk you through each match of a grep run and show you the context around it: grep: Run grep, with user-specified args, and collect output in a buffer. While grep runs asynchronously, you can use the C-x ` command to find the text that grep hits refer to. M-x grep RET to invoke it. I suspect other Unix Emacs have a similar feature. Information on how to obtain GNU Emacs, other GNU software, or the GNU project itself is available from: gnu@prep.ai.mit.edu enjoy -len
gwyn@brl-smoke.ARPA (Doug Gwyn ) (06/07/88)
In article <44366@beno.seismo.CSS.GOV> rick@seismo.CSS.GOV (Rick Adams) writes: >7th Edition grep had a -h flag to not print the filenames on a grep. >4BSD still has a -h flag. >System 5 doesn't have a -h flag. >(Another example of how System 5 is superior to BSD... and V7...) Maybe the AT&T folks figured that their customers were smart enough to type "cat files ... | grep". I've never had the need for a -h flag, but I sure would like for the -H (ALWAYS print filename) option to be the default instead of the current variable algorithm.
brianc@cognos.uucp (Brian Campbell) (06/07/88)
In article <4524@vdsvax.steinmetz.ge.com> Bruce G. Barnett writes: > There have been times when I wanted a grep that would print out the > first occurrence and then stop. In article <1036@cfa.cfa.harvard.EDU> Bill Wyatt suggests: > grep '(your_pattern_here)' | head -1 In article <4537@vdsvax.steinmetz.ge.com> Bruce G. Barnett replies: > There are times when I want the first occurrence of a pattern without > reading the entire (i.e. HUGE) file. If we're talking about finding subject lines in news articles: head -20 file1 file2 ... | grep ^Subject: > Or there are times when I want the first occurrence of a pattern from > hundreds of files, but I don't want to see the pattern more than once. In this case, the original suggestion seems appropriate: grep pattern file1 file2 ... | head -1 -- Brian Campbell uucp: decvax!utzoo!dciem!nrcaer!cognos!brianc Cognos Incorporated mail: POB 9707, 3755 Riverside Drive, Ottawa, K1G 3Z4 (613) 738-1440 fido: (613) 731-2945 300/1200/2400, sysop@1:163/8
guy@gorodish.Sun.COM (Guy Harris) (06/08/88)
> >7th Edition grep had a -h flag to not print the filenames on a grep. > >4BSD still has a -h flag. > >System 5 doesn't have a -h flag. > >(Another example of how System 5 is superior to BSD... and V7...) > > Maybe the AT&T folks figured that their customers were smart enough > to type "cat files ... | grep". *Which* "AT&T folks"? The folks at AT&T Bell Labs Research were the ones who put the "-h" flag into "grep" in the first place, *not* the ones at Berkeley. > I've never had the need for a -h flag, but I sure would like for the -H > (ALWAYS print filename) option to be the default instead of the current > variable algorithm. Maybe the AT&T folks figured that their customers were smart enough to type "grep ... /dev/null"?
oz@yunexus.UUCP (Ozan Yigit) (06/08/88)
In article <7939@alice.UUCP> andrew@alice.UUCP writes: > >many people have written about patterns matching multiple lines. >grep will not do this. if you really need this, use sam by rob pike >as described in the nov 1987 software practice and experience. > Why should this not be done by grep ??? I think Rob Pike's "Structural Regular Expressions" is the way to go for a modern grep, where newline spanning is supported, and the program does not die unexpectedly just because a file contains a line too long for a stupid internal "line size". (For an insightful discussion of this, interested readers could check out Rob's paper in the EUUG proceedings.) oz -- The deathstar rotated slowly, | Usenet: ...!utzoo!yunexus!oz towards its target, and sparked | ....!uunet!mnetor!yunexus!oz an intense sunbeam. The green world | Bitnet: oz@[yulibra|yuyetti] of unics evaporated instantly... | Phonet: +1 416 736-5257x3976
guy@gorodish.Sun.COM (Guy Harris) (06/09/88)
> No, the obvious thing to do is: > > nm -o _memcpy *.o "Obvious" under which version of UNIX? From the 4.3BSD manual: -o Prepend file or archive element name to each output line rather than only once. The SunOS manual page says the same thing. From the S5R3 manual: -o Print the value and size of a symbol in octal instead of decimal. With the 4.3BSD version you can do nm -o *.o | egrep _memcpy and get the result you want. For any version of "nm" that I know of, you can do the "egrep" trick mentioned in another posting; you may have to use a flag such as "-p" with the S5 version to get "easily parsable, terse output."
john@frog.UUCP (John Woods) (06/09/88)
Hypothesize for the moment that I would like to have the Subject: lines for each article in /usr/spool/news/comp/sources/unix. Many people have proposed a new flag for the "new grep" (one that functions just like the -one flag does on "match", the matching program I use (a flag I implemented long ago)). In article<5007@sdcsvax.UCSD.EDU>,hutch@net1.ucsd.edu(Jim Hutchison) suggests: > grep pattern file1 ... fileN | head -1 > This should send grep a SIGPIPE as soon as the first line of output > trickles through the pipe. This would result in relatively little > of the file actually being read under most Unix implementations. Yes, it would result in relatively little of the file being read. It would also result in relatively little of the desired output. Check the problem space before posting solutions, folks. As I pointed out in another message, you can get awk to solve the problem almost exactly, with some irregularity in the NFILES={0,1} cases. However, the "tool-using" approach is a two-edged sword, it seems to me: a matching problem should be solvable by using the matching tool, not by a special case of an editor tool (the purported "sed" solution) or by having to reach for a full-blown programming language (awk); just as one should not paginate a text file by using the /PAGINATE /NOPRINT features of a line-printer program... Sometimes you need to EN-feature a program in order to avoid having to turn to (other) inappropriate tools. "Oh, you can't ADD text with this editor, only change existing text. You add text by using 'cat >> filename' ..." I like the "context" tool suggested elsewhere, but it has one problem (as stated) for replacing context diffs: context diffs are both context and _differences_, and are generally clearly marked as such (i.e., the !+- convention); while I guess you could turn an ed-script style diff listing into a context diff (given both input files and the diff marks), that is a radically different input language than that proposed for eliminating context grep. This just means, however, that two context tools are needed, not just one. To paraphrase Einstein, "Programs should be as simple as possible, and no simpler." -- John Woods, Charles River Data Systems, Framingham MA, (617) 626-1101 ...!decvax!frog!john, john@frog.UUCP, ...!mit-eddie!jfw, jfw@eddie.mit.edu No amount of "Scotch-Guard" can repel the ugly stains left by REALITY... - Griffy
john@frog.UUCP (John Woods) (06/09/88)
In article <1998@u1100a.UUCP>, krohn@u1100a.UUCP (Eric Krohn) writes: > In article <1112@X.UUCP> john@frog.UUCP (some clown :-) writes: > ] awk '/^Subject: / { print FILENAME ":" $0; next }' * > > This will print Subject: lines more than once per file if a file happens to > have more than one Subject: line. `Next' goes to the next input line, not > the next input file, so you are still left with an exhaustive search of all > the files. > Oops. I blew it. Working on GNU awk seems to have permanently damaged my brain (there are a couple of differences between "real" awk and GNU awk which I couldn't convince the author were worth changing, specifically in 'exit' (not next); GNU exit actually does what I thought next would do, instead of exiting entirely). -- John Woods, Charles River Data Systems, Framingham MA, (617) 626-1101 ...!decvax!frog!john, john@frog.UUCP, ...!mit-eddie!jfw, jfw@eddie.mit.edu No amount of "Scotch-Guard" can repel the ugly stains left by REALITY... - Griffy
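For the record, here is an awk rendering that does print just one Subject: line per file (the pattern and output format are taken from the example above). As Eric points out, it still reads every file to the end; avoiding that is precisely what a flag in grep itself would buy.

    awk 'FILENAME != last { last = FILENAME; done = 0 }
         /^Subject: / && done == 0 { print FILENAME ":" $0; done = 1 }' *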
maujd@warwick.UUCP (Geoff Rimmer) (06/10/88)
In article <2450011@hpsal2.HP.COM> morrell@hpsal2.HP.COM (Michael Morrell) writes: >grep -n does this, but I'd like to see an option which ONLY prints the line >numbers where the pattern was found. How about grep -n pattern file | sed "s/:.*//" ? ------------------------------------------------------------ Geoff Rimmer, Computer Science, Warwick University, UK. maujd@uk.ac.warwick.opal ------------------------------------------------------------
vanam@pttesac.UUCP (Marnix van Ammers) (06/10/88)
In article <4524@vdsvax.steinmetz.ge.com> barnett@steinmetz.ge.com (Bruce G. Barnett) writes: >There have been times when I wanted a grep that would print out the >first occurrence and then stop. sed -n -e "/<pattern>/ { p" -e q -e "}"
mouse@mcgill-vision.UUCP (der Mouse) (06/10/88)
In article <8012@brl-smoke.ARPA>, gwyn@brl-smoke.ARPA (Doug Gwyn ) writes: > In article <7944@alice.UUCP> andrew@alice.UUCP writes: >> the right thing to do is to write a context program that takes input >> looking like "filename:linenumber:goo" and prints whatever context ... > Heavens -- a tool user. I thought that only Neanderthals were still > alive. I guess Bell Labs escaped the plague. A real useful `tool', this, that works only on files. And only when you grep more than one file, so you get filenames (or happen to be able to remember which flag it is to make grep print filenames always, assuming of course that your grep has it). Besides, grep has the context, or could have if it wanted to bother saving it. Why read all two hundred thousand lines of the file *again*? Wasn't it bad enough the first time? der Mouse uucp: mouse@mcgill-vision.uucp arpa: mouse@larry.mcrcim.mcgill.edu
mouse@mcgill-vision.UUCP (der Mouse) (06/10/88)
In article <1030@sun.soe.clarkson.edu>, nelson@sun.soe.clarkson.edu (Russ Nelson) writes: > In article <23133@bu-cs.BU.EDU> bzs@bu-cs.BU.EDU (Barry Shein) writes: >> In article <7944@alice.UUCP> andrew@alice.UUCP writes: >>> the right thing to do is to write a context program that takes >>> input looking like "filename:linenumber:goo" and prints whatever >>> context ... >> Almost, unless the original input was produced by a pipeline, [...] >> unless you tee the mess to a temp file, yup, mess is the right word. > How about: > alias with_context tee >/tmp/$$ | $* | context -f/tmp/$$ This assumes that (a) there's room on /tmp to save the whole thing and (b) that you don't mind rereading it all to find the appropriate line. Both assumptions are commonly violated, in my experience. der Mouse uucp: mouse@mcgill-vision.uucp arpa: mouse@larry.mcrcim.mcgill.edu
mouse@mcgill-vision.UUCP (der Mouse) (06/10/88)
In article <8022@brl-smoke.ARPA>, gwyn@brl-smoke.ARPA (Doug Gwyn ) writes: > Or, getting back to context-grep, what good would it do to show > context from a pipe? To do anything with the information (other than > stare at it), you'd need to produce it again. Why do we have diff -c? Generally, to stare at. (The only other use I know of is producing diffs for Larry Wall's patch program.) der Mouse uucp: mouse@mcgill-vision.uucp arpa: mouse@larry.mcrcim.mcgill.edu
mouse@mcgill-vision.UUCP (der Mouse) (06/10/88)
In article <5007@sdcsvax.UCSD.EDU>, hutch@net1.ucsd.edu (Jim Hutchison) writes: > 4537@vdsvax.steinmetz.ge.com, barnett@vdsvax.steinmetz.ge.com (Bruce G. Barnett) >> In <1036@cfa.cfa.harvard.EDU> wyatt@cfa.harvard.EDU (Bill Wyatt) writes: [attribution(s) lost] >>>> There have been times when I wanted a grep that would print out >>>> the first occurrence and then stop. >>> grep '(your_pattern_here)' | head -1 >> Have you ever waited for a computer? There are times when I want >> the first occurrence of a pattern without reading the [whole file]. > grep pattern file1 ... fileN | head -1 > This should send grep a SIGPIPE as soon as the first line of output > trickles through the pipe. No. It should not send the SIGPIPE until grep writes the second line. And because grep is likely to use stdio for its output, nothing at all may be written to the pipe until grep has 1K or 2K or whatever size its stdio uses for the output buffer. This may be an enormous waste of time, both cpu and real. Besides which, it's wrong. It prints just the first match, whereas what's wanted is the first match *from each file*. der Mouse uucp: mouse@mcgill-vision.uucp arpa: mouse@larry.mcrcim.mcgill.edu
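If what is really wanted is the first match from each file without reading past it, a shell loop around the sed idiom posted earlier in this thread will do that today. A rough sketch (the "file: line" output format is arbitrary, and the pattern is assumed to contain no slashes):

    #!/bin/sh
    # usage: firstmatch pattern file ...
    pat=$1; shift
    for f
    do
        hit=`sed -n -e "/$pat/ { p" -e q -e "}" "$f"`
        test -n "$hit" && echo "$f: $hit"
    done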
mouse@mcgill-vision.UUCP (der Mouse) (06/10/88)
In article <7207@watdragon.waterloo.edu>, tbray@watsol.waterloo.edu (Tim Bray) writes: > Scenario - you're trying to link 55 .o's together to build a program > you don't know that well. You're on berklix. ld sez: "undefined: > _memcpy". You say: "who's doing that?". The source is scattered > inconveniently. The obvious thing to do is: grep -l _memcpy *.o Doesn't anybody read the man pages any more? The obvious thing is to use the supplied facility: the -y option to ld. % cc -o program *.o -y_memcpy Undefined: _memcpy buildstruct.o: reference to external undefined _memcpy copytree.o: reference to external undefined _memcpy % (I don't know how generally available this is. You did say "berklix", and I know this is in 4.3, but I don't know about other Berklices.) der Mouse uucp: mouse@mcgill-vision.uucp arpa: mouse@larry.mcrcim.mcgill.edu
mouse@mcgill-vision.UUCP (der Mouse) (06/10/88)
In article <1037@sun.soe.clarkson.edu>, nelson@sun.soe.clarkson.edu (Russ Nelson) writes: > In article <23143@bu-cs.BU.EDU> bzs@bu-cs.BU.EDU (Barry Shein) writes: >> From: nelson@sun.soe.clarkson.edu (Russ Nelson) [responding to me] >>> alias with_context tee >/tmp/$$ | $* | context -f/tmp/$$ >> I don't understand, the way to avoid having to tee it into temp >> files is to tee it into temp files? > No. There is no way to avoid teeing it into a temp file. Sure there is. > If you want context then you need to save it. True. But you don't necessarily need to save it in a file. > [the alias above is] the only way to save context in a single-stream > pipe philosophy. Grep can save it in memory. Unless you want so much context that it overflows the available memory, which I find difficult to see happening, this is a perfectly good place to put it. In fact, I wrote a grep variant which starts by snarfing the whole file into (virtual) memory. Makes for extreme speed when it's usable, which is often enough to make it worthwhile (for me, at least). And of course it means that I could get as much context as I cared to. (I've never had it fail because it couldn't get enough memory to hold the whole file.) der Mouse uucp: mouse@mcgill-vision.uucp arpa: mouse@larry.mcrcim.mcgill.edu
andrew@alice.UUCP (06/11/88)
The following is a summary of the somewhat plausible ideas
suggested for the new grep. I thank leo de witt particularly and others for clearing up misconceptions and pointing out (correctly) that existing tools like sed already do (or at least nearly do) what some people asked for. The following points are in no particular order and no slight is intended by my presentation. After that, I summarise the current flags.
1) named character classes, e.g. \alpha, \digit.
    i think this is a hokey idea and dismissed it as unnecessary crud
    but then found out it is part of the proposed regular expression
    stuff for posix. it may creep in but i hope not.
2) matching multi-line patterns (\n as part of pattern)
    this actually requires a lot of infrastructure support and thought.
    i prefer to leave that to other more powerful programs such as sam.
3) print lines with context.
    the second most requested feature but i'm not doing it. this is
    just the job for sed. to be consistent, we just took the context
    crap out of diff too. this is actually reasonable; showing context
    is the job for a separate tool (pipeline difficulties apart).
4) print one(first matching) line and go onto the next file.
    most of the justification for this seemed to be scanning
    mail and/or netnews articles for the subject line; neither
    of which gets any sympathy from me. but it is easy to do
    and doesn't add an option; we add a new option (say -1)
    and remove -s. -1 is just like -s except it prints the matching line.
    then the old grep -s pattern is now grep -1 pattern > /dev/null
    and within epsilon of being as efficent.
5) divert matching lines onto one fd, nonmatching onto another.
    sorry, run grep twice.
6) print the Nth occurrence of the pattern (N is number or list).
    it may be possible to think of a real reason for this (i couldn't)
    but the answer is no.
7) -w (pattern matches only words)
    the most requested feature. well, it turns out that -x (exact) is
    there because doug mcilroy wanted to match words against a dictionary.
    it seems to have no other use. Therefore, -x is being dropped (after
    all, it only costs a quick edit to do it yourself) and is replaced by
    -w == (^|[^_a-zA-Z0-9])pattern($|[^_a-zA-Z0-9]).
    (see the sketch after the flag list below.)
8) grep should work on binary files and kanji.
    that it should work on kanji or any character set is a given (at
    least, any character set supported by the system V international
    character set stuff). binary files will work too modulo the following
    constraint: lines (between \n's) have to fit in a buffer (current
    size 64K). violations are an error (exit 2).
9) -b has bogus units.
    agreed. -b now is in bytes.
10) -B (add an ^ to the front of the given pattern, analogous to -x and -w)
    -x (and -w) is enough. sorry.
11) recursively descend through argument lists
    no. find | xargs is going to have to do.
12) read filenames on standard input
    no. xargs will have to do.
13) should be as fast as bm.
    no worries. in fact, our egrep is 3x faster than bm. i intend to be
    competitive with woods' egrep. it should also be as fast as fgrep for
    multiple keywords. the new grep incorporates boyer-moore as a
    degenerate case of Commentz-Walter, a faster replacement for the
    fgrep algorithm.
14) -lv (files that don't have any matching lines)
    -lv means print names of files that have any nonmatching lines
    (useful, say, for checking input syntax). -L will mean print names
    of files without selected lines.
15) print the part of the line that matched.
    no. that is available at the subroutine level.
16) compatibility with old grep/fgrep/egrep.
    the current name for the new command is gre (aho chose it). after a
    while, it will become our grep. there will be a -G flag to take
    patterns a la old grep and a -F to take patterns a la fgrep (that is,
    no metacharacters except \n == |). gre is close enough to egrep to
    not matter.
17) fewer limits.
    so far, gre will have only one limit, a line length of 64K.
    (NO, i am not supporting arbitrary length lines (yet)!) we foresee no
    need for any other limit. for example, the current gre acts like
    fgrep. it is 4 times faster than fgrep and has no limits; we can
    gre -f /usr/dict/words (72K words, 600KB).
18) recognise file types (ignore binaries, unpack packed files etc).
    get real. go back to your macintosh or pyramid. gre will just grep
    files, not understand them.
19) handle patterns occurring multiple times per line
    this is ill-defined (how many times does aaaa occur in a line of 20
    'a's? in order of decreasing correctness, the answers are >=1, 17, 5).
    For the cases people mentioned (words), pipe it thru tr to put the
    words one per line. (see the sketch after the flag list below.)
20) why use \{\} instead of \(\)?
    this is not yet resolved (mcilroy&ritchie vs aho&pike&me). grouping
    is an orthogonal issue to subexpressions so why use the same
    parentheses? the latest suggestion (by ritchie) is to allow both \(\)
    and \{\} as grouping operators but the \3 would only count one type
    (say \(\)). this would be much better for complicated patterns with
    much grouping.
21) subroutine versions of the pattern matching stuff.
    in a deep sense, the new grep will have no pattern matching code in
    it. all the pattern matching code will be in libc with a uniform
    interface. the boyer-moore and commentz-walter routines have been
    done. the other two are egrep and back-referencing egrep. lastly,
    regexp will be reimplemented.
22) support a filename of - to mean standard input.
    a unix without /dev/stdin is largely bogus but as a sop to the poor
    barstards having to work on BSD, gre will support -
    as stdin (at least for a while).

Thus, the current proposal is the following flags. it would take a GOOD argument to change my mind on this list (unless it is to get rid of a flag).

-f file   pattern is (`cat file`)
-v        nonmatching lines are 'selected'
-i        ignore alphabetic case
-n        print line number
-c        print count of selected lines only
-l        print filenames which have a selected line
-L        print filenames which do not have a selected line
-b        print byte offset of line begin
-h        do not print filenames in front of matching lines
-H        always print filenames in front of matching lines
-w        pattern is (^|[^_a-zA-Z0-9])pattern($|[^_a-zA-Z0-9])
-1        print only first selected line per file
-e expr   use expr as the pattern

Andrew Hume
research!andrew
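Until gre exists, items 7 and 19 above can be approximated with today's tools. These are sketches only: 'pattern', 'word' and the file names are placeholders, the -w wrapper assumes the pattern is a single plain word (nothing groups it), and the tr line uses BSD-style tr syntax.

    # item 7: match 'pattern' only as a whole word, using the expansion above
    egrep '(^|[^_a-zA-Z0-9])pattern($|[^_a-zA-Z0-9])' file1 file2

    # item 19: put words one per line, then count occurrences of one word
    tr -cs '_A-Za-z0-9' '\012' < file | grep -c '^word$'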
wswietse@eutrc3.UUCP (Wietse Venema) (06/11/88)
In article <7207@watdragon.waterloo.edu> tbray@watsol.waterloo.edu (Tim Bray) writes: }Grep should, where reasonable, not be bound by the notion of a 'line'. }As a concrete expression of this, the useful grep -l (prints the names of }the files that contain the string) should work on any kind of file. More }than one existing 'grep -l' will fail, for example, to tell you which of a }bunch of .o files contain a given string. Scenario - you're trying to }link 55 .o's together to build a program you don't know that well. You're }on berklix. ld sez: "undefined: _memcpy". You say: "who's doing that?". }The source is scattered inconveniently. The obvious thing to do is: }grep -l _memcpy *.o }That this often will not work is irritating. }Tim Bray, New Oxford English Dictionary Project, U of Waterloo nm -op *.o | grep memcpy will work just fine, both with bsd and att unix. Wietse -- uucp: mcvax!eutrc3!wswietse | Eindhoven University of Technology bitnet: wswietse@heithe5 | Dept. of Mathematics and Computer Science surf: tuerc5::wswietse | Eindhoven, The Netherlands.
randy@umn-cs.cs.umn.edu (Randy Orrison) (06/12/88)
In article <7962@alice.UUCP> andrew@alice.UUCP writes: |3) print lines with context. | the second most requested feature but i'm not doing it. this is | just the job for sed. to be consistent, we just took the context ^^^^^^^^^^^^^^^^ | crap out of diff too. this is actually reasonable; showing context ^^^^^^^^^^^^^^^^ | is the job for a separate tool (pipeline difficulties apart). What?!?!? Ok, i would like context in grep, but i'll live without it. Context diffs, however are a different matter. There isn't an easy way to generate them with diff/context (the first character of every line is produced as part of the diff). Context diffs are useful for patches, and having a tool to generate them is necessary. They're a logical improvement to diff that is more than just context around the changes. If you're fixing grep fine, but don't break diff while you're at it. -randy -- Randy Orrison, Control Data, Arden Hills, MN randy@ux.acss.umn.edu 8-(OSF/Mumblix: Just say NO!)-8 {ihnp4, seismo!rutgers, sun}!umn-cs!randy "I consulted all the sages I could find in Yellow Pages, but there aren't many of them." -APP
allbery@ncoast.UUCP (Brandon S. Allbery) (06/13/88)
As quoted from <7944@alice.UUCP> by andrew@alice.UUCP: +--------------- | the right thing to do is to write a context program that takes | input looking like "filename:linenumber:goo" and prints whatever context you like. | we can then take this crap out of grep and diff and make it generally available | for use with programs like the C compiler and eqn and so on. It can also do | the right thing with folding together nearby lines. At least one good first | cut has been put on the net but a C program sounds easy enough to do. +--------------- A C version has been done; it handles pcc, grep -n, and cpp messages. I posted it 2 1/2 years ago. It does *not* handle diff, since diff's messages are slightly different and lack filename information; also, since it passes lines it doesn't understand you'd end up with both regular and context diffs in the same output. Now if diff had an option to output in the format <filename>:<lineno>[-<lineno>]:<action> we'd be all set -- I could modify it to handle ranges easily. (Changes would be output as "file1:n-m:file was\nfile2:n-m:now is", or something similar.) Note that it'd be nice if lint output messages this way as well. I have a postprocessor for lint which does this -- even with System V's lint that can have lint1 and lint2 run separately via .ln files. -- Brandon S. Allbery | "Given its constituency, the only uunet!marque,sun!mandrill}!ncoast!allbery | thing I expect to be "open" about Delphi: ALLBERY MCI Mail: BALLBERY | [the Open Software Foundation] is comp.sources.misc: ncoast!sources-misc | its mouth." --John Gilmore
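Not Brandon's C program, but the flavor of such a 'context' filter can be sketched in a few lines of shell. It assumes plain "file:line:text" input (as from grep -n with filenames), a fixed two lines of context, and that the named files are still around to be reread; that last assumption is the pipeline problem discussed above.

    #!/bin/sh
    # read "file:line:text" on stdin and show each hit with 2 lines of context
    while read hit
    do
        file=`echo "$hit" | sed 's/:.*//'`
        line=`echo "$hit" | sed 's/^[^:]*:\([0-9]*\).*/\1/'`
        start=`expr $line - 2`
        test "$start" -lt 1 && start=1
        end=`expr $line + 2`
        echo "==== $file, line $line ===="
        sed -n "${start},${end}p" "$file"
    done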
keith@seismo.CSS.GOV (Keith Bostic) (06/14/88)
In article <7962@alice.UUCP>, andrew@alice.UUCP writes:
> 22) support a filename of - to mean standard input.
> a unix without /dev/stdin is largely bogus but as a sop to the poor
> barstards having to work on BSD, gre will support -
> as stdin (at least for a while).
>
> Andrew Hume
> research!andrew

A few comments:

-- As far as I'm aware, V9 is the only system that has "/dev/stdin" at the
   moment. For those who haven't heard of it, V9 is a research version of
   UN*X developed and in use at the Computing Science Research Center, a
   part of AT&T Bell Laboratories, and available to a small number of
   universities. It was preceded by V8, which, interestingly enough, was
   built on top of 4.1BSD.
-- System V does not support "/dev/stdin".
-- The next full release of BSD will contain "/dev/stdin" and friends.
   It is not part of the 4.3-tahoe release because it requires changes to
   stdio. I do not expect, however, commands that currently support the
   "-" syntax to change, for compatibility reasons. V9 itself continues
   to support such commands.

To sum up, let's try and keep this, if not actually constructive, at least bearing some distant relationship to the facts.

Keith Bostic
allbery@ncoast.UUCP (Brandon S. Allbery) (06/14/88)
As quoted from <5007@sdcsvax.UCSD.EDU> by hutch@net1.ucsd.edu (Jim Hutchison): +--------------- | 4537@vdsvax.steinmetz.ge.com, barnett@vdsvax.steinmetz.ge.com (Bruce G. Barnett) | >In <1036@cfa.cfa.harvard.EDU> wyatt@cfa.harvard.EDU (Bill Wyatt) writes: | >|> There have been times when I wanted a grep that would print out the | >|> first occurrence and then stop. | >| | >|grep '(your_pattern_here)' | head -1 | > | >There are times when I want the first occurrence of a pattern without | >reading the entire (i.e. HUGE) file. | | I realize this is dependent on the way in which processes sharing a | pipe act, but this is a point worth considering before we get yet | another annoying burst of "cat -v" type programs. | | grep pattern file1 ... fileN | head -1 | | This should send grep a SIGPIPE as soon as the first line of output | trickles through the pipe. This would result in relatively little | of the file actually being read under most Unix implementations. +--------------- Not true. The SIGPIPE is sent when "grep" writes the second line, *not* when "head" exits! If there *is* only one line containing the pattern, grep will happily read all of the (possibly large) files without getting SIGPIPE. This is not pleasant, even if it's only one large file -- say a comp.sources.unix posting which you're grepping for a Subject: line. -- Brandon S. Allbery | "Given its constituency, the only uunet!marque,sun!mandrill}!ncoast!allbery | thing I expect to be "open" about Delphi: ALLBERY MCI Mail: BALLBERY | [the Open Software Foundation] is comp.sources.misc: ncoast!sources-misc | its mouth." --John Gilmore
andrew@frip.gwd.tek.com (Andrew Klossner) (06/14/88)
[] "so far, gre will have only one limit, a line length of 64K. (NO, i am not supporting arbitrary length lines (yet)!)" Why not a flag to let the user specify the max line length? Just the thing for that database hacker, and diminishes the demand for arbitrary length. "there will be a -G flag to take patterns a la old grep and a -F to take patterns a la fgrep" I hope that -F is a permanent, not temporary, flag. I don't see it in the summary list of supported flags, shudder. "a unix without /dev/stdin is largely bogus but as a sop to the poor barstards having to work on BSD, gre will support - as stdin (at least for a while)." It's not just BSD; I haven't seen /dev/stdin in any released edition. I just looked over the sVr3.1 tape and didn't turn up anything. -=- Andrew Klossner (decvax!tektronix!tekecs!andrew) [UUCP] (andrew%tekecs.tek.com@relay.cs.net) [ARPA]
chris@mimsy.UUCP (Chris Torek) (06/14/88)
In article <44370@beno.seismo.CSS.GOV> keith@seismo.CSS.GOV [at seismo?!?] (Keith Bostic) writes:
> -- The next full release of BSD will contain "/dev/stdin" and friends.
>    It is not part of the 4.3-tahoe release because it requires changes
>    to stdio.

Well, only because freopen("/dev/stdin", "r", stdin) unexpectedly fails: it closes fd 0 before attempting to open /dev/stdin, which means that stdin is gone before it can grab it again. When I `fixed' this here it broke /usr/ucb/head and I had to fix the fix! The sequence needed is messy:

    old = fileno(fp);
    new = open(...);
    if (new < 0) {
        close(old);     /* maybe it was EMFILE */
        new = open(...);/* (could test errno too) */
        if (new < 0)
            return error;
    }
    if (new != old) {
        if (dup2(new, old) >= 0)    /* move it back */
            close(new);
        else {
            close(old);
            fileno(fp) = new;
        }
    }

Not using dup2 means that freopen(stderr) might make fileno(stderr) something other than 2, which breaks at least perror().
--
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 7163)
Domain: chris@mimsy.umd.edu Path: uunet!mimsy!chris
tbray@watsol.waterloo.edu (Tim Bray) (06/14/88)
>In article <7207@watdragon.waterloo.edu> I wrote: >}Grep should, where reasonable, not be bound by the notion of a 'line'. ... >}The source is scattered inconveniently. The obvious thing to do is: >}grep -l _memcpy *.o >}That this often will not work is irritating. At least a dozen people have sent me alternate ways of doing this, the most obvious using 'nm'. Look, I KNOW ABOUT NM! But you're missing the point - suppose the item in the .o files was another type of string, e.g. an error message. The point is: There are some files. One or more may contain a string in which I am interested. grep -l is a tool which is supposed to tell me whether one or more files contain a string. The fact that it refuses to do so for a class of magic files is a gratuitous violation of the unix paradigm. Tim Bray, New Oxford English Dictionary Project, U of Waterloo
oz@yunexus.UUCP (Ozan Yigit) (06/15/88)
In article <7962@alice.UUCP> andrew@alice.UUCP writes: > >21) subroutine versions of the pattern matching stuff. > .... > .... the other two are egrep and back-referencing egrep. > lastly, regexp will be reimplemented. > >Andrew Hume Just how do you propose to implement the back-referencing trick in a properly constructed (nfa and/or nfa->dfa conversion static or on-the-fly) egrep ?? I presume that after each match of the \(reference\) portion, you would have to on-the-fly modify the \n portion of the fsa. Gack! Do you have a theoretically solid algorithm [say, within the context of Aho/Sethi/Ullman's Dragon Book chapter on regular expressions] for this ?? I would be much interested. oz -- The DeathStar rotated slowly, | Usenet: ...!utzoo!yunexus!oz towards its target, and sparked | ....!uunet!mnetor!yunexus!oz an intense SUNbeam. The green world | Bitnet: oz@[yulibra|yuyetti] of unics evaporated instantly... | Phonet: +1 416 736-5257x3976
tj@mks.UUCP (T. J. Thompson) (06/15/88)
In article <8032@brl-smoke.ARPA>, gwyn@brl-smoke.ARPA (Doug Gwyn ) writes: > ... but I sure would like for the -H (ALWAYS print filename) > option to be the default instead of the current variable algorithm. This option is exactly what you need when exec'ing grep from find. It is implemented as grep pattern file /dev/null -- ll // // ,'/~~\' T. J. Thompson uunet!watmath!mks!tj /ll/// //l' `\\\ Mortice Kern Systems Inc. (519) 884-2251 / l //_// ll\___/ 35 King St. N., Waterloo, Ont., Can. N2J 2W9 O_/ long time(); /* know C */
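Spelled out with find, the idiom looks like the line below; the pattern and the name glob are placeholders. The extra /dev/null argument guarantees grep sees more than one file name and therefore prefixes each matching line with the file it came from.

    find . -name '*.c' -exec grep pattern /dev/null {} \;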
barnett@vdsvax.steinmetz.ge.com (Bruce G. Barnett) (06/15/88)
In article <7962@alice.UUCP> andrew@alice.UUCP writes: | | The following is a summary of the somewhat plausible ideas |suggested for the new grep. |4) print one(first matching) line and go onto the next file. | most of the justification for this seemed to be scanning | mail and/or netnews articles for the subject line; neither | of which gets any sympathy from me. but it is easy to do | and doesn't add an option; we add a new option (say -1) | and remove -s. -1 is just like -s except it prints the matching line. | then the old grep -s pattern is now grep -1 pattern > /dev/null | and within epsilon of being as efficent. ----------- Actually this is extremely wrong. Given the command grep -1 Subject /usr/spool/news/comp/sources/unix/* >/dev/null and grep -s Subject /usr/spool/news/comp/sources/unix/* >/dev/null I would expect the first one to read *every* file. The second case ( -s ) should terminate as soon as it finds the first match in the first file. Unless I misunderstand the functionality of the -s command. -- Bruce G. Barnett <barnett@ge-crd.ARPA> <barnett@steinmetz.UUCP> uunet!steinmetz!barnett
guy@gorodish.Sun.COM (Guy Harris) (06/16/88)
> grep -l is a tool which is supposed to tell me whether one or more files > contain a string. No, it isn't. "grep -l" is a tool that is supposed to tell you whether one or more *text* files contain a string; if your file doesn't happen to contain newlines at least every N characters or so, too bad. If you want to improve this situation by writing a "grep" that doesn't have this restriction, feel free. > The fact that it refuses to do so for a class of magic files is a > gratuitous violation of the unix paradigm. "ed is a tool that is supposed to let me modify files. The fact that it refuses to do so for a class of magic files is a gratuitous violation of the unix paradigm." Sorry, but the fact that you can't normally use "ed" to patch binaries doesn't bother me one bit.
ljz@fxgrp.UUCP (Lloyd Zusman) (06/16/88)
In article <7962@alice.UUCP> andrew@alice.UUCP writes:
The following is a summary of the somewhat plausible ideas
suggested for the new grep. ...
...
2) matching multi-line patterns (\n as part of pattern)
this actually requires a lot of infrastructure support and thought.
i prefer to leave that to other more powerful programs such as sam.
^^^
...
Since I'm one of the people who suggested the ability to match multi-line
patterns, I'm a bit disappointed about this ... but such is life. So
where can I find 'sam'? Is it in the public domain? Is source code
available?
You can try to reply via email ... it might actually work, but don't
be surprised if your mail bounces, in which case I'd appreciate
replies here.
Thanks in advance.
--
Lloyd Zusman UUCP: ...!ames!fxgrp!ljz
Master Byte Software Internet: ljz%fx.com@ames.arc.nasa.gov
Los Gatos, California or try: fxgrp!ljz@ames.arc.nasa.gov
"We take things well in hand."
gwyn@brl-smoke.ARPA (Doug Gwyn ) (06/16/88)
In article <698@fxgrp.UUCP> ljz%fx.com@ames.arc.nasa.gov (Lloyd Zusman) writes: >where can I find 'sam'? Is it in the public domain? Is source code >available? So far as I know, if you aren't part of AT&T and don't have 9th Edition UNIX, the only way to legally obtain "sam" is to acquire it from the AT&T UNIX System ToolChest, where it is included in the "dmd-pgmg" package. This is definitely not public domain, but it's inexpensively priced and it does include source code. "sam" works either with dumb terminals or with a smart one like an AT&T Teletype 5620 or 630. I haven't tried installing it without DMD support but obviously it can be done. I use "sam" (DMD version) whenever I have serious editing to do.
fmr@cwi.nl (Frank Rahmani) (06/16/88)
> Xref: mcvax comp.unix.wizards:8598 comp.unix.questions:6792 > Posted: Fri Jun 10 05:29:43 1988 > > In article <8012@brl-smoke.ARPA>, gwyn@brl-smoke.ARPA (Doug Gwyn ) writes: > A real useful `tool', this, that works only on files. And only when > you grep more than one file, so you get filenames (or happen to be able > to remember which flag it is to make grep print filenames always, > assuming of course that your grep has it). ... ... that's the smallest of all problems, just include /dev/null as first file to be searched into your script like grep [options] pattern /dev/null one_or_more_filenames by the way I like the sed one-liner that was posted as answer to the grep replacement question. Why couldn't I think of it?:-) fmr@cwi.nl -- It is better never to have been born. But who among us has such luck? -------------------------------------------------------------------------- These opinions are solely mine and in no way reflect those of my employer.
daveb@geac.UUCP (David Collier-Brown) (06/17/88)
In article <10078@tekecs.TEK.COM> andrew@frip.gwd.tek.com (Andrew Klossner) quotes someone to say:
>[]
>
> "so far, gre will have only one limit, a line length of 64K.
> (NO, i am not supporting arbitrary length lines (yet)!)"

Well, arbitrary line lengths are easy.

    Initially allocate a cache.
    When reading:
        fgets a cache-full
        if the last character is not a \n
            increase the cache with realloc
            read some more

A function to do this, called getline, was published recently in the source groups.

--dave (remember my old .signature?) c-b
--
David Collier-Brown.            {mnetor yunexus utgpu}!geac!daveb
Geac Computers Ltd.,   | "His Majesty made you a major
350 Steelcase Road,    | because he believed you would
Markham, Ontario.      | know when not to obey his orders"
wolfe@pdnbah.uucp (Mike Wolfe) (06/17/88)
In article <540@sering.cwi.nl> fmr@cwi.nl (Frank Rahmani) writes: >> Xref: mcvax comp.unix.wizards:8598 comp.unix.questions:6792 >> Posted: Fri Jun 10 05:29:43 1988 >> >> In article <8012@brl-smoke.ARPA>, gwyn@brl-smoke.ARPA (Doug Gwyn ) writes: >> A real useful `tool', this, that works only on files. And only when >> you grep more than one file, so you get filenames (or happen to be able >> to remember which flag it is to make grep print filenames always, >> assuming of course that your grep has it). >... >... >that's the smallest of all problems, just include /dev/null as first >file to be searched >into your script like >grep [options] pattern /dev/null one_or_more_filenames Smallest of all problems? One of my pet peeves is the fact that certain commands will only print filenames if you give them more than one file. While the /dev/null ugliness is a suitable kludge for the grep case, what about a case where you want to run something using xargs, something like sum? You don't want /dev/null repeated for each call. I know I can sed it out but that's just a kludge for a kludge and to me that's a red flag. I think that all commands of that type should allow you to force the filenames in output. I don't want to go back and change all the commands (UNIX++ a modest proposal ;-). I just wish people would keep this in mind when writing things in the future. ---- Mike Wolfe Paradyne Corporation, Mail stop LF-207 DOMAIN wolfe@pdn.UUCP PO Box 2826, 8550 Ulmerton Road UUCP ...!uunet!pdn!wolfe Largo, FL 34649-2826 PHONE (813) 530-8361
gwyn@brl-smoke.ARPA (Doug Gwyn ) (06/18/88)
In article <540@sering.cwi.nl> fmr@cwi.nl (Frank Rahmani) writes: >> In article <8012@brl-smoke.ARPA>, gwyn@brl-smoke.ARPA (Doug Gwyn ) writes: But I didn't. (I think it was BZS.) PLEASE, check your attributions!
maart@cs.vu.nl (Maarten Litmaath) (06/18/88)
In article <7962@alice.UUCP> andrew@alice.UUCP writes:
\...
\5) divert matching lines onto one fd, nonmatching onto another.
\ sorry, run grep twice.
Come on! The diversion is no problem at all to implement, and it can be very
useful (you cannot run grep twice on stdin, without use of temporary files).
Regards.
--
South-Africa: |Maarten Litmaath @ Free U Amsterdam:
revival of the Third Reich |maart@cs.vu.nl, mcvax!botter!ark!maart
allbery@ncoast.UUCP (Brandon S. Allbery) (06/19/88)
As quoted from <5826@umn-cs.cs.umn.edu> by randy@umn-cs.cs.umn.edu (Randy Orrison): +--------------- | In article <7962@alice.UUCP> andrew@alice.UUCP writes: | |3) print lines with context. | | the second most requested feature but i'm not doing it. this is | | just the job for sed. to be consistent, we just took the context | ^^^^^^^^^^^^^^^^ | | crap out of diff too. this is actually reasonable; showing context | ^^^^^^^^^^^^^^^^ | | is the job for a separate tool (pipeline difficulties apart). | | | What?!?!? Ok, i would like context in grep, but i'll live without it. | Context diffs, however are a different matter. There isn't an easy way | to generate them with diff/context (the first character of every line is | produced as part of the diff). Context diffs are useful for patches, and +--------------- Yes, there is; change diff's output format slightly and expand "context" slightly, then other programs can also output in "extended context" format so as to use "context"'s facilities. I've already described part of this change in another posting; the other part would be to recognize a special indicator (on the line number, perhaps?) which would for generality be the flag to use on the difference, defaulting to "*" which is what "context" currently uses, or diff could specify "+", "-", or "!". The only other change would be to smarten "context" so that it "collapses" context "windows" together much like the 4.3BSD diff -c does. It appears that Bell Labs continues to use tools unrepentantly. It should be noted that they *are* into research, so I have no arguments against their use of /dev/stdin (/dev/fd/0?), their assumption that there's plenty of space so stash away a copy of a file with "tee" for later use in "context", etc. (My /dev/stdin complaint earlier was not aimed at the Bell Labs folks, it was aimed at the person who informed the entire Usenet that "hey, I posted a /dev/stdin driver source for 4.2BSD, so not a one of you has any reason not to be running it". In other words, the usual 4.xBSD-source elitism.) -- Brandon S. Allbery | "Given its constituency, the only uunet!marque,sun!mandrill}!ncoast!allbery | thing I expect to be "open" about Delphi: ALLBERY MCI Mail: BALLBERY | [the Open Software Foundation] is comp.sources.misc: ncoast!sources-misc | its mouth." --John Gilmore
frei@rubmez.UUCP (Matthias Frei ) (06/20/88)
In article <7962@alice.UUCP>, andrew@alice.UUCP writes: > > The following is a summary of the somewhat plausible ideas You are shooting down nearly all of the good ideas posted by many users on the Net. So why did you post your request at all, if you only want to make some minor changes to grep??? Please don't waste our time with things like that. Matthias Frei -------------------------------------------------------------------- Snail-mail: | E-Mail address: Microelectronics Center | UUCP frei@rubmez.uucp University of Bochum | (...uunet!unido!rubmez!frei) 4630 Bochum 1, P.O.-Box 102143 | West Germany |
greywolf@unicom.UUCP (greywolf) (06/25/88)
In article <1304@ark.cs.vu.nl> maart@cs.vu.nl (Maarten Litmaath) writes: # In article <7962@alice.UUCP> andrew@alice.UUCP writes: # \... # \5) divert matching lines onto one fd, nonmatching onto another. # \ sorry, run grep twice. # # Come on! The diversion is no problem at all to implement, and it can be very # useful (you cannot run grep twice on stdin, without use of temporary files). # Regards. Essentially, I think with respect to the tool -flag concept, their attitude there is "See figure 1." This is ESPECIALLY true when they have the opportunity to say "NIH"! (Sounds like the knights from Monty Python: The Search for the Holy Grail). For those of you who do not understand "See figure 1.", I am sure that there are some people inside AT&T who would be happy to tell you. They tell me every month on my phone bill. # -- # South-Africa: |Maarten Litmaath @ Free U Amsterdam: # revival of the Third Reich |maart@cs.vu.nl, mcvax!botter!ark!maart --