haahr@phoenix.Princeton.EDU (Paul Gluckauf Haahr) (06/16/88)
there has been quite a lot of discussion following research!andrew's
outline of gre (and his comment that v9 diff no longer has -c) on
whether grep and diff should provide context. my two cents worth
follows.

context printing might not be hard with grep, but it may require grep
to do some extra work that it doesn't do right now, that is, figuring
out the previous n lines. andrew's description of the one limit that
exists in gre (64k lines), and some knowledge of how one does a
boyer-moore style grep (find the pattern, then back up to find the
beginning of the line, rather than searching each line individually),
lead me to believe that getting context right, at least in the
presence of very long lines, isn't as easy as one would expect.

on context, by the way, i can't think of a simple definition. if i am
printing a context of 2 (2 lines before and after, plus the given
line), what should a context grep for X in the file "abcXdXefgXhi"
(assume each character is a line) print? "bcXdX XdXef fgXhi" or
"bcXdXefgXhi"? and should the matched line be flagged, a la diff -c?
i like having this in a separate program which could handle those
questions as options.

for handling pipes, yes, it does mean putting the input somewhere. is
that so awful? remember, context grep output would largely be for
humans, and in this case i don't see the harm in using /tmp (besides,
if it's small, it will all be in the buffer cache anyhow).

here is a quickly hacked cgrep:

    #! /bin/sh
    case $# in
    0|1)    echo >&2 "usage: cgrep nlines pattern [file...]"; exit 2;;
    esac
    n="$1"
    pattern="$2"
    shift; shift
    case $# in
    0)      # no files: save stdin in /tmp so context can reread it
            tee /tmp/cgrep$$ | grep -n "$pattern" | context "$n" /tmp/cgrep$$
            rm -f /tmp/cgrep$$
            ;;
    *)      # one grep per file, so grep -n prints no filename prefix
            for i
            do
                    grep -n "$pattern" "$i" | context "$n" "$i"
            done
            ;;
    esac

ideally it would do some argument processing, pass options along to
grep or context, and supply a default nlines.
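
to make the outline concrete, here is a minimal sketch of one possible
context filter, written as a shell function so it can be tried out
directly. it is an illustration only, not any of the versions that
have been posted. the choices it makes are assumptions: overlapping
windows are merged (the "bcXdXefgXhi" answer to the question above),
matched lines are flagged with "! " in the manner of diff -c, and
separate groups are divided by a "--" line. it also assumes an awk
that accepts -v.

```shell
# context nlines file
# hypothetical helper: reads "grep -n" output (lineno:text) on stdin,
# then prints each matched line of file with nlines of context on
# either side.
context() {
    # read all of stdin first; in the pipe case this also guarantees
    # that tee has finished writing the /tmp copy before awk opens it
    hits=$(cut -d: -f1 | tr '\n' ' ')
    awk -v n="$1" -v hits="$hits" '
        BEGIN {
            # remember which line numbers grep reported
            m = split(hits, h, " ")
            for (i = 1; i <= m; i++)
                hit[h[i] + 0] = 1
        }
        {
            # show this line if any match lies within n lines of it
            show = 0
            for (i = FNR - n; i <= FNR + n; i++)
                if (i in hit) { show = 1; break }
            if (!show)
                next
            # a gap since the last printed line starts a new group
            if (last && FNR > last + 1)
                print "--"
            flag = (FNR in hit) ? "! " : "  "
            print flag $0
            last = FNR
        }
    ' "$2"
}
```

used as in cgrep above: grep -n "$pattern" "$i" | context "$n" "$i".
printing unmerged windows instead, or dropping the flags, would be
natural options to add.
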
context is easy and left as an exercise for the reader, but several
more than capable ones have been floating around over the past couple
of days. anyway, this is just an outline.

on the argument for whether or not diff should support context, first
note that the problem of input from a pipe being lost is a chimera:
all a context diff producer needs is diff output and one of the
filenames (and knowledge of whether that was the first or second file
from the diff command line).

[ that is for conventional diff. the diff i run supports input from
two pipes: a filename enclosed in parentheses is passed as a command
to sh -c, so that

    $ diff '(first command)' '(second command)'

works rather nicely. i've been thinking of hacking together a shell
that takes (command) as a command argument, changes it to a named pipe
or /dev/fd/n, and executes the command at the other end of the pipe.
rm '(cat file)' might not be what one really expects. i heard that one
of the research shells, maybe it was one of korn's, supported this,
but never saw it in a release. and there are very few commands which
this would be useful with other than diff, although awk -f (command)
might be nice every once in a while. by the way, my diff doesn't have
-c, and i haven't really missed it. but i don't send out patches
often. ]

for those of you who are complaining about writing a /tmp file, look
at what diff does if its input is from a pipe. if we had good vm
implementations, diff could get away with reading everything into
core. alas, we don't, and that would cause large files to make diff
thrash.

my personal feeling is that it is probably not harder for any
reasonable diff to handle context, and there are enough programs that
look for context diffs that it isn't unreasonable to keep it there.
patch fudge factors sound like a good argument for keeping them
around, though a diffc might provide a good place to start for playing
with new, more readable forms of context diff, allowing some new
creatures to feep into context diffs (standout mode or nice output for
troff, side-by-side format).

on the other hand, there is very little reason that i see to put
context in grep, and a good context tool would probably make us all
forget that anyone had ever asked for it. but then again, i'm from the
school of thought that doesn't use cat -v.

[ by the way, cat -v would be useful if it gave a one to one mapping
of its output. but, if your file contains "M-" or "^", it won't. my
own vis program is reversible (with an unvis), which is useful for
editing binary files (ever needed to change a hard coded pathname?)
though, of course, many editors handle binary files ok ]

paul haahr
princeton!haahr or haahr@princeton.edu