[comp.unix.questions] Tool -flag considered harmful

barnett@vdsvax.steinmetz.ge.com (Bruce G. Barnett) (06/13/88)

I am disturbed by the growing trend by AT&T to remove as many flags as
possible from common utilities. I understand that too much baggage
can slow down a finely tuned utility used for time-consuming shell
scripts.

But when *I* have a problem I want to solve, I want to solve it as
fast as possible. 

If there is a 'variation' that I need that I don't know how to do, I
read the manual page of the program that is closest to the
functionality that I can think of. That is, if I want to look for a
particular pattern, I start with grep. If it does what I want, fine.
(Example: print the first match and exit).

But if grep doesn't do what I want, I have to hunt for a new command,
or write my own.

It is not obvious that a "stream editor" has the functionality of
grep. Even after reading the manual page, this is NOT OBVIOUS.
I also don't (always) have time to study the manual pages for hours,
trying to decipher the *real* intent of the tools.

If I want to use grep to test for a pattern, I shouldn't have to
remember that two years ago in *.wizards, article <7962@alice.UUCP>
suggested that
	grep -1 pattern >/dev/null
was the same as
	grep -s pattern

I mean, how many extra lines would supporting both options cost?
Conversely, how many scripts will break with the new set of options?

If I want to create a patch, I use diff.

If diff loses the -c (context) option, I have to be familiar with
two commands instead of one. One being diff, one being context diff.
Why should I have to know about two different commands that do the
same thing - compare two files?

Flame me if you will, but when I use these tools in an interactive
session, I don't care if 'cat -v' is slow. Or 'diff -c'. Or grep -whatever.

I just want to find out the answers as quickly as possible.

The grep on my system is one of the fast versions. If it gets a flag
it doesn't understand, it calls up the original version for compatibility.
So it executes two programs instead of one. This is still faster
(on my own wall clock) than
	1. Searching the whatis database to find the 'right' command
	2. Reading the manual page for the 'right' command
	3. Writing a simple shell script because the manual pages don't
	   have examples.
	4. Beating my head against the wall when I realize that the
	   new command doesn't do what I wanted EITHER.	
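
The fallback arrangement described above (a fast grep that hands off to
the original when it meets a flag it does not understand) can be
sketched roughly as follows. This is only an illustration: the flag set
and the function names are hypothetical and are not taken from any real
grep implementation.

```python
# Hypothetical set of flags the "fast" version is assumed to understand.
FAST_FLAGS = {"-c", "-l", "-n", "-v"}


def split_flags(args):
    """Collect the leading arguments that look like flags."""
    flags = []
    for arg in args:
        if not arg.startswith("-"):
            break  # first non-flag argument is taken to be the pattern
        flags.append(arg)
    return flags


def choose_implementation(args):
    """Return 'fast' when every flag is understood, else 'original'."""
    if all(flag in FAST_FLAGS for flag in split_flags(args)):
        return "fast"
    return "original"
```

The cost of the second program is only paid on the uncommon flags, which
is the trade-off the post argues is acceptable for interactive use.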

With the use of shared libraries, tools should improve. The idea of
one library with the same regexp package shared by all of the
utilities should do wonders for consistent tools.

I am all for progress. Just remember, that there are tools used by the
system, and tools used by humans. Maybe I am a Neanderthal, because
I have this rock here that I do everything with......               :-)
-- 
	Bruce G. Barnett 	<barnett@ge-crd.ARPA> <barnett@steinmetz.UUCP>
				uunet!steinmetz!barnett

seibel@cgl.ucsf.edu (George Seibel) (06/14/88)

In article <4615@vdsvax.steinmetz.ge.com> barnett@steinmetz.ge.com (Bruce G. Barnett) writes:
[...]
]I am all for progress. Just remember, that there are tools used by the
]system, and tools used by humans. [...]

Well put.  When programmers at AT&T are making design decisions that may
affect the way hundreds of thousands of people interact with their computers
for years to come, I wonder who checks it out?   The wizard down the hall?
How many of you have ever tried to teach UNIX to a casual user?  How about
trying to *sell* UNIX in industrial environments where the amount of training
required to use an O/S is a major consideration?  Creeping featurism is bad,
tools are good, agreed.  But there comes a time when the tool philosophy
needs to bend a little.  Context in grep and diff is just such a case.

George Seibel, UCSF
seibel@cgl.ucsf.edu

bzs@bu-cs.BU.EDU (Barry Shein) (06/14/88)

The "tool philosophy" is not at question in this grep discussion, the
question is whether grep is the right tool to provide contexts around
pattern matches rather than resorting to some other tool later just
for this special function.

I claim putting it into grep is exactly right, the "tool philosophy"
and parsimony arguments are red herrings, or reductio ad absurdum (why
have grep echo file names when it's easy enough to do in a shell
script? etc etc.)

I've yet to hear an argument here against putting the context function
into grep other than weak bleatings of parsimony.

The argument in favor is obvious, GREP HAS THE $&^%$ CONTEXT IN ITS
LITTLE HANDS, WHY LOOK FOR IT AGAIN? That is, it's just a minor
generalization of what grep does right now, grep prints the context
already, it's simply limited to one text line.
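
To make the "minor generalization" concrete, here is a minimal sketch of
a line matcher that keeps a few preceding lines in hand and prints a few
following ones. It assumes fixed line counts for the context (the
`before` and `after` parameters are names invented for this sketch), and
it uses Python's `re` module rather than grep's own matcher.

```python
import re
from collections import deque


def grep_context(pattern, lines, before=1, after=1):
    """Return matching lines plus `before`/`after` lines of context."""
    regex = re.compile(pattern)
    recent = deque(maxlen=before)  # lines seen but not yet printed
    pending_after = 0              # trailing context lines still owed
    out = []
    for line in lines:
        if regex.search(line):
            out.extend(recent)     # flush the leading context
            recent.clear()
            out.append(line)
            pending_after = after
        elif pending_after > 0:
            out.append(line)
            pending_after -= 1
        else:
            recent.append(line)
    return out
```

With `before=0, after=0` this reduces to the ordinary one-line output,
which is the sense in which the current behavior is a special case.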

If people think that filename:linenum is so swell why not have grep
*only* produce that under all conditions? Ridiculous, right? The
whole damn argument is ridiculous.

Another anti-filename:linenum argument is what if I want to limit the
output to LESS than a single line, like printing only the exact text
matched? &c.

I may not be exactly right here, I can see complications myself, but I
think someone has to think slightly deeper than "heavens no, not
another flag to grep" as the primary basis for software design, it's
become the Mom and Apple Pie among some circles, excuses anything.  It
provides a convenient bludgeon to use on the novitiate.

(tho I must agree some of these anti-tools arguments are appalling,
the groans of carpetbaggers who would prefer to see their past
failures replicated rather than learn why the system won, like the
arrivistes who keep moaning that C isn't more like Fortran or PL/1, I
suspect some of them moaned about the distinct lack of manure in the
streets when autos became popularized.)

	-Barry Shein, Boston University

gwyn@brl-smoke.ARPA (Doug Gwyn ) (06/15/88)

In article <23325@bu-cs.BU.EDU> bzs@bu-cs.BU.EDU (Barry Shein) writes:
>Another anti-filename:linenum argument is what if I want to limit the
>output to LESS than a single line, like printing only the exact text
>matched? &c.

Hey, while we're at it, have an option to highlight the matched
pattern(s) in STANDOUT MODE (i.e. add control characters based on
getenv("TERM") or a -Tname option).  You have to admit that would
sometimes be useful, and it depends even more than context on
what grep "has its hands on".

How far do you want to go with this?

bzs@bu-cs.BU.EDU (Barry Shein) (06/15/88)

From Doug Gwyn
>In article <23325@bu-cs.BU.EDU> bzs@bu-cs.BU.EDU (Barry Shein) writes:
>>Another anti-filename:linenum argument is what if I want to limit the
>>output to LESS than a single line, like printing only the exact text
>>matched? &c.
>
>Hey, while we're at it, have an option to highlight the matched
>pattern(s) in STANDOUT MODE (i.e. add control characters based on
>getenv("TERM") or a -Tname option).  You have to admit that would
>sometimes be useful, and it depends even more than context on
>what grep "has its hands on".
>
>How far do you want to go with this?

Doug, you've missed the point so completely it sends a shiver down my
spine.

The point is that if grep had a reasonable context-printer added
everything I suggest would simply be doable with *that*, no options
needed, while trying to build a back-end filter would probably be the
thing demanding all the special cases since significant amounts of
information have been lost once it went out the pipe.

Besides, if the context printer were designed powerfully enough
it could print STANDOUT mode with no mods; taking literal strings
to echo should be natural enough:

	grep -P /\"^[[7m\"&/,/\"^[[0m\"/ pat file

(-P start,end printing context, default is ".,.+1", syntax similar to
ed) that is, & as in ed, \"..\" print (literally), the ANSI standout
mode string, then the matched string, then the ANSI end-standout mode
string.
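
The effect of that print specifier (literal prefix, matched text,
literal suffix) can be approximated as below. This is a sketch of the
idea only: it does not parse the proposed -P syntax, the function and
constant names are made up for the example, and Python's `re` stands in
for grep's matcher.

```python
import re

STANDOUT_ON = "\x1b[7m"   # ESC [ 7 m, written "^[[7m" in the post
STANDOUT_OFF = "\x1b[0m"  # ESC [ 0 m, written "^[[0m" in the post


def print_spec(pattern, line, before=STANDOUT_ON, after=STANDOUT_OFF):
    """Wrap every match of `pattern` in the two literal strings."""
    return re.sub(pattern, lambda m: before + m.group(0) + after, line)
```

Passing different `before` and `after` strings gives other decorations
without any new option, which is the point being made about a general
print specifier.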

Tho it would be hardly necessary (better to backquote the grep into an
xargs/echo), as was the criticism.

A general purpose print specifier in grep could be a handy tool.

	-Barry Shein, Boston University

andrew@alice.UUCP (06/16/88)

there are some technical problems with printing context, particularly
if you want regular expression-defined contexts. it means you have to be
able to run the sodding regexp stuff backwards thru the text. it also
means keeping a lot of buffer which is more than a little repugnant.
despite all this, it is feasible but not trivial to add context stuff.
i won't do it though.

andrew@alice.UUCP (06/16/88)

the following clarification may be well known to some but i have
received sufficient enquiries to warrant an explanation.

firstly, -1 and -l involve exactly the same amount of work.
the only difference is that in the latter, the filename is printed and in the
former, the line is printed. In both cases, getting a hit stops the processing
for that file.

-s is potentially faster in that after we get a hit, all we have to do is
try opening the remaining file arguments (in case we have to return a 2 for
inaccessible file arguments). in practise, no one bothers to write this
separate loop and they use the same code as -l.
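
A rough sketch of the point that all three options share one loop: each
file is scanned only up to its first hit, and the options differ only in
what is printed. The function names are invented for this sketch, files
are modelled as a dict of name to lines, and the exit status 2 for
inaccessible files is left out.

```python
import re


def first_hit(pattern, lines):
    """Return the first matching line, or None; reading stops at the hit."""
    regex = re.compile(pattern)
    for line in lines:
        if regex.search(line):
            return line
    return None


def grep_first(pattern, files, mode):
    """mode '1' prints the line, 'l' the filename, 's' nothing."""
    output = []
    found = False
    for name, lines in files.items():
        hit = first_hit(pattern, lines)
        if hit is None:
            continue
        found = True
        if mode == "1":
            output.append(hit)
        elif mode == "l":
            output.append(name)
        # mode 's' is silent; only the status matters
    return (0 if found else 1), output
```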

andrew@frip.gwd.tek.com (Andrew Klossner) (06/16/88)

[]

	"I've yet to hear an argument here against putting the context
	function into grep other than weak bleatings of parsimony."

How about "because the people writing the tool think it doesn't
belong."  This strikes me as a strong argument.  If you have a
contrasting vision of what a pattern matcher can be, implement it
yourself; don't go beating on a toolmaker who isn't responsible to you.
(I'll admit that some of the beating is appropriate because Andrew
asked for comments, but this lambasting of his philosophy is out of
line.)

I think the exciting part of Andrew's announcement about gre (NOT
grep!) is that the sophisticated pattern matching code will be
encapsulated into library routines.  The gre utility itself will just
be a wrapper.  Those of us tool builders who can make slingshots but
aren't good enough to make scalpels will be able to produce whatever we
think a pattern matching tool should be, since the hard part will be
done and packaged for us.  (Of course, none of us peons outside the
Labs will ever see this, unless they export it via the Toolkit.)

I have worked in software foundries where tool-building consisted of
library routine building, not filter process building, and I found it
to be a more powerful, more easily exploited approach to the crafting
of wonderful programs.  I like a locally-developed tool called
"paragrep" which searches (English text) paragraphs instead of lines.
It was written by a C novice, thanks to the re_comp/re_exec library
routines.
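
For illustration, a paragraph-oriented search of this kind can be very
short once the matching is a library call. The sketch below is not the
local "paragrep" tool, whose source is not shown here; it assumes
paragraphs are separated by blank lines and uses Python's `re` module in
place of re_comp/re_exec.

```python
import re


def paragrep(pattern, text):
    """Return the blank-line-separated paragraphs that contain a match."""
    regex = re.compile(pattern)
    paragraphs = re.split(r"\n\s*\n", text.strip())
    return [p for p in paragraphs if regex.search(p)]
```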

  -=- Andrew Klossner   (decvax!tektronix!tekecs!andrew)       [UUCP]
                        (andrew%tekecs.tek.com@relay.cs.net)   [ARPA]

bzs@bu-cs.BU.EDU (Barry Shein) (06/16/88)

From andrew@alice
>there are some technical problems with printing context, particularly
>if you want regular expression-defined contexts. it means you have to be
>able to run the sodding regexp stuff backwards thru the text. it also
>means keeping a lot of buffer which is more than a little repugnant.

Why not look for "starts" as you search forward and (re)start
buffering from there? No need to search backwards after the match, in
fact it would be counter-productive (either you need to seek a pipe or
save everything, of course saving a seek pointer to last start on
streams that support it might be a rational optimization, but it's not
necessary if you save starts as you go.)
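
The forward-only scheme suggested above might look like the following
sketch. It assumes the context start is defined by a second regular
expression (`start_pattern`, a name invented here), treats the beginning
of input as an implicit start, and uses Python's `re` for matching.

```python
import re


def grep_from_start(pattern, start_pattern, lines):
    """Print from the most recent 'start' line through each match."""
    match_re = re.compile(pattern)
    start_re = re.compile(start_pattern)
    buffer = []
    out = []
    for line in lines:
        if start_re.search(line):
            buffer = [line]      # new start: drop what was buffered
        else:
            buffer.append(line)
        if match_re.search(line):
            out.extend(buffer)   # context is already in hand
            buffer = []
    return out
```

No backward search is needed, but the buffer grows without bound when
starts are rare, which is the buffering cost raised in the quoted text.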

I don't know what "repugnant" means in any context that can be answered.

>despite all this, it is feasible but not trivial to add context stuff.
>i won't do it though.

Of course, that is your decision, I was just trying to provide helpful
comments as was requested.

	-Barry Shein, Boston University

urban@spp2.UUCP (Michael Urban) (06/16/88)

In article <10989@cgl.ucsf.EDU> seibel@hegel.mmwb.ucsf.edu.UUCP (George Seibel) writes:
>How many of you have ever tried to teach UNIX to a casual user?  How about
>trying to *sell* UNIX in industrial environments where the amount of training
>required to use an O/S is a major consideration?  Creeping featurism is bad,
>tools are good, agreed.  But there comes a time when the tool philosophy
>needs to bend a little.  Context in grep and diff is just such a case.

I have spent the last five years here doing exactly what you describe--
teaching Unix to casual/new users, fighting the "VMS/Unix" wars with
people whose definition of "user-friendly" is "English command names",
preparing training materials (DEC once bought a thousand copies
of one of my manuals for internal use), and the like.

I am no longer convinced that the tool philosophy needs to bend.  
Instead, I suggest that we give the casual users training materials
that let them confidently *think* in the traditional Unix "tool-oriented"
manner so that they can use the system more effectively.  Attempting
to add features and prepackaged shell scripts is similar to answering
a user's "how do I do xxx" question with a "cookbook" answer; the
result is that the user will be back the next week with "how do I
do xxx+1?"  I, for one, do not think that users, even the most
novice secretarial type, are (in general) too stupid to understand
that they can find out how many files are in their directory by
typing ls|wc .  The problem is that Unix does not even begin to
approach the quality of documentation that you find in systems
that are considered "user-friendly".  If Unix came with a warm and
friendly glossy color manual with lots of pictures, one that explained
FROM THE BEGINNING about the Unix "tinker-toy" approach to commands,
I can promise you that many of the perceived problems with the
ease of use of the Unix environment would go away.  
-- 
   Mike Urban
	...!trwrb!trwspp!spp2!urban 

"You're in a maze of twisty UUCP connections, all alike"

barnett@vdsvax.steinmetz.ge.com (Bruce G. Barnett) (06/16/88)

In article <7986@alice.UUCP> andrew@alice.UUCP writes:

|-s is potentially faster in that after we get a hit, all we have to do is
|try opening the remaining file arguments (in case we have to return a 2 for
|inaccessible file arguments). in practise, no one bothers to write this
                                  --------
|separate loop and they use the same code as -l.

This seems to be a major loss in efficiency.

While
  grep -s Subject /usr/spool/news/comp/sources/unix/*

might not be painful, something like

  grep -s 'Archive-name: program/part01' /usr/spool/news/comp/sources/unix/*

would be. If indeed -s used the same code as -l, then grep would
still read EVERY LINE OF EVERY FILE - except for the single file that
had the string being searched for.
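
The difference can be made concrete by counting lines read under the
two strategies. This is a sketch with invented function names; files are
modelled as lists of lines, and opening the remaining files to check
accessibility is not modelled.

```python
import re


def lines_read_like_l(pattern, files):
    """Reuse the -l loop: every file is scanned up to its own first hit."""
    regex = re.compile(pattern)
    read = 0
    for lines in files:
        for line in lines:
            read += 1
            if regex.search(line):
                break  # stops this file only
    return read


def lines_read_separate_loop(pattern, files):
    """Separate -s loop: stop reading altogether after the first hit."""
    regex = re.compile(pattern)
    read = 0
    for lines in files:
        for line in lines:
            read += 1
            if regex.search(line):
                return read  # remaining files would only be opened
    return read
```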

I really don't see any justification for the waste of CPU cycles.
-- 
	Bruce G. Barnett 	<barnett@ge-crd.ARPA> <barnett@steinmetz.UUCP>
				uunet!steinmetz!barnett

mouse@mcgill-vision.UUCP (der Mouse) (06/23/88)

In article <8090@brl-smoke.ARPA>, gwyn@brl-smoke.ARPA (Doug Gwyn ) writes:
> Hey, while we're at it, have an option to highlight the matched
> pattern(s) in STANDOUT MODE (i.e. add control characters based on
> getenv("TERM") or a -Tname option).

To take this in an entirely different way from what you intended....

I have a program which is like cat except that certain strings
(specified on the command line) get replaced with
underscore-backspace-character sequences.  If the standard output is a
tty, it automagically pipes its output through ul to make it come out
in the appropriate standout mode for the terminal.
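
The replacement step of such a program could be sketched as below. The
function name is made up for this example, the tty check and the pipe
through ul are left out, and Python's `re` is used for the string
search.

```python
import re


def overstrike(text, targets):
    """Replace each target string with underscore-backspace-character runs."""
    regex = re.compile("|".join(re.escape(t) for t in targets))
    return regex.sub(
        lambda m: "".join("_\b" + ch for ch in m.group(0)),
        text,
    )
```

A filter such as ul can then turn the overstruck sequences into whatever
underline or standout mode the terminal supports.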

					der Mouse

			uucp: mouse@mcgill-vision.uucp
			arpa: mouse@larry.mcrcim.mcgill.edu