[comp.arch] UNIX mind-set -> OK, OK!

jlg@lanl.gov (Jim Giles) (01/14/91)

Ok!  Everyone's right and I'm wrong (about the _single_ issue below).

Yes, both of the shells that are bundled with versions of UNIX _do_
automatically trash (that is, 'process') the command line arguments to
expand wildcards.  Explains why I don't use the bundled command shells
much.  This is a choice that _should_ be left to the discretion of the
utility writer.

J. Giles

asun@tornado.Berkeley.EDU (a sun) (01/14/91)

In article <11314@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
>
>Yes, both of the shells that are bundled with versions of UNIX _do_
>automatically trash (that is, 'process') the command line arguments to
>expand wildcards.  Explains why I don't use the bundled command shells
>much.  This is a choice that _should_ be left to the discretion of the
>utility writer.
>

Sorry, but you're wrong here also.
You'll notice (an example was given in a previous post) that

	echo "*"
and
	echo *

give different results.
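A quick way to see this (the scratch directory and file names here are my own, not from the thread):

```shell
# Make a scratch directory with two files, then compare unquoted vs. quoted.
d=$(mktemp -d) && cd "$d" && touch a.c b.c

echo *        # the shell expands the pattern: prints "a.c b.c"
echo "*"      # quoting suppresses expansion:  prints "*"
```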

The expansion of wildcards is left to the user's discretion, which I
think is appropriate. It provides an interface that is consistent and
easily tailored to the user's needs. I suggest you read csh(1) and
sh(1) before commenting further on the inadequacies of the standard
shells.

If you really don't want any filename expansions whatsoever, just "set
noglob."

-----
a sun
(this doesn't belong in comp.arch)

pfalstad@phoenix.Princeton.EDU (Paul John Falstad) (01/14/91)

In article <11314@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
>Yes, both of the shells that are bundled with versions of UNIX _do_
>automatically trash (that is, 'process') the command line arguments to
>expand wildcards.  Explains why I don't use the bundled command shells
>much.

What shells DO you use, may I ask?

>This is a choice that _should_ be left to the discretion of the
>utility writer.

This forces the user to remember, for each individual command, whether
or not it processes wildcards.  It also makes the utility writer have to
worry about wildcarding in each program he writes.

Plus, what if some programmer comes up with some brilliant new
wildcard, say one that matches exactly the files you want, every time?
In UNIX he can just incorporate it into a shell, and it will work fine
with the binaries of all existing utilities.  Otherwise, he will have to
recompile all the utilities he uses with this new globbing routine,
or just use this new routine with all future programs he writes.  Then he
gets to remember which ones have the new globbing, and which ones do not.
This is not consistent.

Having once been an Amiga programmer (the Amiga handled wildcards in the
way you specify), I can say I greatly prefer the UNIX shells' approach.  It
is not the job of "cat", for example, to process command lines and scan
directories to check for matching files, any more than it is the job of
the shell to catenate files.  In the "one tool = one job" approach, the
shell handles the job of parsing and expanding command lines, and "cat" simply
spits out files without having to scan directories or call any strange
pattern matching library functions.

If you don't want the shell to do wildcard expansion for certain
commands (e.g. find, so that "find . -name *.c -print" works as expected),
you could use an alias involving "set noglob"; also, there are shells
available that let you say "noglob find", which prevents the shell from
doing wildcard expansion whenever the command "find" is involved.
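A sketch of the noglob approach in sh/ksh terms: `set -f` is the sh-family equivalent of csh's "set noglob", and the csh alias in the comment is a common idiom, not quoted from any post here.

```shell
# csh idiom:  alias find 'set noglob; \find \!* ; unset noglob'
# sh/ksh equivalent of csh's "set noglob":
set -f
echo *.c      # prints the literal "*.c" even if .c files exist
set +f        # globbing back on
```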

--
"Uh, Air Zalagaza 612, we have engine failure and our port wing is about
to drop off.  We anticipate a crash situation at this time."
The lavatory is engaged.  From within, you hear what could be a pygmy hog
giving birth to an unusually large litter.

lrg7030@uxa.cso.uiuc.edu (Loren Rittle) (01/14/91)

jlg@lanl.gov (Jim Giles) writes:
> Yes, both of the shells that are bundled with versions of UNIX _do_
> automatically trash (that is, 'process') the command line arguments to
> expand wildcards.
Assuming you speak of sh and csh (these days you may get ksh also).
Wrong; both shells do command line processing only if you want them
to.  Sure, it is on by default, but *you* have the option to turn off
this feature.

> Explains why I don't use the bundled command shells
> much.  
No, the fact that you claim they always do expansion just proves you
don't know what you are talking about.

> This is a choice that _should_ be left to the discretion of the
> utility writer.
This is your opinion.  I and many other people believe the other way
(i.e. that shell expansion is `correct'), so don't state your view as
fact or as obvious.  I personally think leaving it to each utility
writer breeds chaos.  I can produce facts to prove this.  See my
signature if you want to know what machine and OS I speak of.

The more you say, the farther your foot gets inserted into your mouth.
Ever heard of stopping while you're behind?

Loren J. Rittle
--
``In short, this is the absolute coolest computer device ever invented!''
                   -Tom Denbo speaking about The VideoToaster by NewTek
``your pathetic architectures won't support it'' - Kent Paul Dolan
``Think about NewTek's VideoToaster!  Now think about the Amiga!''
Loren J. Rittle lrg7030@uxa.cso.uiuc.edu

johnl@iecc.cambridge.ma.us (John R. Levine) (01/14/91)

In article <11314@lanl.gov> you write:
>Yes, both of the shells that are bundled with versions of UNIX _do_
>automatically trash (that is, 'process') the command line arguments to
>expand wildcards.  ... This is a choice that _should_ be left to the
>discretion of the utility writer.

Ah, finally an issue of software architecture appears in this argument.
The theory that the program rather than the command interpreter should
decide which arguments should be expanded and which ones shouldn't appears
plausible on the surface.  Having used both systems that do globbing in
the shell (various forms of Unix) and systems that do globbing in the
program (TOPS-10, Twenex, and MS-DOS, among others; anyone else ever use
DOS-11?), I can report that the Unix approach is far superior for
several reasons:

Consistency: The Unix approach is easy to remember, all arguments are
expanded unless quoted.  On MS-DOS, some programs expand and some don't,
and you have to remember the rules for all of them.  The treatment of
environment variables is so confused that I don't even want to think about
it.  (Even Unix suffers here, my editor expands *foo arguments but not
$foo when reading file names interactively from the terminal.)

Simplicity: If programs do globbing, now each program has to have some way
to inhibit globbing for part or all of its filename arguments, so now you
have to put quoting facilities into every program or into the filenames
themselves.  The baroque VMS name syntax is an example of where this
leads.

Effectiveness: If I were the first programmer writing "echo" I might well
say to myself there's no need to expand these arguments, they're just text
strings.  In fact, one of the most common uses of Unix echo is to expand
file name patterns, something you can't do with MS-DOS echo for exactly
this reason.

Compatibility: When the file name syntax is extended, typically because of
a new kind of file system or network facility, you only have to fix a few
programs, mostly the shells, to handle the new syntax.  On MS-DOS, there
are still programs that expand wild cards but only in the current
directory.  Twenex solved this problem by making globbing a system call,
but Twenex made everything else a system call, too, so that doesn't prove
much.  The Unix approach that has a system call to retrieve directory
entries but leaves all the rest of the string munging in the application
is a fair compromise.  One hopes that shared libraries will ameliorate
this kind of problem, but one doesn't bet a lot of money on it.

Regards,
John Levine, johnl@iecc.cambridge.ma.us, {spdcc|ima|world}!iecc!johnl

henry@zoo.toronto.edu (Henry Spencer) (01/15/91)

In article <11314@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
>Yes, both of the shells that are bundled with versions of UNIX _do_
>automatically trash (that is, 'process') the command line arguments to
>expand wildcards.  Explains why I don't use the bundled command shells
>much.  This is a choice that _should_ be left to the discretion of the
>utility writer.

Wrong.  Then -- as seen repeatedly in other operating systems -- every
command does it differently, and a lot of them don't do it at all.
-- 
If the Space Shuttle was the answer,   | Henry Spencer at U of Toronto Zoology
what was the question?                 |  henry@zoo.toronto.edu   utzoo!henry

jlg@lanl.gov (Jim Giles) (01/15/91)

From article <1991Jan14.170115.17178@Think.COM>, by barmar@think.com (Barry Margolin):
> [...]
> In the "one tool = one job" approach, I think the shell should handle the
> job of parsing command lines.  Since part of this philosophy is that the
> tool should do this one job *well*, I don't think the shell should expand
> wildcards.  It doesn't know which arguments are filenames, so it shouldn't
> blindly expand wildcards in all of them.


I usually don't post to these wild flame-fest discussions (even those
I start!) except on weekends.  But the above is the first intelligent
comment about wildcard 'globbing' that has been posted.  I just wanted
to say "Bravo!"

J. Giles

bson@ai.mit.edu (Jan Brittenson) (01/15/91)

In article <1991Jan14.170115.17178@Think.COM> 
   barmar@think.com (Barry Margolin) writes:

 > I've used several systems (Multics, ITS, TOPS-20) where the commands
 > invoked the globbing routine, ... Filename arguments that don't make
 > sense to be wildcards (e.g. the last argument to mv) are scanned for
 > wildcard characters, and generate an error if any are seen

   The Unix approach has its advantages sometimes. Assume you have
the files "L19901200.axx772" and "L19910100.bxx19974". If you wish
to overwrite the second file with the first, it makes sense to use

	mv L*72 L*74

if that's unique enough. Another case is cd. When you have only one
file, and it's a directory, "cd *" makes sense. If I have a list of
files and would like to see if L19901200.axx772 has been backed up,
it makes sense to use the command

	grep -l L*772 L*backup

to get a list of what backup file lists it appears in. This
orthogonality is quite a strength of Unix. Granted, there's an almost
infinite potential for error, with no recovery.

 > Putting the work in the utility allows useful syntaxes such as:
 >
 >     mv * =.old

   This is something I quite miss. But I'd rather see it solved in the
shell than in the tools, perhaps as a variation of the {...} syntax. Say
that {:re} were made a regex quote; then one could write something
like:

	mv {:\(^.+\)$ \1.old}

The name "foo" would thus be replaced by "foo foo.old". At '$' the
shell would go from "match" to "replace." If it didn't match at all,
the shell would just move on to the next file name. I'm confident this
exact issue has been dragged around in the dirt in more than one
newsgroup, so dear netter, please don't regard this as some kind of
shell-extension proposal, only as a highly hypothetical and off-hand
contribution to the where-do-we-glob issue. The point I'd like to make
is that making this modification in a library requires relinking, and
possibly recompiling, every tool that uses it whenever it is modified,
whereas the shell can be quickly recompiled and replaced, with several
versions available for comparison.
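For what it's worth, the effect of the hypothetical pair-producing expansion (or of TOPS-20's `mv * =.old`) can be had today with a plain loop; this sketch is mine and deliberately pedestrian:

```shell
# Rename every file in the current directory to <name>.old,
# one mv per file -- the glob expands once, before the loop runs.
for f in *; do
    mv -- "$f" "$f.old"
done
```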

   I guess every Unix programmer has at some point wished there was a
sh-compatible glob(3) for some very specific purposes, though.

 > If there's a reasonable library routine available, the hardest part
 > should be deciding which arguments should be processed as wildcards.

   It really doesn't make much difference whether it's in a library or
in a shell, as long as there are sufficient hooks for
redefining the syntax. I mean, instead of escaping don't-matches, you
end up escaping do-matches to tell the globber you do want a specific
argument globbed regardless of what the tool thinks is appropriate.

   In Unix hooks really aren't necessary, as you can switch shells
easily, or implement a different syntax in your application and be
sure nothing is globbed following exec(2).

 > I agree, it isn't cat's job [to glob].  However, it would be the job
 > of a wildcard_match() library routine, which would take a wildcard
 > argument and return an array of filenames.  It would be cat's job to
 > call this for its filename arguments.

   I'm not sure how feasible this is. In order to handle `command`
constructs you would need to include an entire shell, more or less. Or
you could start a subshell, in which case we're back where this
discussion started. (Should the subshell call wildcard_match()?)

Just some random thoughts,

						-- Jan Brittenson
						   bson@ai.mit.edu

boyd@necisa.ho.necisa.oz.au (Boyd Roberts) (01/15/91)

In article <11314@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
>Yes, both of the shells that are bundled with versions of UNIX _do_
>automatically trash (that is, 'process') the command line arguments to
>expand wildcards.  Explains why I don't use the bundled command shells
>much.

Well, what do you use then?  cat(1)?

How can you comment on stuff you've just admitted you don't use
and have further demonstrated a basic misunderstanding of?

Why would you _not_ want to use the standard command interpreters?
The [Bourne] shell is one of the benefits of using UNIX.


Boyd Roberts			boyd@necisa.ho.necisa.oz.au

``When the going gets weird, the weird turn pro...''

darcy@druid.uucp (D'Arcy J.M. Cain) (01/15/91)

In article <11314@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
>Yes, both of the shells that are bundled with versions of UNIX _do_
>automatically trash (that is, 'process') the command line arguments to
>expand wildcards.  Explains why I don't use the bundled command shells
>much.  This is a choice that _should_ be left to the discretion of the
>utility writer.

In your opinion.  I have a different opinion, and here's why.  With the
shell expanding the wildcards I know exactly what the effect will be
when I give it "x*y".  If each program expanded its own wildcards (like
MS-DOS) then I would have to test every tool before I could be sure that
it acted as I wanted it to and even then I really couldn't be sure.  When
I meet a new tool I want to fully explore its new capabilities.  I don't
want to keep going over the same ground all the time.

BTW I don't believe the shell is perfect but it certainly does things a
lot better than messy-dos - in *my* opinion.

-- 
D'Arcy J.M. Cain (darcy@druid)     |
D'Arcy Cain Consulting             |   There's no government
West Hill, Ontario, Canada         |   like no government!
+1 416 281 6094                    |

subbarao@phoenix.Princeton.EDU (Kartik Subbarao) (01/15/91)

In article <11390@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
>From article <1991Jan14.170115.17178@Think.COM>, by barmar@think.com (Barry Margolin):
>> [...]
>> In the "one tool = one job" approach, I think the shell should handle the
>> job of parsing command lines.  Since part of this philosophy is that the
>> tool should do this one job *well*, I don't think the shell should expand
>> wildcards.  It doesn't know which arguments are filenames, so it shouldn't
>> blindly expand wildcards in all of them.

As was stated earlier, it's a lot less overhead to have just the
shell do globbing and let the user decide whether he wants parts of the
command line unglobbed than to force every application to do this.

>I usually don't post to these wild flame-fest discussions (even those
>I start!) except on weekends.  But the above is the first intelligent
>comment about wildcard 'globbing' that has been posted.  I just wanted
>to say "Bravo!"

Yo, Jim. Ever heard of the ' and " characters? They _do_ come in handy when you
don't want to glob part or all of a command line. 

			-Kartik
--
internet# ls -alR | grep *.c
subbarao@{phoenix or gauguin}.Princeton.EDU -|Internet
kartik@silvertone.Princeton.EDU (NeXT mail)       -|	
SUBBARAO@PUCC.BITNET			          - Bitnet

barmar@think.com (Barry Margolin) (01/15/91)

[I've followed someone else's lead and directed followups to comp.os.misc.]

In article <BSON.91Jan14170904@rice-chex.ai.mit.edu> bson@ai.mit.edu (Jan Brittenson) writes:
>In article <1991Jan14.170115.17178@Think.COM> 
>   barmar@think.com (Barry Margolin) writes:
> > I've used several systems (Multics, ITS, TOPS-20) where the commands
> > invoked the globbing routine, ... Filename arguments that don't make
> > sense to be wildcards (e.g. the last argument to mv) are scanned for
> > wildcard characters, and generate an error if any are seen
>
>   The Unix approach has its advantages sometimes. Assume you have
>the files "L19901200.axx772" and "L19910100.bxx19974". If you wish
>to overwrite the second file with the first, it makes sense to use
>
>	mv L*72 L*74
>
>if that's unique enough. Another case is cd. When you have only one
>file, and it's a directory, "cd *" makes sense. If I have a list of
>files and would like to see if L19901200.axx772 has been backed up, it
>makes sense to use the command

I admit that I make use of wildcards as a convenient way to abbreviate
filenames, and when I used Multics I sometimes even wished that its "cd"
equivalent allowed this.  However, I don't think this is a good excuse for
such a poor user interface design.  The "cd" command can warn if the
wildcard expands into multiple filenames, since it only allows one
argument; however, in the "mv" case, such a mistake can result in
completely unexpected behavior with no warning.

In my opinion, an interactive mechanism is a much better basis for an
abbreviation design.  For instance, you could type "L*74" and then hit a
control or function key that would cause the string to be replaced by the
matching filenames (or maybe it would beep if the wildcard matches multiple
files).

>	grep -l L*772 L*backup
>
>to get a list of what backup file lists it appears in. This
>orthogonality is quite a strength of Unix. Granted, there's an almost
>infinite potential for error, with no recovery.

The fact that you can find a use for a weird behavior doesn't mean that
behavior is a good feature.  Which is more useful:

	grep -l L*772 L*backup
or
	grep #define.*EXT *.c *.h

> > Putting the work in the utility allows useful syntaxes such as:
> >
> >     mv * =.old
>
>   This is something I quite miss. But I'd rather see it solved in the
>shell than in the tools, perhaps as variation of the {...} syntax. Say
>that {:re} were made a regex quote, then one could write something
>like:
>
>	mv {:\(^.+\)$ \1.old}

This violates your point that just changing the shell would be enough:
mv(1) doesn't accept multiple old/new filename pairs.  The argument
syntaxes of most Unix commands are designed based on the kinds of
expansions that Unix shells are expected to provide; since the shells are
known not to provide the above expansion, no commands are prepared to
accept such arguments.

>   I guess every Unix programmer has at some point wished there was a
>sh-compatible glob(3) for some very specific purposes, though.

Yes!  Many programs allow filenames to be specified interactively; if they
want to allow these filenames to be wildcards and be sure of
sh-compatibility, they have to fork a subshell, pipe the output of echo,
and parse the result (and this won't work if any of the matching files have
spaces in their names).
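That fork-echo-parse dance might look like this (the file names are assumptions; the unquoted word-splitting step is exactly where names containing spaces break):

```shell
# Ask a subshell to do sh-compatible expansion, then split its output.
d=$(mktemp -d) && cd "$d" && touch a.c b.c
expanded=$(sh -c 'echo *.c')   # "a.c b.c"
set -- $expanded               # unquoted: split on whitespace (fragile)
echo $#                        # prints "2"
```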

Also, one might want to perform wildcard matching on names of other things
besides files.

> > If there's a reasonable library routine available, the hardest part
> > should be deciding which arguments should be processed as wildcards.
>
>   It really doesn't make much difference whether it's in a library or
>in a shell, really, as long as there are sufficient hooks for
>redefining the syntax. I mean, instead of escaping don't-matches, you
>end up escaping do-matches to tell the globber you do want a specific
>argument globbed regardless of what the tool thinks is appropriate.

True.  For instance, in your above grep command, you could do:

	grep -l `ls L*772` L*backup

>   In Unix hooks really aren't necessary, as you can switch shells
>easily, or implement a different syntax in your application and be
>sure nothing is globbed following exec(2).

You can't really switch shells all that easily.  On many Unix systems, you
can only make a shell your default shell if it is in your system's
/etc/shells.

> > I agree, it isn't cat's job [to glob].  However, it would be the job
> > of a wildcard_match() library routine, which would take a wildcard
> > argument and return an array of filenames.  It would be cat's job to
> > call this for its filename arguments.
>
>   I'm not sure how feasible this is. In order to handle `command`
>constructs you would need to include an entire shell, more or less. Or
>you could start a subshell, in which case we're back where this
>discussion started. (Should the subshell call wildcard_match()?)

I don't consider `command` the same as globbing.  It's part of the shell
syntax, as are alias and variable expansion.  The difference between these
and wildcard expansion is that the latter is much more context-dependent,
since it is specific to filename arguments.
--
Barry Margolin, Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar

barmar@think.com (Barry Margolin) (01/15/91)

[Note: followups directed to comp.os.misc.]

In article <5371@idunno.Princeton.EDU> subbarao@phoenix.Princeton.EDU (Kartik Subbarao) writes:
>In article <11390@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
[In response to one of my posts about commands controlling wildcard
expansion, rather than the shell blindly expanding all wildcards.]
>>I usually don't post to these wild flame-fest discussions (even those
>>I start!) except on weekends.  But the above is the first intelligent
>>comment about wildcard 'globbing' that has been posted.  I just wanted
>>to say "Bravo!"
>Yo, Jim. Ever heard of the ' and " characters? They _do_ come in handy when you
>don't want to glob part or all of a command line. 

Why should the user have to type extra characters to tell the computer not
to do something, when it should know better all by itself?  The computer is
supposed to *save* the user from having to worry about trivial details like
this.
--
Barry Margolin, Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar

det@hawkmoon.MN.ORG (Derek E. Terveer) (01/15/91)

jlg@lanl.gov (Jim Giles) writes:

>Ok!  Everyone's right and I'm wrong (about the _single_ issue below).

>Yes, both of the shells that are bundled with versions of UNIX _do_
>automatically trash (that is, 'process') the command line arguments to
>expand wildcards.

Why is this a bad thing?  Every time i have written a utility in dos, i have
had to manually expand the wildcard characters on the command line.  (Of
course, i rather quickly got a standard public domain library routine to do
it for me because it was so painful.)  It seems to me, without expending a
great deal of thought on it, that the majority of applications/utilities
would desire wildcard expansion of the command line.  Are you asserting
that the reverse is true?

>Explains why I don't use the bundled command shells
>much.

What command line shell or interpreter or menu system do you use then?

>This is a choice that _should_ be left to the discretion of the
>utility writer.

Yes, i agree, but i can't think of an easy way of implementing this, can you?
-- 
Derek "Tigger" Terveer	det@hawkmoon.MN.ORG - MNFHA, NCS - UMN Women's Lax, MWD
I am the way and the truth and the light, I know all the answers; don't need
your advice.  -- "I am the way and the truth and the light" -- The Legendary Pink Dots

det@hawkmoon.MN.ORG (Derek E. Terveer) (01/15/91)

root@lingua.cltr.uq.OZ.AU (Hulk Hogan) writes:

>Which shells?  Sounds like a useful feature.

ksh for one and i'm sure that csh has it as well.

>One thing DOS does have in its favour is that the wildcarding routines
>are easily accessed. It is quite simple to expand out filename command
>line arguments at the start of the program, and to do wildcarding at any
>other time during the program. The Unix shells give you great command-line
>wildcards, but no easy way of doing wildcarding within your programs, and
>few programs which allow you to use wildcards.

Er, uhm, in what version of dos?  I have never used any dos greater than 3.3
but as far as i knew when i programmed, i had to either call a wildcarding
routine in a compiler that i purchased (such as microsoft c, or turbo c, etc),
call a public domain wildcard routine, or write one myself.  I chose to use the
public domain routine because it was the most useful (for my needs).
Are you saying that the wildcarding routines that are easily accessed are
integral to ms-dos?
-- 
Derek "Tigger" Terveer	det@hawkmoon.MN.ORG - MNFHA, NCS - UMN Women's Lax, MWD
I am the way and the truth and the light, I know all the answers; don't need
your advice.  -- "I am the way and the truth and the light" -- The Legendary Pink Dots

det@hawkmoon.MN.ORG (Derek E. Terveer) (01/15/91)

andy@Theory.Stanford.EDU (Andy Freeman) writes:

>The major cost of being independent is that the shell doesn't know
>what kind of arguments it is processing.  Thus, it is consistent only
>at the cost of being extremely limited.  When the shell wants to do
>something more than just pass on an argument, it assumes that the
>argument is a file name specification and does the only thing it knows
>how to do, namely expand it.  That's fine if every string is a file
>name, but that's not the case, and there's nothing that the unix
>approach can do about it.
>One could remove the unix-approach limitations by providing some
>mechanism for programs to tell the shell about their arguments.

what about quoting?  For example:

egrep "[a-j]*" *

or

egrep '[a-j]*' *

Isn't that (the user) telling the shell to not expand the first argument?
-- 
Derek "Tigger" Terveer	det@hawkmoon.MN.ORG - MNFHA, NCS - UMN Women's Lax, MWD
I am the way and the truth and the light, I know all the answers; don't need
your advice.  -- "I am the way and the truth and the light" -- The Legendary Pink Dots

richard@aiai.ed.ac.uk (Richard Tobin) (01/16/91)

In article <BSON.91Jan14170904@rice-chex.ai.mit.edu> bson@ai.mit.edu (Jan Brittenson) writes:
>   The Unix approach has its advantages sometimes. Assume you have
>the files "L19901200.axx772" and "L19910100.bxx19974". If you wish
>to overwrite the second file with the first, it makes sense to use

>	mv L*72 L*74

This sort of thing is a common, but rather dangerous, hack.  The
purpose of globbing is to specify filenames by a pattern.  If you want
to save time typing long filenames, then the right tool is some form of
filename completion.  There are now both sh- and csh-compatible
shells that provide this.

It would, of course, be nice if the shell could complete things other
than filenames.  (I used to have a file named "everything" in my home
directory :-) Something I might implement sometime is a file
describing (probably rather poorly) the syntax of common unix
commands, which would be read by tcsh (or whatever) to allow
command-sensitive completion.

-- Richard
-- 
Richard Tobin,                       JANET: R.Tobin@uk.ac.ed             
AI Applications Institute,           ARPA:  R.Tobin%uk.ac.ed@nsfnet-relay.ac.uk
Edinburgh University.                UUCP:  ...!ukc!ed.ac.uk!R.Tobin

throop@aurs01.UUCP (Wayne Throop) (01/16/91)

> barmar@think.com (Barry Margolin)
>> pfalstad@phoenix.Princeton.EDU (Paul John Falstad)
>>> jlg@lanl.gov (Jim Giles)
>>> [...] the shells that are bundled with versions of UNIX [...]
>>> trash [...] arguments [.. ie, expand wildcards ..]
>>> [...] This is a choice that _should_ be left to the discretion of the
>>> utility writer.
>> This forces the user to remember, for each individual command, whether
>> or not it processes wildcards.  
> [..But..] There's a
> simple rule: if an argument is a filename, and it makes sense in the syntax
> of the command to refer to multiple files at once, then it is processed for
> wildcards.

Actually, a generalization of this notion is quite attractive.
Suppose, by analogy with C prototypes and coercions, that the commands
the shell knows about have arguments of specific types.  This allows
provision for coercion of things of type string (which the shell
naturally deals in) to things of type (say) filename, or filename list,
or whatnot (which would naturally involve wildcard expansion).

Or, to put it another way, NEITHER the shell NOR the command should be
responsible for wildcard expansion... it ought to be the responsibility
of coercion code associated with a type.  (Given performance constraints
of encapsulating this code in a process, it is likely that the only way
to make this work well is with dynamic linking of some sort.  Also note
that "types" should be coinable on the fly, just as commands should be.)

Note that such a scheme of "prototypes" and "coercions" supplied
by the notion of shell types can easily solve problems (like a wildcard
intended as the last argument to mv expanding multiply, and thus going
awry, or like expansion-on-the-fly in a GUI or other WIMP-ish interface)
which placing coercions in the shell proper OR in the command proper
cannot address well.

Further, the notion of "flags" can be incorporated as keyword arguments
in such a scheme, leading to a much cleaner and simpler framework
with as much expressive power as the current "getopt" notions, or more.

( I am prejudiced in this, of course, since I worked on/with a
  command processor that implemented all these notions and more. )

Wayne Throop       ...!mcnc!aurgate!throop

mike@bria.UUCP (Michael Stefanik) (01/16/91)

In article <1991Jan14.170115.17178@Think.COM> think.com!barmar (Barry Margolin) writes:
>I've used several systems (Multics, ITS, TOPS-20) where the commands
>invoked the globbing routine, and rarely found it confusing.  There's a
>simple rule: if an argument is a filename, and it makes sense in the syntax
>of the command to refer to multiple files at once, then it is processed for
>wildcards.  Non-filename arguments (e.g. the first argument to grep) are
>never processed for wildcards.  Filename arguments that don't make sense to
>be wildcards (e.g. the last argument to mv) are scanned for wildcard
>characters, and generate an error if any are seen (there's generally a
>syntax to override this, to allow you to access files with wildcard
>characters in their names when you *really* need to).
>[...]
>In the "one tool = one job" approach, I think the shell should handle the
>job of parsing command lines.  Since part of this philosophy is that the
>tool should do this one job *well*, I don't think the shell should expand
>wildcards.  It doesn't know which arguments are filenames, so it shouldn't
>blindly expand wildcards in all of them.

What you seem to be suggesting is something along the order of how VMS
handles commands: you create a text file (which describes the command
in various details), "compile" it into a database and tell the shell that
the new command is available, and how to parse that command.  Thus, under
VMS, when you enter a command, the shell breaks it down into components
that are read by the program when it executes.  The alternative is that the
shell does no globbing.  Neither approach sits too well with me,
for a few reasons:

1. It complicates code design by adding an extra "level" of complexity
   to the problem being solved.  Now, not only do you (as a programmer)
   have to worry about coding the solution itself, but also how the
   shell will interact with it.  If I want to add a feature to a program,
   I can easily throw in a switch option, write the code to go along
   with it, and compile.  Under what you are proposing, I would additionally
   have to tell the shell that the program has changed.  This is a
   real hassle.  When I was writing code under VMS, I was constantly
   changing this interaction between program and shell, and having to
   recompile the command definition and enter SET COMMAND all the time
   was a significant pain.  This functionality may have marginal benefits
   for the user, but it would be a *real turnoff* for UNIX programmers.

2. In my view, it is the job of the shell to parse (and glob) arguments,
   not the programs that receive them.  A library routine for processing
   command arguments would be used in an inconsistent fashion (as the
   DOS world shows us), and would either increase the size of every
   program unreasonably (unreasonable in the sense that the program
   shouldn't *have* to do this), or prevent these programs from being
   statically linked.  (And remember, not all flavors of UNIX support
   sharable libraries; what happens when you have two machines, COFF
   compatible, but one has the library and the other doesn't?  What
   happens when this function, because it would be new, is defined seven
   different ways by seven different vendors?)  A standard becomes such
   because it stands the test of time; just throwing some subroutine
   into libc.a and insisting that everyone use it just ain't reasonable.

3. If you don't go the command database route, and simply let the program
   do *all* globbing, then what if I want to change the way the command
   processes the arguments *on the fly* (i.e., as I am entering the command);
   shall the program then be responsible for quoting, etc.?  What if your
   program says "well, they'll *never* use this argument as a file" (and
   therefore doesn't glob it), and I come up with a unique solution that
   would be facilitated by globbing?  By putting globbing in the program,
   and not giving me the source to change what you think should be done,
   you take a freedom away from me.  On the other hand, the shell expands
   wildcards in a predictable fashion (some may argue that it is obscure,
   but nonetheless it is predictable), and I have complete control over
   how it's done (to use quotes, or not to use quotes, that is the
   question ...)

If you don't like globbing, then turn it off, use quotes, or escape it.
The point is that, if globbing confuses you, you're not *forced* to use
it; forcing programmers to take on that responsibility is, IMHO,
unreasonable at this late date.
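For the record, all of the escapes above are available in the standard
shells.  A minimal Bourne-shell demonstration (using "set -f", the sh
spelling of csh's "set noglob"):

```shell
cd "$(mktemp -d)"        # work in an empty scratch directory
touch a.c b.c

echo *.c                 # expanded by the shell: a.c b.c
echo '*.c'               # quoted: *.c
echo \*.c                # escaped: *.c
set -f                   # turn globbing off entirely
echo *.c                 # globbing off: *.c
set +f                   # and back on
```

The program (echo here) never knows which form was used; the decision is
made entirely by the user at the command line.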

BTW, I speak not for end users (after all, how many users care about the
many variations of globbing, or whatever?).  They just want to select
their application from a menu and not be bothered.
-- 
Michael Stefanik, Systems Engineer (JOAT), Briareus Corporation
UUCP: ...!uunet!bria!mike
--
technoignorami (tek'no-ig'no-ram`i) a group of individuals that are constantly
found to be saying things like "Well, it works on my DOS machine ..."

barmar@think.com (Barry Margolin) (01/17/91)

In article <1991Jan16.063253.2834681@locus.com> dana@locus.com (Dana H. Myers) writes:
>  Ok. Once you said "All tools should work together in a well
>thought out way". Then you say that applications should be forced to
>somehow tell the shell whether to glob or not. What this means is that
>some applications would glob while others wouldn't. How would one know
>what a given app does for sure? Hmmm..

I'm getting tired of repeatedly seeing this argument.  How do you know that
*any* program follows common conventions?  For instance, how would one know
that a given app reads stdin and writes stdout, so that it can be used as a
filter?  You know because someone once proclaimed this convention, and
everyone simply follows it (except when they don't, such as in the "passwd"
command, or curses applications, etc.).
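And the stdin/stdout convention pays off precisely because most programs
do follow it; any conforming program composes with any other, sight
unseen:

```shell
# tr follows the filter convention: it reads stdin and writes stdout,
# so it can be dropped into a pipeline with anything else that does.
printf 'hello\n' | tr a-z A-Z                    # HELLO
printf 'hello\n' | tr a-z A-Z | sed 's/L/l/g'    # HEllO
```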

>  Also, how would you modify the way the shell works in order to
>allow applications to control whether globbing is done on args? Please
>keep in mind the way the shell works; it fork()s (that is, Jim, makes
>another process which is a running copy of the shell) and then exec()s
>the application (that is, replaces the running copy of the shell with
>the application). At this point, the arguments have been processed
>and the shell no longer has any control.

There are lots of ways to do this.  The simplest way is for the shell not
to glob, but for the application to call a library routine that does it.
If you want the shell to glob before it invokes the application, it could
maintain a database specifying the syntax of each command.  Or it could
open the executable and read the argument syntax specification from the
header before globbing.

None of these changes are feasible in current Unix, but they are ideas for
future OSes.
--
Barry Margolin, Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar

andy@Theory.Stanford.EDU (Andy Freeman) (01/18/91)

In article <360@bria> mike@bria.UUCP (Michael Stefanik) writes:
>   with it, and compile.  Under what you are proposing, I would additionally
>   have to tell the shell that the program has been changed.  This is a
>   real hassle.

So is updating documentation, but it is worthwhile.

Besides, there's no particular reason that an executable can't include a
description of its argv arguments.  Then the shell can merely look at
the program it is about to invoke to decide what it (the shell) should
do with the typed arguments before invoking the program.

>2. In my view, it is the job of the shell to parse (and glob) args, not the
>   programs that are being given the arguments.

I'd love to have a shell that did something reasonable with arguments.
Instead, all I can have is a shell that assumes that all arguments
are filenames and expands them as filenames.

The issue isn't whether or not I can pass "*" as an argument to a
program, it is what to do about arguments that aren't filenames.
Shells treat each and every argument as a file name.  If an argument
isn't a filename, there's no way to have the shell expand it
appropriately.
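The blind-expansion hazard is easy to demonstrate: with a file in the
current directory whose name happens to match the pattern, an unquoted
regex argument gets rewritten before grep ever sees it.

```shell
cd "$(mktemp -d)"
printf 'battery low\nbanana\n' > log
touch battery                  # a filename that matches the glob ba*

grep ba* log                   # shell rewrites this to: grep battery log
                               #   -> prints only "battery low"
grep 'ba*' log                 # grep sees the regex ba* ("b" followed by
                               #   any number of "a"s) -> prints both lines
```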

BTW - There are things one would like to see a command processor do
beyond expanding arguments.  For example, it could tell you what kind of
argument is expected, possibly including a list of options.  It might
even tell you what it is going to do with that argument.  (For
example, the command parser could tell you that mv's first argument is
a source; if you then asked what the second argument was, it could say
"destination or another source, in which case the last argument must
be a directory".)

-andy
--
UUCP:    {arpa gateways, sun, decwrl, uunet, rutgers}!neon.stanford.edu!andy
ARPA:    andy@neon.stanford.edu
BELLNET: (415) 723-3088

gsarff@meph.UUCP (Gary Sarff) (01/18/91)

In article <1991Jan14.203207.20436@zoo.toronto.edu>, henry@zoo.toronto.edu (Henry Spencer) writes:
>In article <11314@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
>>Yes, both of the shells that are bundled with versions of UNIX _do_
>>automatically trash (that is, 'process') the command line arguments to
>>expand wildcards.  Explains why I don't use the bundled command shells
>>much.  This is a choice that _should_ be left to the discretion of the
>>utility writer.
>
>Wrong.  Then -- as seen repeatedly in other operating systems -- every
>command does it differently, and a lot of them don't do it at all.
>-- 

I do not see why you are taking this as a given.  I am an OS development
programmer at WICAT (actually the only one left).  Our OS, called WMCS,
which has flavours of both Unix and VMS, does not have this problem.
There are library routines to parse the command line.  The programmer
declares a structure in his program telling which arguments are required,
which are optional, which are positional, etc., along with default values
(if any) if an argument was not specified.  The command line routine
cmdline() is then called.  This routine will prompt the user for any
required arguments that the user did not specify, and then returns a
nicely formatted array of structures with information about the command
line to the program.  File wildcarding is another library routine; it
includes the traditional * and ?, multiple wildcarded specifications
separated by commas (such as x*y,a*z), and also such things as
:since=<date>, :before=<date>, :filesize=<range>, :owner=<list of owner
names or numbers>, :filetype=<list-of-filetypes>, :exclude=<another file
list>, and several others.  _ALL_ utilities that we provide that operate
on a file take a FILE LIST as an argument and perform the wildcarding.
_ALL_ utilities use the command line parsing routine.  Why is this so?
Because it seemed sensible at the time for us to do it that way.  Just
because somebody sometime did an OS that had inconsistent commands does
not condemn the practice of having the programs handle command line
processing instead of the shell.  I would think it says something about
the people who did the shoddy work instead.
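For comparison, a Unix user would approximate some of those qualifiers
with find(1).  The mapping below is a rough analogue, not the WMCS
interface:

```shell
# Rough find(1) analogues of the WMCS qualifiers:
#   :since=<date>   ->  -newer <reference file>
#   :owner=<user>   ->  -user <user>
#   :filesize=<n>   ->  -size <n>
cd "$(mktemp -d)"
touch -t 202001010000 old.txt stamp    # stamp marks the cutoff date
touch new.txt                          # modified "since" the stamp
find . -name '*.txt' -newer stamp      # prints ./new.txt only
```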

---------------------------------------------------------------------------
                          I _don't_ live for the Leap!
     ..uplherc!wicat!sarek!gsarff

avg@hq.demos.su (Vadim Antonov) (01/19/91)

Just for your information: the older shells (like the vanilla V6 shell)
did not include globbing!  The actual globbing was performed by the
separate command "glob" - so one could replace this command and get
new globbing behavior (you can find a rudimentary "glob" in your csh :-).
Interestingly enough, the actual if-s and goto-s were also implemented as
separate commands.  They simply inherited an open command file descriptor
and moved it to the required location.  The V6 shell itself was a rather
trivial program - about 1K lines.  Later the commands goto, if, and loops
(and even later :-) test and echo were moved inside the shell for
efficiency reasons.  chdir never was a separate program :-).

V6 was the true Unix, sigh.  My V6-based system (V7-compatible except for
the file system and some ioctls) was able to run up to a dozen
editor-and-compiler users simultaneously on a PDP-11/34 with 248K of
*core* RAM.  The kernel was about 64K (compare that with the 1068K XENIX
kernel on my 386 box).  After all that, I cannot say XENIX is much more
functional, just a bit faster.

Vadim Antonov
DEMOS, Moscow, USSR

tbray@watsol.waterloo.edu (Tim Bray) (01/20/91)

johnl@iecc.cambridge.ma.us (John R. Levine) writes:
 
 Having used both systems that do globbing in
 the shell (various forms of Unix) and systems that do globbing in the
 program (TOPS-10, Twenex and MS-DOS, among others, anyone else ever use
 DOS-11?)
 
I've done a lot of this in VMS, a long time ago.  VMS command line processing
is done through a combination of calling a command line parser (DCL$PARSE or
some such) and a bunch of RMS calls which do VMS-style file globbing.  The
ONE advantage this buys you is that you can do automatic globbing to support
things like

COPY *.FOO *.BAR

There are many disadvantages, but the two big ones are:

1. Coding up all this stuff is tedious (gag me with a XAB$_FAB),
time-consuming, and easy to get slightly wrong.
2. Partly because it's under program control, and partly because of #1,
it's REAL easy to get inconsistent behaviour.  This extends to things
such as VMS's /OUTPUT=FOO vs. Unix's "> foo".
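And the one advantage is recoverable in the Unix model anyway: since the
shell hands cp the already-expanded names, COPY *.FOO *.BAR becomes a
short loop.

```shell
cd "$(mktemp -d)"
touch a.foo b.foo
# Unix-model equivalent of VMS "COPY *.FOO *.BAR":
for f in *.foo; do
    cp "$f" "${f%.foo}.bar"    # strip .foo suffix, append .bar
done
echo *.bar                     # a.bar b.bar
```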

The unix model is a win.

Cheers, Tim Bray, Open Text Systems, Waterloo, Ontario

davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (01/22/91)

In article <1991Jan21.145936.7076@phri.nyu.edu> roy@phri.nyu.edu (Roy Smith) writes:

| 	Given the move towards kernel bloat, I fear that one alternative we
| might see some day is moving file name globbing into the kernel.  "Let's
| let namei do it; namei does everything!"  Blech.

  Careful, this one might make some sense.  Certainly the number of
system calls it takes to do globbing, and the problems with various
distributed filesystems, suggest there is room for improvement by
doing it that way.

  I am *not* suggesting that this be done, but it would make a great
procedure to have in shared libraries rather than in every shell.
-- 
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
  "I'll come home in one of two ways, the big parade or in a body bag.
   I prefer the former but I'll take the latter" -Sgt Marco Rodrigez