[comp.arch] Globbing

don@dgbt.doc.ca (Donald McLachlan) (02/18/91)

I'll probably burn in hell for this one, but why not ...

1 remove globbing from the shell.
2 put in a library.
3 current shell programs would need to be updated (the first thing they do
  is call glob(argc, argv), which returns argc and argv updated (globbed))
4 New programs that want to glob can call glob.
5 Now the rename command (rename *.pas *.p) could ...

	look at argc and argv, and copy them somewhere.
	modify argv and argc to glob only the *.pas argument,
	and then build a cmd line and exec mv.
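
To make 3) a little more concrete, here is a rough sketch of what such a
library entry might look like.  Everything here is hypothetical: the name
glob_args() and its interface are just my invention, and I've leaned on a
POSIX-style glob(3) to do the actual expansion, assuming you have one:

	#include <stdio.h>
	#include <string.h>
	#include <glob.h>	/* POSIX-style glob(3), assumed available */

	/* Expand wildcard words in argv; words that match nothing are
	 * passed through literally.  Returns the new, NULL-terminated
	 * argv and updates argc through the pointer.
	 */
	char **
	glob_args(int *argcp, char **argv)
	{
		glob_t g;
		int i, flags = GLOB_NOCHECK | GLOB_DOOFFS;

		if (*argcp < 2)
			return argv;		/* nothing to expand */
		memset(&g, 0, sizeof g);
		g.gl_offs = 1;			/* reserve a slot for argv[0] */
		for (i = 1; i < *argcp; i++) {
			glob(argv[i], flags, NULL, &g);
			flags |= GLOB_APPEND;
		}
		g.gl_pathv[0] = argv[0];
		*argcp = (int)g.gl_pathc + 1;
		return g.gl_pathv;		/* NULL-terminated, like argv */
	}

	int
	main(int argc, char **argv)
	{
		int i;

		argv = glob_args(&argc, argv);	/* step 3: glob first thing */
		for (i = 1; i < argc; i++)
			printf("%s\n", argv[i]);
		return 0;
	}

A rename written this way would not call glob_args() at all: it would expand
only its first argument with the same machinery and keep the second as a
literal pattern, per 5) above.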


Fire away...

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (02/19/91)

In article <1991Feb18.152347.28521@dgbt.doc.ca> don@dgbt.doc.ca (Donald McLachlan) writes:
> 1 remove globbing from the shell.
> 2 put in a library.
  [ etc. ]

Pointless and destructive. You can do this much more easily by modifying
your shell, and it helps neither efficiency nor security to have every
program do its own globbing.

---Dan

don@dgbt.doc.ca (Donald McLachlan) (02/22/91)

>In an article, dgbt.doc.ca!don (Donald McLachlan) writes:
>|1 remove globbing from the shell.
>|2 put in a library.
>|3 current shell programs would need to be updated (first thing they do
>|  is call glob(argc, argv), returning argc and argv updated (globbed)
>|4 New programs that want to glob can call glob.
>
>This has been discussed before, but briefly put, it would complicate code
>needlessly, and create even more of a maze of command syntax for the
>average user.  The one Good Thing about the shell globbing is that it is
>consistent; putting globbing in the tool itself would create an endless
>opportunity for inconsistency and confusion.  I would vote ``no'' on this one.
>
>Cheers,
>-- 
>Michael Stefanik, MGI Inc., Los Angeles| Opinions stated are not even my own.
>Title of the week: Systems Engineer    | UUCP: ...!uunet!bria!mike
>-------------------------------------------------------------------------------
>Remember folks: If you can't flame MS-DOS, then what _can_ you flame?
>

Sorry, a typo of mine may have made my idea sound less consistent than I
intended. What I wanted 2) to say was to put globbing into a standard
library, so that aside from rare cases like "rename *.pas *.p" all
globbing would still be "pseudo-standard".

Now I mentioned this not because I think doing this is the ultimate answer,
but because it would allow someone to implement their favorite rename
command. That is what prompted me to post originally.

Don McL

daveh@cbmvax.commodore.com (Dave Haynie) (02/22/91)

In article <474@bria> uunet!bria!mike writes:
>In an article, dgbt.doc.ca!don (Donald McLachlan) writes:
>|1 remove globbing from the shell.
>|2 put in a library.
>|3 current shell programs would need to be updated (first thing they do
>|  is call glob(argc, argv), returning argc and argv updated (globbed)
>|4 New programs that want to glob can call glob.

>This has been discussed before, but briefly put, it would complicate code
>needlessly, 

Well, it certainly can complicate code a little.  You might have to build
a doubly nested loop, rather than the single "for (int i = 0; argc--;..." 
thing.

>and create even more of a maze of command syntax for the average user.  

Unless all possible commands fit into the 

	command [flags] arg1..argN globbed_filesystem_arg

model, you're pretty much in trouble if you only have shell globbing.  And
anything that doesn't fit that model becomes needless, inconsistent 
complication to the average user.  After all, the average user doesn't know
how the program gets written, and has no reason to expect that shell 
escapes are necessary when feeding a command a pattern for a string search,
but not necessary when feeding a command a pattern for a file search.  

Program driven globbing doesn't force inconsistency, and certainly shell
globbing doesn't force consistency, as UNIX is more than happy to prove to
anyone using it.

>Michael Stefanik, MGI Inc., Los Angeles| Opinions stated are not even my own.

-- 
Dave Haynie Commodore-Amiga (Amiga 3000) "The Crew That Never Rests"
   {uunet|pyramid|rutgers}!cbmvax!daveh      PLINK: hazy     BIX: hazy
	"What works for me might work for you"	-Jimmy Buffett

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (02/23/91)

In article <19217@cbmvax.commodore.com> daveh@cbmvax.commodore.com (Dave Haynie) writes:
> Unless all possible commands fit into the 
> 	command [flags] arg1..argN globbed_filesystem_arg
> model, you're pretty much in trouble if you only have shell globbing.

Why? You didn't provide any justification for this statement.

> Program driven globbing doesn't force inconsistency, and certainly shell
> globbing doesn't force consistency, as UNIX is more than happy to prove to
> anyone using it.

Why? You didn't provide any justification for this statement.

Name one thing that you could accomplish by moving globbing into
programs---that you couldn't accomplish at least as easily by modifying
the shell. After all, you're complaining about the user interface, and
the shell is the program responsible for that interface.

Here are some disadvantages: 1. Programs (such as shell scripts) often
invoke other programs, even with (gasp) arguments. As is, it suffices to
use an occasional -- to turn off all argument processing. With globbing
in every program, this would become much harder. 2. Many perfectly good
applications work without globbing, and we shouldn't rewrite them for no
obvious benefit. 3. Programmers shouldn't be forced to manually handle
standard conventions just to write a conventional program. Ever heard of
modularity? 4. The system is slow enough as is without every application
scanning its arguments multiple times and opening up one directory after
another.

---Dan

mike (02/24/91)

I'll play the Devil's Advocate for a while here ...

In an article, kramden.acf.nyu.edu!brnstnd (Dan Bernstein) writes:
>Here are some disadvantages: 1. Programs (such as shell scripts) often
>invoke other programs, even with (gasp) arguments. As is, it suffices to
>use an occasional -- to turn off all argument processing. With globbing
>in every program, this would become much harder.  [...]

Whoa!  The ``--'' merely tells getopt() to stop option scanning and return
EOF; it does not, as you have seemingly implied, stifle globbing by the shell. In any
case, how do you justify your statement that ``turning off'' globbing
would be more difficult from within a program?  Remember, the assumption is
that there is a standard function available that does the argument
processing, and we should further assume that it would accept a notation
to tell it to stop processing the argument list at that point.  Yes, I
am aware of the pitfalls of ``assume'', but this is all hypothetical
anyway. :-)
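
For the record, here is the usual getopt() idiom; ``--'' just ends option
scanning, and everything after it reaches the program as literal arguments,
whether or not the shell globbed them first.  Only the option letters (and
where getopt() happens to be declared on your system) are my own choices:

	#include <stdio.h>
	#include <unistd.h>	/* getopt(), optarg, optind; older systems
				   may need you to declare these yourself */

	int
	main(int argc, char **argv)
	{
		int c;

		/* getopt() stops at "--" or at the first non-option word
		 * and returns -1 (EOF, in the older books).
		 */
		while ((c = getopt(argc, argv, "ao:")) != -1) {
			switch (c) {
			case 'a':
				printf("flag -a seen\n");
				break;
			case 'o':
				printf("option -o with \"%s\"\n", optarg);
				break;
			default:
				fprintf(stderr, "usage: demo [-a] [-o arg] file ...\n");
				return 1;
			}
		}

		/* argv[optind] .. argv[argc-1] are plain arguments, even
		 * if they begin with '-', once "--" has been seen.
		 */
		for (; optind < argc; optind++)
			printf("argument: %s\n", argv[optind]);
		return 0;
	}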

>2. Many perfectly good applications work without globbing, and we shouldn't
>rewrite them for no obvious benefit. [...]

Programs that don't need argument processing wouldn't call the function.
I'm not quite understanding what it is that you are trying to say here.

>3. Programmers shouldn't be forced to manually handle
>standard conventions just to write a conventional program. Ever heard of
>modularity? [...]

Modularity would not be sacrificed by using a standard function.

>4. The system is slow enough as is without every application
>scanning its arguments multiple times and opening up one directory after
>another.

What is the difference between the shell processing the arguments, opening
directories, searching for files, etc. and the program doing the same?
I don't see a performance win here.

Personally speaking ...

I don't like the idea of putting the responsibility for globbing in the
program because it would create an inconsistency.  Right now, I know if
I use the wildcard characters unquoted on the command line, the shell will
attempt to expand them.  This is consistent.  If the program does the
globbing, then we will have to remember which commands glob which arguments.
This could migrate from a minor annoyance to a major headache (rather
quickly, I would think).  To overcome this, some sort of ``standard'' way
of processing the argument vector would have to arise, therefore offering
nothing that the shell does not already provide.  Secondly, I would see
no significant increase in ``ease of use'' or increase in performance.
It would not be a win for the neophyte (because different commands may choose
to process wildcards differently), and there wouldn't be a win for the
veteran (it pointlessly alters something that we already have the
capability to do).

Note also that by quoting the wildcards, your intentions are obvious to
another person (i.e., the shell is not to expand this when the command
is being evaluated); by removing this ``demarcation'', IMHO, it would make
some scripts more difficult to read and debug, because the rules of
globbing would change from command to command, quite possibly without
rhyme or reason.  (And before you say that it couldn't happen here,
think again ... it most certainly could.)

Cheers,
-- 
Michael Stefanik, MGI Inc., Los Angeles| Opinions stated are not even my own.
Title of the week: Systems Engineer    | UUCP: ...!uunet!bria!mike
-------------------------------------------------------------------------------
Remember folks: If you can't flame MS-DOS, then what _can_ you flame?

peter@ficc.ferranti.com (Peter da Silva) (02/27/91)

In article <19217@cbmvax.commodore.com> daveh@cbmvax.commodore.com (Dave Haynie) writes:
> Unless all possible commands fit into the 

> 	command [flags] arg1..argN globbed_filesystem_arg

> model, you're pretty much in trouble if you only have shell globbing.

Actually, the model is less restrictive than that. It's more like:

	command [flags] args-possibly-globbed [flags] more-globbed-args ...
-- 
Peter da Silva.  `-_-'  peter@ferranti.com
+1 713 274 5180.  'U`  "Have you hugged your wolf today?"

daveh@cbmvax.commodore.com (Dave Haynie) (02/27/91)

In article <5573:Feb2307:19:4491@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>In article <19217@cbmvax.commodore.com> daveh@cbmvax.commodore.com (Dave Haynie) writes:
>> Unless all possible commands fit into the 
>> 	command [flags] arg1..argN globbed_filesystem_arg
>> model, you're pretty much in trouble if you only have shell globbing.

>Why? You didn't provide any justification for this statement.

Because it's obvious.  If I have, for example, two sets of globbed filesystem
arguments, the program can't determine which of the two sets an arbitrary
expanded file name belongs to.  If the program does the globbing, dealing
with a command of the form "foo A* B*" is trivial.  Shell globbing won't
allow it.

>> Program driven globbing doesn't force inconsistency, and certainly shell
>> globbing doesn't force consistency, as UNIX is more than happy to prove to
>> anyone using it.

>Why? You didn't provide any justification for this statement.

>Name one thing that you could accomplish by moving globbing into
>programs---that you couldn't accomplish at least as easily by modifying
>the shell. 

Well, with an arbitrary amount of modification, the shell might do practically
anything, including passing the program a totally preparsed set of argument
tags and possibly globbed filesystem space lists as well.  And if the shell
knows the exact syntax of the command, unnatural quoting won't be necessary
either.  However, this changes the shell mechanism completely.  And requiring
separate command description files to go along with the command itself is 
annoying.  At least, it is under VMS, the only system I have used that has
this "feature".

>Here are some disadvantages: 1. Programs (such as shell scripts) often
>invoke other programs, even with (gasp) arguments. 

Sure they do.  Works just great under AmigaDOS, where programs glob.  Never
had a problem, never had a need to turn off globbing, since the program always
knows when it's appropriate.  What's the problem here?

>2. Many perfectly good applications work without globbing, and we shouldn't 
>rewrite them for no obvious benefit. 

Well, perhaps not, though you just told me that I need to extensively rewrite
the user interface and program command line interface so that shell globbing
will solve my problem, rather than resort to an extra for loop and a couple
of function calls in my program.  That's about five minutes' work to add to 
any program, by the way, if you're not in a hurry.

>3. Programmers shouldn't be forced to manually handle standard conventions 
>just to write a conventional program. Ever heard of modularity? 

Of course.  And globbing should be done "the standard way", regardless of
where it takes place.  On a command line, in a "dialog box", on a program's
command line, etc.  My way, the globbing mechanism is the same, the user 
interface is the same, and the programmer's mechanism is the same, no matter
what.  And if you don't like the wildcard set, you can change them in one
place, and every command line, dialog box, grep-style program, etc. gets
the change.  And you don't need to tell the user that a different set of
characters must now be quoted.  How does shell globbing help me in a dialog
box?  Or, for a simpler problem: I have a program for the Amiga called
DiskSalv.  It scans over a damaged disk and extracts recoverable files.  If
you don't want them all, you can specify a pattern to search for:

	DiskSalv dh0: ram: FILE #?.c

This would recover every C file from the hard disk DH0: and stick it into the
RAMdisk.  Shell globbing is of no use here, since the filesystem can't even
get at the files in question (though in another program, they may not be
files at all).  Unless the shell knew the program's syntax, the #?.c would
have to be quoted, which is a needless complication, especially for the 
non-expert.  Here, instead, the expansion is done within the program, using
the same mechanism one would normally use for file globbing (the function
matches over a supplied list rather than a filesystem path, but the syntax
is the same in both cases).
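
(For the UNIX-minded, the same idea can be sketched with a POSIX-style
fnmatch(3), if your system has one: the pattern is matched against an
arbitrary list of names, none of which ever came from a directory the shell
could have globbed.  The salvaged names below are, of course, made up.)

	#include <stdio.h>
	#include <fnmatch.h>	/* POSIX-style matcher, assumed available */

	int
	main(void)
	{
		/* names recovered from the damaged disk -- not visible
		 * through the filesystem, so the shell can't expand them
		 */
		static const char *salvaged[] = {
			"main.c", "README", "disksalv.c", "notes.txt", NULL
		};
		const char *pattern = "*.c";	/* user-supplied pattern */
		int i;

		for (i = 0; salvaged[i] != NULL; i++)
			if (fnmatch(pattern, salvaged[i], 0) == 0)
				printf("recover %s\n", salvaged[i]);
		return 0;
	}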

>4. The system is slow enough as is without every application scanning its 
>arguments multiple times and opening up one directory after another.

It's no slower for the program to do it than for the shell to do it.  And
it conserves resources and/or saves frustration, since the program needs only
to handle one expansion at a time, rather than have the shell manage a dynamic
buffer for the occasional command line that expands to 10K or 100K worth of
file names, or worse, run out of space in a fixed-size buffer.

>---Dan


-- 
Dave Haynie Commodore-Amiga (Amiga 3000) "The Crew That Never Rests"
   {uunet|pyramid|rutgers}!cbmvax!daveh      PLINK: hazy     BIX: hazy
	"What works for me might work for you"	-Jimmy Buffett

peter@ficc.ferranti.com (Peter da Silva) (02/28/91)

Hey, would you please direct your followups out of comp.arch (like I'm doing
here)?

In article <19336@cbmvax.commodore.com> daveh@cbmvax.commodore.com (Dave Haynie) writes:
> Because it's obvious.  If I have, for example, two sets of globbed filesystem
> arguments, the program can't determine which of the two sets an arbitrary
> expanded file name belongs to.

foo -a this that *.c -b this that *.o -c this that *.1

> If the program does the globbing, dealing
> with a command of the form "foo A* B*" is trivial.  Shell globbing won't
> allow it.

Proof by existence, "find".

> >Here are some disadvantages: 1. Programs (such as shell scripts) often
> >invoke other programs, even with (gasp) arguments. 

> Sure they do.  Works just great under AmigaDOS, where programs glob.

No, it doesn't. The usual AmigaDOS subshell environment might be so
screwed up that you didn't notice, but the magic I have on occasion had
to do to get the right arguments to the right programs in Browser (yes,
it's my own program... but it's proven moderately popular even for
people using 2.0 (which surprised me... I use the 2.0 workbench myself))
is sufficiently painful that "works great" is not an adequate description.

It works, but you pretty much have to be prepared to reverse-engineer
quoting and hope the program you call doesn't do something weird. Then you
get to the problem that command line options are indistinguishable from
filenames...

> And if you don't like the wildcard set, you can change them in one
> place, and every command line, dialog box, grep-style program, etc. gets
> the change.

Yes, shared libraries are great. And having a globbing library is great.
But that's another point.
-- 
Peter da Silva.  `-_-'  peter@ferranti.com
+1 713 274 5180.  'U`  "Have you hugged your wolf today?"

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (02/28/91)

In article <19336@cbmvax.commodore.com> daveh@cbmvax.commodore.com (Dave Haynie) writes:
> In article <5573:Feb2307:19:4491@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
> >In article <19217@cbmvax.commodore.com> daveh@cbmvax.commodore.com (Dave Haynie) writes:
> >> Unless all possible commands fit into the 
> >> 	command [flags] arg1..argN globbed_filesystem_arg
> >> model, you're pretty much in trouble if you only have shell globbing.
> >Why? You didn't provide any justification for this statement.
> Because it's obvious.  If I have, for example, two sets of globbed filesystem
> arguments, the program can't determine which of the two sets an arbitrary
> expanded file name belongs to.

Reality check: You can't convince anyone that globbing causes
``trouble'' just because it doesn't have your pet feature. You have to
explain *why* that feature is useful. I don't think it's trouble that
``the program can't determine which of two sets an arbitrary expanded
file name belongs to''; I also don't think it's trouble that globbing
doesn't solve world hunger.

> If the program does the globbing, dealing
> with a command of the form "foo A* B*" is trivial.

It is trivial as is. Why do you want this separation into two sets? What
conceivable function of foo would require A* and B* to be treated
separately? Are A* and B* supposed to be ``corresponding'' sets in some
way? They will usually not have the same size.

Even if you do have some reason for such a foo, you will always survive
with foo [options] -- A* -- B*. -- cuts off all option processing and
will make a perfectly good separator.
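
The bookkeeping on foo's side is a few lines; a sketch, with the option
handling and foo's real work stubbed out:

	#include <stdio.h>
	#include <string.h>

	/* Invoked as: foo [options] -- setA... -- setB...
	 * The shell has already globbed both sets; foo only has to find
	 * the separators.
	 */
	int
	main(int argc, char **argv)
	{
		int i, set = 0;

		for (i = 1; i < argc; i++) {
			if (strcmp(argv[i], "--") == 0) {
				set++;			/* start the next set */
				continue;
			}
			if (set == 0)
				printf("option: %s\n", argv[i]);
			else
				printf("set %d: %s\n", set, argv[i]);
		}
		return 0;
	}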

Let me guess: You do mean A* and B* to be corresponding sets. In fact,
you're thinking of rename A* B*. Reality check: This is NOT globbing.
You don't want B* to glob; you want the *'s to be replaced by the *'s in
corresponding A's. Right? If I'm right, then you're not talking about
moving globbing from the shell to programs; you're talking about
extending the entire globbing mechanism. This is a worthwhile project,
and I encourage you to write a shell that supports such globbing if you
can figure out a good syntax.

> >Name one thing that you could accomplish by moving globbing into
> >programs---that you couldn't accomplish at least as easily by modifying
> >the shell. 
  [ ... ]
> However, this changes the shell mechanism completely.

Yes, changing the shell does mean changing the shell. Surely you agree
that this is easier than modifying every single program.

> And requiring
> separate command description files to go along with the command itself is 
> annoying.

Agreed. The syntax should be stored along with the file. There are also
advantages to storing source code and object code in the same file. But
what does option processing have to do with globbing?

> >Here are some disadvantages: 1. Programs (such as shell scripts) often
> >invoke other programs, even with (gasp) arguments. 
> Sure they do.  Works just great under AmigaDOS, where programs glob.  Never
> had a problem, never had a need to turn off globbing, since the program always
> knows when it's appropriate.

Fair enough, but existing UNIX tools have no concept of when they
``should'' glob. If you add globbing then you will break many previously
correct (secure) programs. How do you propose I should invoke ``rm''
upon a file list as root? Do I quote all the arguments? Or does rm
magically ``know'' that it shouldn't glob?

Do you realize how much time it takes to quote arguments? A program
might spend several seconds constructing a million-byte argument list.
It is insanely inefficient to have to quote that list again and again
for each program that sees the arguments.
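
(To make the cost concrete: the standard trick for quoting an arbitrary word
so that sh will pass it through untouched is to wrap it in single quotes and
respell each embedded quote, something like the sketch below.  Every byte of
every argument has to be examined and copied yet again.)

	#include <stdio.h>

	/* Print WORD quoted for sh: wrap it in single quotes and spell an
	 * embedded single quote as '\'' (close, escaped quote, reopen).
	 */
	static void
	sh_quote(const char *word)
	{
		putchar('\'');
		for (; *word != '\0'; word++) {
			if (*word == '\'')
				fputs("'\\''", stdout);
			else
				putchar(*word);
		}
		putchar('\'');
	}

	int
	main(int argc, char **argv)
	{
		int i;

		for (i = 1; i < argc; i++) {
			sh_quote(argv[i]);
			putchar(i + 1 < argc ? ' ' : '\n');
		}
		return 0;
	}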

> >2. Many perfectly good applications work without globbing, and we shouldn't 
> >rewrite them for no obvious benefit. 
> Well, perhaps not, though you just told me that I need to extensively rewrite
> the user interface and program command line interface so that shell globbing
> will solve my problem, rather than resort to an extra for loop and a couple
> of function calls in my program.

No, I didn't say you had to extensively rewrite the shell. In fact, it
takes just a ``for loop and a couple of function calls'' to handle
rename in the shell. If you want rename to work better, figure out a
better syntax, and then change your shell to handle that syntax. Don't
rewrite programs that shouldn't be involved.

> >3. Programmers shouldn't be forced to manually handle standard conventions 
> >just to write a conventional program. Ever heard of modularity? 
> Of course.  And globbing should be done "the standard way", regardless of
> where it takes place.
  [ various comments on what it means to have consistent globbing ]

Good. I'm glad you understand the advantages of shell globbing. Do you
realize how much work it takes to do the same thing in every program?
Just one mistake and boom, you have a program that doesn't glob. UNIX
isn't meant to load all these requirements on programmers.

> How does shell globbing help me in a dialog
> box?

Your shell handles the dialog boxes. Nah, too easy.

> Or, for a simpler problem: I have a program for the Amiga called
> DiskSalv.  It scans over a damaged disk and extracts recoverable files.  If
> you don't want them all, you can specify a pattern to search for:

I understand what you're saying here. You're not talking about globbing.
You're talking about a pattern-matching problem with superficial
resemblances to globbing.

Globbing replaces a pattern with a list of all filenames matching that
pattern. It doesn't do any more or less than this.

I agree, it is useful to have pattern-matching available for programs
that need it. So?

> >4. The system is slow enough as is without every application scanning its 
> >arguments multiple times and opening up one directory after another.
> It's no slower for the program to do it than for the shell to do it.

Look, either you have every program looking for arguments to glob, or
you leave globbing in the shell. The first alternative is necessarily
less efficient whenever you pass arguments to more than one program.

---Dan

fetter@cos.com (Bob Fetter) (02/28/91)

  This "shell vs cmd" globbing discussion is getting boring.  I mean,
it's been established that Unix is 'wired' to shell globbing.  And,
there are several environments (Multics, VMS, AmigaDos, Stratus VOS,
etc etc etc...) which handle globbing in the executable.

  It's too late for Unix to consider executable globbing.  Inertia of
existing code seems to nail this down.

  It doesn't seem to be reasonable in today's environment to expect a
"new" operating environment to put/mandate globbing into executables:
there is just too much Unix software out there that will break if/when
it is "ported" to anything new.

  Me, I think it is appropriate to have executables perform globbing.
It eliminates the bull$hit of quoting and keeps semantics from being
drowned in syntax.

  Still, isn't there a reasonable ground for discussion here:

	What if we (the software community) were to start to provide
	software that would **recognize and handle** globbing inside
	executables -->if it is made visible to the executable<--

  Ok, so what is this?  Well, if the shell does globbing, ok, fine.
If somebody decides to code

	execl("my_copy","my_copy","*.c","dest_dir/",NULL);

then why not have 'my_copy' understand globbing?
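
A sketch of such a my_copy, leaning on a POSIX-style glob(3) where one
exists; the program itself, and the decision to pass unmatched patterns
through literally, are just my assumptions for the example:

	#include <stdio.h>
	#include <glob.h>	/* POSIX-style glob(3), assumed available */

	int
	main(int argc, char **argv)
	{
		glob_t g;
		size_t i;

		if (argc != 3) {
			fprintf(stderr, "usage: my_copy pattern dest_dir\n");
			return 1;
		}
		/* GLOB_NOCHECK: a pattern that matches nothing comes back
		 * as a literal, so an argument that isn't a pattern at all
		 * still gets through (note the ambiguity when a file really
		 * is named "*.c").
		 */
		if (glob(argv[1], GLOB_NOCHECK, NULL, &g) != 0)
			return 1;
		for (i = 0; i < g.gl_pathc; i++)
			printf("copy %s -> %s\n", g.gl_pathv[i], argv[2]);
		globfree(&g);
		return 0;
	}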

  The "noise" regarding having to put special logic in each executable
has been beaten to death in this newsgroup.  This problem has been
solved, and it was solved 2 decades ago.  But, I digress.

  Were software to be written in this manner, wouldn't this make the
entire debate happening here moot?  Those folks who advocate having
executables handle globbing are free (like the folks who wrote find)
to put it in.

  Unix != consistency.  Just like I've learned POSIX != portability.
But, then, most *real* progress seems to have sprung from anarchy.
So, this may not really be A Bad Thing(tm).

  Just my $.02.

  -Bob-

peter@ficc.ferranti.com (Peter da Silva) (03/01/91)

In article <43994@cos.com> fetter@cos.UUCP (Bob Fetter) writes:
>   Ok, so what is this?  Well, if the shell does globbing, ok, fine.
> If somebody decides to code

> 	execl("my_copy","my_copy","*.c","dest_dir/",NULL);

> then why not have 'my_copy' understand globbing?

What happens when "*.c" is the actual file name under consideration?

The biggest advantage to shell globbing for me is that I *know* that
each argument I pass in argv is damn well going to stay one argument
once it gets to the program I'm calling.

>   Were software to be written in this manner, wouldn't this make the
> entire debate happening here moot?  Those folks who advocate having
> executables handle globbing are free (like the folks who wrote find)
> to put it in.

And the folks who expect programs to take arguments as they're handed,
damnit, will lose out. As we do on every operating system other than
UNIX. I've written this sort of code for MS-DOS, VAX/VMS, and AmigaDOS,
and I really really hate having to special-case all the quoting. Oh sure,
it's easy enough to get right in a script, but when my program is going
to be handed an arbitrary program name, and a list of file names...

Nope. Making programs glob command line arguments is like having them
handle erase and kill processing, or serial port interrupts, or display
refresh, or expose events.

Come on, UNIX started a revolution by making it easier for application
writers to get these sort of details right. Let's not make huge steps
back into the days when everyone did it themselves and most of them got
it wrong!
-- 
Peter da Silva.  `-_-'  peter@ferranti.com
+1 713 274 5180.  'U`  "Have you hugged your wolf today?"

cks@hawkwind.utcs.toronto.edu (Chris Siebenmann) (03/01/91)

fetter@cos.UUCP (Bob Fetter) writes:
[...]
|   Me, I think it is appropriate to have executables perform globbing.
| It eliminates the bull$hit of quoting and keeps semantics from being
| drowned in syntax.

 Actually, all the quoting bullshit is still there, just in a more
rarely needed context where its very infrequency increases the danger
of someone forgetting about the need for it. This is because you still
need to be able to have programs (like rm and mv, or their
equivalents) work on files with globbing characters in their names. You
also have to stick this quoting logic into anything that spits out
filenames that are going to be handed to other programs; consider "find
/ <conditions> -exec operate {} \;" or its equivalent.

 Because it's hidden and rare, this is a much worse beartrap than
ordinary quoting in the shell; quoting in the shell happens once, in
one place, and Unix application writers know that if they don't use the
shell, they're safe. Moving it into Unix-like applications leaves the
application writer and invocer with another worry, and some very hard
decisions.  For example: in such a system, should find's -print option
quote the resulting output appropriately, and should xargs quote the
input arguments? Better make sure both agree.

 My general observation is that quoting is a hard problem, and that the
fewer things you have to quote and in the fewer places, the better. For
all its warts, the Unix shell (and especially rc, the Plan 9/V10 shell)
has simple quoting for globbing that you only have to do rarely.

--
"Emacs itself was one of about half-a-dozen dispatch-vector-driven
 editors developed circa 1971-1972, and is known to the world at large
 primarily because it absorbed the functionality of all the others
 before one of them could successfully absorb it.  Emacs has been much
 like an amoeba from the very beginning."	- Lum Johnson
cks@hawkwind.utcs.toronto.edu	           ...!{utgpu,utzoo,watmath}!utgpu!cks

jesup@cbmvax.commodore.com (Randell Jesup) (03/01/91)

In article <17704:Feb2719:04:3791@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>> If the program does the globbing, dealing
>> with a command of the form "foo A* B*" is trivial.
>
>It is trivial as is. Why do you want this separation into two sets? What
>conceivable function of foo would require A* and B* to be treated
>separately? Are A* and B* supposed to be ``corresponding'' sets in some
>way? They will usually not have the same size.

	Because some commands may not wish that all of the parameters
be globbed.  For example, the infamous "rename *.foo *.foo.old" (rename==mv).
Yes, the user can work around it with escape characters, but this can be
a royal pain, and requires significant understanding of exactly how the
shell does things (which is why many things, like "find", and some of the
trickier uses of the CShell are absolutely opaque even to people who
use Unix every day, but aren't gurus).

>Even if you do have some reason for such a foo, you will always survive
>with foo [options] -- A* -- B*. -- cuts off all option processing and
>will make a perfectly good separator.

	This imposes a very specific syntax on commands.  What do you do
in a keyword-oriented system?  Also, allowing options anywhere is far
more regular and less confusing to users.

>and I encourage you to write a shell that supports such globbing if you
>can figure out a good syntax.

	Poof, done: VOS on Stratus (if I remember right), and AmigaDos
(and probably others).

>Yes, changing the shell does mean changing the shell. Surely you agree
>that this is easier than modifying every single program.

	I thought we were discussing software architecture here, not Unix
(tm) shells.  For that, we should be in comp.unix.shells, or whatever, not
comp.arch.  Just because it is unlikely to become part of standard Unix 
doesn't mean it's not a better way, or interesting.

>Do you realize how much time it takes to quote arguments? A program
>might spend several seconds constructing a million-byte argument list.
>It is insanely inefficient to have to quote that list again and again
>for each program that sees the arguments.

	You must have missed some important reference, this makes no
sense to me.

>Good. I'm glad you understand the advantages of shell globbing. Do you
>realize how much work it takes to do the same thing in every program?
>Just one mistake and boom, you have a program that doesn't glob. UNIX
>isn't meant to load all these requirements on programmers.

	That's a really silly and spurious argument, IMHO.  Turn that 
argument against argument parsing and you would say that the shell should
do that also, because if someone didn't use (for example) getopt() then
you'd have a program that didn't accept "normal" options, etc. If globbing is
a system function, it's really trivial to use (easier than getopt()).

>> How does shell globbing help me in a dialog
>> box?
>
>Your shell handles the dialog boxes. Nah, too easy.

	Nah, too confusing/hard/expensive/etc/etc.  Shells are good at
interpreting commands and running programs, which has nothing to do with
a good file requester.  If all you're using it for is as a globber, why not
put globbing into the system and have things that want to glob call that?
Using a shell just for globbing is like going to get the mail from the
mailbox (50 feet away) in a Ferrari.

>> Or, for a simpler problem: I have a program for the Amiga called
>> DiskSalv.  It scans over a damaged disk and extracts recoverable files.  If
>> you don't want them all, you can specify a pattern to search for:
>
>I understand what you're saying here. You're not talking about globbing.
>You're talking about a pattern-matching problem with superficial
>resemblances to globbing.

	He's saying that with shell globbing, anything that needs pattern-
matching is going to require extensive quoting to keep the shell out of 
the hair of the program, and the user will have to do it.

>> >4. The system is slow enough as is without every application scanning its 
>> >arguments multiple times and opening up one directory after another.
>> It's no slower for the program to do it than for the shell to do it.
>
>Look, either you have every program looking for arguments to glob, or
>you leave globbing in the shell. The first alternative is necessarily
>less efficient whenever you pass arguments to more than one program.

	Tell me how I invoke multiple programs at once with the same
argument list.  The only way I can see is to stuff the globbed names into
a variable and then use that - which can be done regardless of whether
the shell globs all command-lines.

	BTW, I have written shells with globbing and without, and ones with
command-dependent globbing.  I find the best mix is: programs do the globbing
through the system, and the shell does globbing only on demand (an expand
command and an interactive expansion a la tcsh).

-- 
Randell Jesup, Keeper of AmigaDos, Commodore Engineering.
{uunet|rutgers}!cbmvax!jesup, jesup@cbmvax.commodore.com  BIX: rjesup  
The compiler runs
Like a swift-flowing river
I wait in silence.  (From "The Zen of Programming")  ;-)

byron@archone.tamu.edu (Byron Rakitzis) (03/02/91)

In article <1991Feb28.225426.24072@jarvis.csri.toronto.edu> cks@hawkwind.utcs.toronto.edu (Chris Siebenmann) writes:
>
> My general observation is that quoting is a hard problem, and that the
>fewer things you have to quote and in the fewer places, the better. For
>all its warts, the Unix shell (and especially rc, the Plan 9/V10 shell)
>has simple quoting for globbing that you only have to do rarely.
>

I would just like to "remind" the net that I am presently putting the
finishing touches on my own implementation of "rc", the AT&T Plan 9 shell.

You can fetch a copy from archone.tamu.edu, by anonymous ftp, in pub/rc.
rc is about to be dignified with a version number, 0.9, but it needs users
to shake the bugs out. When I am sure it is relatively stable, I will release
version 1.0. (Lest I scare anyone off, I should mention that rc has been my
login shell for the last two months with nary a core dump.)

By the way, quoting in rc is accomplished as follows: surround the word
to be quoted in single quotes. That's it. There is no backslash quoting
and there is no double-quote quoting. It *does* make life a lot simpler.

kenw@skyler.arc.ab.ca (Ken Wallewein) (03/02/91)

> From: cks@hawkwind.utcs.toronto.edu (Chris Siebenmann)
>
>   Actually, all the quoting bullshit is still there, just in a more
>  rarely needed context where its very infrequency increases the danger
>  of someone forgetting about the need for it. This is because you still
>  need to be able to have programs (like rm and mv, or their
>  equivalents) work on files with globbing characters in their names. You
>  also have to stick this quoting logic into anything that spits out
>  filenames that are going to be handed to other programs; consider "find
>  / <conditions> -exec operate {} \;" or its equivalent.

  Good points.  Note, though, that this really applies to _all_ "special"
characters, whether they are special to the shell or to the program.
Consider, for instance, files named "-" and "--".  Or '"'.  Pathological
examples abound :-).

  Perhaps what is needed is a way to tell a program unambiguously about its
arguments, eg: "this argument is a literal file specification, that
argument is a wildcard filespec, the next argument is an option, and after
that is non-filespec argument of arbitrary content".  I have no suggestions
on how to do this.

  I think that argument about "its very infrequency increases the danger of
someone forgetting about the need for it" is probably not very well
founded.  By extension, I should be constantly faced with mortal peril so I
remember how to deal with it.  Suboptimal, dude.  Nuisances should
generally be avoided.
--
/kenw

Ken Wallewein                                                     A L B E R T A
kenw@noah.arc.ab.ca  <-- replies (if mailed) here, please       R E S E A R C H
(403)297-2660                                                     C O U N C I L

barmar@think.com (Barry Margolin) (03/02/91)

In article <KENW.91Mar1125012@skyler.arc.ab.ca> kenw@skyler.arc.ab.ca (Ken Wallewein) writes:
>> From: cks@hawkwind.utcs.toronto.edu (Chris Siebenmann)
>>  This is because you still
>>  need to be able to have programs (like rm and mv, or their
>>  equivalents) work on files with globbing characters in their names. You
>>  also have to stick this quoting logic into anything that spits out
>>  filenames that are going to be handed to other programs; consider "find
>>  / <conditions> -exec operate {} \;" or its equivalent.
>  Good points.  Note, though, that this really applies to _all_ "special"
>characters, whether they are special to the shell or to the program.
>Consider, for instance, files named "-" and "--".  Or '"'.  Pathological
>examples abound :-).
>
>  Perhaps what is needed is a way to tell a program unambiguously about its
>arguments, eg: "this argument is a literal file specification, that
>argument is a wildcard filespec, the next argument is an option, and after
>that is non-filespec argument of arbitrary content".  I have no suggestions
>on how to do this.
>
>  I think that argument about "its very infrequency increases the danger of
>someone forgetting about the need for it" is probably not very well
>founded.

I agree.  On Multics, we decided that there were a few commands that need
to be able to take any file name that the kernel could handle.  The
"rename" and "delete" commands accept a "-name" option followed by a
pathname, and they ignore a leading hyphen and don't glob that parameter.
We felt that it was not important to make it easy to operate arbitrarily on
files with screwy names, but it was important to make it easy to *fix* the
names.

In the case of find-like commands, this syntax makes it simple: 

	find ... -exec rm -name {} \;
--
Barry Margolin, Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar

kenw@skyler.arc.ab.ca (Ken Wallewein) (03/03/91)

In article <1991Mar2.035219.21905@Think.COM> barmar@think.com (Barry Margolin) writes:
>...
> I agree.  On Multics, we decided that there were a few commands that need
> to be able to take any file name that the kernel could handle.  The
> "rename" and "delete" commands accept a "-name" option followed by a
> pathname, and they ignore a leading hyphen and don't glob that parameter.
> We felt that it was not important to make it easy to operate arbitrarily on
> files with screwy names, but it was important to make it easy to *fix* the
> names.
> ...

  Fascinating.  I don't know Multics, although I've heard some very
interesting things about it.  That approach has depth -- it handles both
globbing and option recognition override.  But it wouldn't work with common
"dumb" implementations of shell globbing, would it?  What does Multics use?
And how does it handle such filenames if they contain spaces?

  Perhaps it would be reasonable to have globbing shells recognise
something comparable; a variation on the escape character perhaps, that
works on whole arguments or lines.  If the program parsers recognised the
same escapes, it might address these problems.
--
/kenw

Ken Wallewein                                                     A L B E R T A
kenw@noah.arc.ab.ca  <-- replies (if mailed) here, please       R E S E A R C H
(403)297-2660                                                     C O U N C I L

barmar@think.com (Barry Margolin) (03/04/91)

In article <KENW.91Mar2194531@skyler.arc.ab.ca> kenw@skyler.arc.ab.ca (Ken Wallewein) writes:
>  Fascinating.  I don't know Multics, although I've heard some very
>interesting things about it.  That approach has depth -- it handles both
>globbing and option recognition override.  But it wouldn't work with common
>"dumb" implementations of shell globbing, would it?  What does Multics use?
>And how does it handle such filenames if they contain spaces?

No, it wouldn't help if globbing were done in the shell, unless the shell
specially recognized "-name"; such shells aren't common on Multics.  On
Multics, pathname globbing is done by commands, using a system call (for
efficiency -- directories aren't readable from user mode, so a user-mode
globber would have to copy all the file names out in order to glob them).

Spaces and other characters used for shell syntax are quoted with
doublequotes.  The only character treated specially by the file system
calls is '>' (the equivalent of Unix '/').  In addition, the pathname
parsing library routines treat '<' and '::' specially ('<' is like Unix
'../', but it's processed syntactically rather than by looking in the file
system, and '::' is used for referencing files inside archives); there's no
quoting convention for these characters, but there are versions of rename
and delete that make the system call without first parsing the pathname.
--
Barry Margolin, Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar

crispin@csd.uwo.ca (Crispin Cowan) (03/08/91)

In article <43994@cos.com> fetter@cos.UUCP (Bob Fetter) writes:
>  It's too late for Unix to consider executable globbing.  Inertia of
>existing code seems to nail this down.
Quite true.

>  It doesn't seem to be reasonable in today's environment to expect a
>"new" operating environment to put/mandate globbing into executables:
>there is just too much Unix software out there that will break if/when
>it is "ported" to anything new.
This is not so clear.  If I want to create a new OS that mandates
program globbing, and still keep all my spiffy UNIX software, then I
just add a switch to the C compiler that means "do it the UNIX way,"
which causes the compiler to insert a crts (C Run-Time Startup) function
that does globbing on argc, argv.  Simple.  Now you can have it both
ways, all you have to do is write a new operating system :-).
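
A sketch of what that compiler-inserted startup function might do, much like
the glob_args() sketch earlier in the thread but hung off the startup code.
The entry-point renaming is hand-waved, __glob_main() is a made-up name, and
I've used a POSIX.2-style wordexp() (where available) since "the UNIX way"
is really sh-style expansion, not just globbing:

	#include <wordexp.h>	/* POSIX.2-style wordexp(), assumed available */

	extern int main(int argc, char **argv);

	/* Called instead of main() when the "do it the UNIX way" switch
	 * is given: expand argv roughly the way sh would (WRDE_NOCMD
	 * suppresses command substitution), then call the real main().
	 */
	int
	__glob_main(int argc, char **argv)
	{
		wordexp_t we;
		int i, flags = WRDE_NOCMD | WRDE_DOOFFS;

		if (argc < 2)
			return main(argc, argv);
		we.we_offs = 1;			/* keep a slot for argv[0] */
		for (i = 1; i < argc; i++) {
			if (wordexp(argv[i], &we, flags) != 0)
				return main(argc, argv);	/* punt on error */
			flags |= WRDE_APPEND;
		}
		we.we_wordv[0] = argv[0];
		return main((int)we.we_wordc + 1, we.we_wordv);
	}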

Crispin
-----
Crispin Cowan, CS grad student, University of Western Ontario
Work:  MC28-C, x3342 crispin@csd.uwo.ca
890 Elias St., London, Ontario, N5W 3P2,
"It might help to know that IA5String is the ever-popular 128-character
set that used to be called ASCII in some areas."--documentation on ASN.1,
	from the OSI seven-layer "bean-dip" standard.

gsarff@meph.UUCP (Gary Sarff) (03/09/91)

Sorry for the diatribe, but this posting struck a sensitive spot for me.

In article <5573:Feb2307:19:4491@kramden.acf.nyu.edu>, brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>In article <19217@cbmvax.commodore.com> daveh@cbmvax.commodore.com (Dave Haynie) writes:
>> Unless all possible commands fit into the 
>> 	command [flags] arg1..argN globbed_filesystem_arg
>> model, you're pretty much in trouble if you only have shell globbing.
>
>Why? You didn't provide any justification for this statement.
>
>> Program driven globbing doesn't force inconsistency, and certainly shell
>> globbing doesn't force consistency, as UNIX is more than happy to prove to
>> anyone using it.
>
>Why? You didn't provide any justification for this statement.
>
>Name one thing that you could accomplish by moving globbing into
>programs---that you couldn't accomplish at least as easily by modifying
>the shell. After all, you're complaining about the user interface, and
>the shell is the program responsible for that interface.

Ok, one thing: modifying the shell to know about all the argument
types/usages of all the utilities you are going to run from it.  It seems to
me that somebody has to type all that information in so the shell will know;
it isn't a mind reader.  And it seems to me that the information may have to
be entered more than once for the different shells (csh, sh, ksh, etc.) _AND_
updated every time some user wants to add a program, again possibly multiple
times for the different flavours of shell on the system.  Is this easier than
_not_ adding code to the shell and _not_ entering argument information,
but doing it _ONCE_ when the program/utility is written?

>
>Here are some disadvantages: 1. Programs (such as shell scripts) often
>invoke other programs, even with (gasp) arguments. As is, it suffices to
>use an occasional -- to turn off all argument processing. With globbing
>in every program, this would become much harder. 

Really?  (As an aside, from looking at many shell scripts, the escaping of
shell metacharacters is hardly "occasional", and I believe it is one reason
Jim Giles started all this by saying that shell scripts looked to him like
line noise.)

Let's take an example.  The OS I develop on at work has a pattern searching
utility similar to grep, and all of our OS utilities call a library routine
to parse their command lines.  As an example, suppose we want to scan all of
our .c source files in a directory for the string include*.h (include
followed by some possibly empty sequence, followed by ".h").  For our OS
(called WMCS) and for UNIX we have the following command lines:

Example 1:
WMCS:
    wscan *.c include*.h
UNIX:
    grep include\*.h *.c

    Which is easier, or more intuitive?  I have to remember to escape the *.h
    field in UNIX.  Or I could go to the trouble of turning the globbing off
    for this one command and then turning it back on.  

Example 2:

    And what about the case where there are a _LOT_ of files in the
    directory?  A customer sent in a WREN VII 1.2 Gigabyte hard drive a few
    weeks back and wanted us to rescue the data off of it and prepare a new
    drive to send back to him with his data intact.  One directory on that
    drive contained over 24,000 files!  (A big application!)  My command still
    works (and even delivered the files in alphabetical order!)  The best UNIX
    could do was print "Command line too long", because I had used * for file
    wildcarding.  A lot of help that is; it didn't even scan my files!

Which is easier now?  Oh, the UNIX way, I should have thought of that and
used "find" or written a shell script on the command line and suffered the
process creation overhead as the thing loaded and ran grep 24,000 times,
silly me.  Which looks easier now?

Example 3:

    And what about this case, I want to do the same scan on the entire disk,
    so for the two OS's we get the following:

WMCS:
    wscan /*/*.c include*.h

Unix:
probably something like a shell loop or using find
find / -name \* -exec grep include\*.h \{\} \;

Now which is easier?  Five backslashes for UNIX, the perfect environment for
developers?  Bah! 

Or of course remember to turn the shell globbing off for the find and then
turn it back on.  Again Bah!

>3. Programmers shouldn't be forced to manually handle
>standard conventions just to write a conventional program. Ever heard of
>modularity?

Oh, but programmers and users should be forced to remember which arguments
need to be escaped and which don't, or turn off globbing and turn it back on,
or have to write shell scripts to hide the backslashes and the quotes from
the light of day, and remember that they can't put too many files in one
directory or all the unix utilities that use shell globbing will not work in
that directory?

And this seems reasonable to you?

Every time I have asked which seems easier above, I meant for the user.
Many of us here in comp.arch have "users" of our computers, our OS's, our
software, and in turn we ourselves are users of software and OS's to get our
work done.  I for one have more important things to do, like improving the
kernel and utilities, than to spend time remembering what should be quoted
and what should not.

>4. The system is slow enough as is without every application  scanning its
>arguments multiple times and opening up one directory after another.

Either the shell scans the directory or the utility does, how can one be
slower than the other?

---------------------------------------------------------------------------
Do memory page swapping to floppies?, I said, yes we can do that, but you 
haven't lived until you see our machine do swapping over a 1200 Baud modem
line, and keep on ticking.
     ..gsarff@meph.UUCP

peter@ficc.ferranti.com (Peter da Silva) (03/11/91)

In article <00085@meph.UUCP> gsarff@meph.UUCP writes:
> WMCS:
>     wscan *.c include*.h
> UNIX:
>     grep include\*.h *.c

You mean "grep 'include.*.\h' *.c". You have to escape the . from grep.

> Which is easier now?  Oh, the UNIX way, I should have thought of that and
> used "find" or written a shell script on the command line and suffered the
> process creation overhead as the thing loaded and ran grep 24,000 times,
> silly me.  Which looks easier now?

ls (or find) | xargs ...

> find / -name \* -exec grep include\*.h \{\} \;

find / -print | xargs grep 'include.*\.h'

> Now which is easier?  Five backslashes for UNIX, the perfect environment for
> developers?  Bah! 

One backslash, and that having nothing to do with globbing.

> Oh, but programmers and users should be forced to remember which arguments
> need to be escaped

*all* arguments need to be escaped. Easy. The other way, you have to remember
in which commands the programmer remembered to include globbing.

> Every time I have asked which seems easier above, I meant for the user.

Me too.

> I for one have more important things to do, like improving the
> kernel and utilities, than to spend time remembering what should be quoted
> and what should not.

Well then you're doing well with UNIX, where you don't have to remember any
such thing.
-- 
Peter da Silva.  `-_-'  peter@ferranti.com
+1 713 274 5180.  'U`  "Have you hugged your wolf today?"

torek@elf.ee.lbl.gov (Chris Torek) (03/11/91)

(Sigh, I had hoped my observation that `what works in one system is not
necessarily appropriate/best for all' would end this, but...:)

In article <00085@meph.UUCP> gsarff@meph.UUCP writes:
>Sorry for the diatribe ...

I am not going to disagree with your general idea, but you really should
get the details right when you post something like this:

>Example 1:
>WMCS:
>    wscan *.c include*.h
>UNIX:
>    grep include\*.h *.c
>
>    Which is easier, or more intuitive?

Which is more powerful? :-)

Your grep command searches for `includ', followed by zero or more `e's
followed by any character except newline, followed by `h'.  You probably
meant

	grep 'include.*\.h' *.c

Grep-style regular expressions are more powerful than shell metacharacters,
at the expense of more complexity.  Note that, with back-references, grep
can be tricked into context-sensitive matches such as the one that pulled
the following words from the dictionary:

	% grep '^\(.*\)\1$' /usr/share/dict/web2
[actually, I cheated, the command I used was
 grep '^\(.*\)\1$' /usr/share/dict/web2 | rsh horse.ee.lbl.gov 'cat >tmp'
 since these boxes are running SunOS and do not have a web2.  I then deleted
 all but 27 words and ran these through `pr -3 -t -l9' to get a column sort.]

	atlatl			coco			furfur
	beriberi		couscous		gabgab
	bonbon			cuscus			gogo
	bulbul			dada			grigri
	cancan			dodo			grisgris
	caracara		dumdum			grugru
	chichi			enaena			guitguit
	chocho			eyey			gulgul
	chowchow		froufrou		juju

I will not argue grep's ease of use, but I would not give up much (if any)
of its power.

>Example 2:
>[directory with >24000 files] ... The best unix could do was print
>"Command line too long", because I had used * for file wildcarding.

This has been fixed in some Unixes, and will be fixed in others.

>Example 3:
>    ... I want to do the same scan on the entire disk,
>    so for the two OS's we get the following:
>
>wscan /*/*.c include*.h

(Does /*/ really mean `all directories' and not `the top level'?  If so,
how do you restrict the number of levels searched?)

>Unix:
>probably something like a shell loop or using find
>find / -name \* -exec grep include\*.h \{\} \;

Actually:

	find / -name '*.c' -exec grep 'include.*\.h' {} \;

or (given arbitrarily long argument lists):

	grep 'include.*\.h' `find / -name '*.c' -print`

Eschew backslash: quotes are your friends....
-- 
In-Real-Life: Chris Torek, Lawrence Berkeley Lab EE div (+1 415 486 5427)
Berkeley, CA		Domain:	torek@ee.lbl.gov

sef@kithrup.COM (Sean Eric Fagan) (03/11/91)

In article <10803@dog.ee.lbl.gov> torek@elf.ee.lbl.gov (Chris Torek) writes:
>(Sigh, I had hoped my observation that `what works in one system is not
>necessarily appropriate/best for all' would end this, but...:)

It should also be redirected to the appropriate group (comp.os.misc).  But 
*noooo*, it keeps popping up in comp.arch.

>Actually:
>	find / -name '*.c' -exec grep 'include.*\.h' {} \;

I would suggest

	find / -name '*.c' -print | xargs grep 'include.*\.h' 

If one's unix has infinite space for the exec args, then just use the other
method Chris suggested.

-- 

Sean Eric Fagan, moderator, comp.std.unix.

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (03/12/91)

In article <00085@meph.UUCP> gsarff@meph.UUCP writes:
> >Name one thing that you could accomplish by moving globbing into
> >programs---that you couldn't accomplish at least as easily by modifying
> >the shell. After all, you're complaining about the user interface, and
> >the shell is the program responsible for that interface.
> Ok, one thing, modifying the shell to know about all the argument
> types/usages of all the  utilities you are going to run from it.

This has nothing to do with globbing. (The easiest way to do this under
current UNIXen is to have getopt() or parseargs() or your pet
argument-processing library recognize some switch, like -<ctrl-U>, to
report what it knows about the arguments recognized by the program. Then
the shell can do the rest. Even this would be simpler if the shell did
all argument processing to begin with, but it's too late for that
change.)

> >Here are some disadvantages: 1. Programs (such as shell scripts) often
> >invoke other programs, even with (gasp) arguments. As is, it suffices to
> >use an occasional -- to turn off all argument processing. With globbing
> >in every program, this would become much harder. 
> Really?

Yes, really. There are lots of examples of programs that exec other
programs, from /bin/nice on up, not to mention shell scripts. If they
don't glob their arguments, they're being inconsistent. If they do glob
their arguments, then they have to quote them again for the sub-program.
This is inefficient and IMNSFHO stupid.

> WMCS:
>     wscan *.c include*.h
> UNIX:
>     grep include\*.h *.c
>     Which is easier, or more intuitive?

*.c is globbed the same way in both examples; the difference between
wscan's include*.h and grep's 'include.*\.h' is just that grep has a
more powerful pattern-matching syntax. This pattern-matching has nothing
to do with globbing. Globbing is a certain type of pattern-matching
*upon existing files*.

> I have to remember to escape the *.h
>     field in UNIX.

Obviously if the pattern-matcher and globber recognize the same
characters, then you have to do *something* to say whether you're trying
to glob or to pattern-match. You may believe that it's better to pass
this information positionally than explicitly. In either case it's the
shell's problem.

>     And what about the case where there are a _LOT_ of files in the
>     directory.

I and many others have been pushing for utilities that understand
(null-terminated) lists of filenames passed through a descriptor. Then
as long as echo * (or echo0 *) works, you can pass arbitrarily many
filenames to any program. You can already do this with find, of course,
though its syntax is more powerful and hence less concise.

> Which is easier now?  Oh, the UNIX way, I should have thought of that and
> used "find" or written a shell script on the command line and suffered the
> process creation overhead as the thing loaded and ran grep 24,000 times,
> silly me.

It makes sense to me to say ``find every file in the current directory
and its subdirectories, and print the null-terminated list on output;
have the matcher read the null-terminated list from its input and search
for a pattern in each file in that list.''

  find . -print0 | match -i0 pattern

Hardly inefficient. Current systems don't have this, but xargs does the
job well enough.
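
The consumer side of that convention is only a few lines of C.  A sketch
(the match program itself is hypothetical, and the real per-file work is
stubbed out):

	#include <stdio.h>

	/* Read NUL-terminated filenames from stdin and hand each one to
	 * the per-file routine (stubbed out as a printf).  Overlong names
	 * are silently truncated in this sketch.
	 */
	int
	main(void)
	{
		char name[4096];
		size_t len = 0;
		int c;

		while ((c = getchar()) != EOF) {
			if (c != '\0') {
				if (len + 1 < sizeof name)
					name[len++] = (char)c;
				continue;
			}
			name[len] = '\0';
			printf("would search: %s\n", name);
			len = 0;
		}
		return 0;
	}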

You want a more concise syntax? Fine. Put it into your shell. That's
what shells are for. Different shells have different levels of support
for different types of globbing. In any case there is absolutely no
reason to stick the globbing logic into applications.

> find / -name \* -exec grep include\*.h \{\} \;

That seems an awfully complex way to write

  find / -exec grep 'include.*\.h' '{}' \;

Oh, by the way: Should find glob its arguments or not? Well? Should it
pass the globbed arguments to grep or not? Should it quote the results
of its globbing?

> >3. Programmers shouldn't be forced to manually handle
> >standard conventions just to write a conventional program. Ever heard of
> >modularity?
> Oh, but programmers and users should be forced to remember which arguments
> need to be escaped and which don't,

They don't. You quote everything that you don't want your shell to
interpret. Done.

> and remember that they can't put too many files in one
> directory or all the unix utilities that use shell globbing will not work in
> that directory?

I agree that it is a problem that so many utilities refuse to take file
lists from a descriptor. This is a good reason to make those utilities
work better. This is not a reason to take globbing out of the shell.
echo * works perfectly in every csh I've seen and the newer sh's.

(For many applications it would make even more sense to have one stream
encode not only the file names but their contents. This would solve
problems like grep'ing through compressed files without making a
specialized grep that understands compression. The streams could be in
tar or cpio format, but those formats are both too complex and too
restricted for general use. See my forthcoming article in
comp.unix.shell.)

> And this seems reasonable to you?

Yes.

> >4. The system is slow enough as is without every application  scanning its
> >arguments multiple times and opening up one directory after another.
> Either the shell scans the directory or the utility does, how can one be
> slower than the other?

Again consider the case of applications with a syntax like that of
/bin/nice. Do they scan their arguments or not?

---Dan

jesup@cbmvax.commodore.com (Randell Jesup) (03/14/91)

In article <10803@dog.ee.lbl.gov> torek@elf.ee.lbl.gov (Chris Torek) writes:
>(Sigh, I had hoped my observation that `what works in one system is not
>necessarily appropriate/best for all' would end this, but...:)

	Sigh.  Followups to comp.os.misc (again).

>I am not going to disagree with your general idea, but you really should
>get the details right when you post something like this:

>Grep-style regular expressions are more powerful than shell metacharacters,
>at the expense of more complexity. 

	Of course, things get worse if you want shell globbing as powerful
as grep's REs (classes, negation, repetition, etc.): you start having to
escape or quote even more than you do now.

>Eschew backslash: quotes are your friends....

	But they're still confusing to users; you just may not have to type
as many of them.

-- 
Randell Jesup, Keeper of AmigaDos, Commodore Engineering.
{uunet|rutgers}!cbmvax!jesup, jesup@cbmvax.commodore.com  BIX: rjesup  
The compiler runs
Like a swift-flowing river
I wait in silence.  (From "The Zen of Programming")  ;-)

fetter@cos.com (Bob Fetter) (03/15/91)

In article <2438@ria.ccs.uwo.ca> crispin@csd.uwo.ca (Crispin Cowan) writes:
>In article <43994@cos.com> fetter@cos.UUCP (Bob Fetter) writes:
>>  It doesn't seem to be reasonable in today's environment to expect a
>>"new" operating environment to put/mandate globbing into executables:
>>there is just too much Unix software out there that will break if/when
>>it is "ported" to anything new.
>This is not so clear.  If I want to create a new OS that mandates
>program globbing, and still keep all my spiffy UNIX software, then I
>just add a switch to the C compiler that means "do it the UNIX way,"
>which causes the compiler to insert a crts (C Run-Time Startup) function
>that does globbing on argc,argv.  Simple.  Now you can have it both
>ways, all you have to do is write a new operating system :-).

  Uh, yeah, I (and probably a host of others) have done just that on
MS-DOS, but what it does is propagate the "blind" expansion of names.
My earlier point is that it's too late to retrofit Unix software
into doing ->context directed<- wildcard expansion.
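
For the curious, the guts of such a startup shim come down to something like
this.  A sketch assuming a BSD/POSIX-style glob(3) in the C library:
expand_args is an invented name, and GLOB_NOCHECK (keep a word untouched if
nothing matches) is precisely the "blind" expansion of names described above:

  #include <stdio.h>
  #include <glob.h>       /* glob(), globfree(), GLOB_NOCHECK, GLOB_APPEND */

  /*
   * Expand every command-line word against the filesystem before the real
   * program logic ever sees it.  GLOB_NOCHECK keeps a word that matched
   * nothing, so the program can no longer tell a literal argument from a
   * wildcard that simply failed to match: blind expansion.
   */
  static int
  expand_args(int argc, char **argv, glob_t *g)
  {
      int i, flags = GLOB_NOCHECK;

      for (i = 1; i < argc; i++) {
          if (glob(argv[i], flags, NULL, g) != 0)
              return -1;
          flags |= GLOB_APPEND;     /* keep accumulating into the same list */
      }
      return 0;
  }

  int
  main(int argc, char **argv)
  {
      glob_t g;
      size_t i;

      if (argc > 1 && expand_args(argc, argv, &g) == 0) {
          for (i = 0; i < g.gl_pathc; i++)
              printf("arg: %s\n", g.gl_pathv[i]);  /* a real crt0 would rebuild argv */
          globfree(&g);
      }
      return 0;
  }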

  -Bob-

gsarff@meph.UUCP (Gary Sarff) (03/15/91)

In article <10803@dog.ee.lbl.gov>, torek@elf.ee.lbl.gov (Chris Torek) writes:
>(Sigh, I had hoped my observation that `what works in one system is not
>necessarily appropriate/best for all' would end this, but...:)
>
>In article <00085@meph.UUCP> gsarff@meph.UUCP writes:
>>Sorry for the diatribe ...
>
>I am not going to disagree with your general idea, but you really should
>get the details right when you post something like this:
>
> ... [excised]

Thanks for fixing my mistakes; I am more of an occasional user of unix than
a regular one.

>>Example 3:
>>    ... I want to do the same scan on the entire disk,
>>    so for the two OS's we get the following:
>>
>>wscan /*/*.c include*.h
>
>(Does /*/ really mean `all directories' and not `the top level'?  If so,
>how do you restrict the number of levels searched?)
>

Yes, /*/ means all directories.  /.*/ means all subdirectories of wherever
your current directory is.  If you wanted to put a limit on it, you could
say /*.*.*/ to restrict the search to three subdirectory levels deep.
(As may have become obvious by now, we use "." as the directory separator,
and "/" as the leading and terminating characters.)

To restrict searches there are command line switches (I went into all this
in a post a month or more ago), such as :exclude=<filespec>,
:since=<date-time>, and :before=<date-time>.  In that posting I was merely
disputing some poster's claim, taken as an axiom, that letting the utilities
handle their own command lines would lead to instant chaos.  _ALL_ of the
utilities that come with the OS that take file(s) as arguments take
file-lists as arguments, which may include multiple wildcarded filespecs
separated by commas, along with the above-mentioned switches and a few
others.  And chaos has not ensued at this company or for our users.

---------------------------------------------------------------------------
Do memory page swapping to floppies?, I said, yes we can do that, but you 
haven't lived until you see our machine do swapping over a 1200 Baud modem
line, and keep on ticking.
     ..uplherc!wicat!sarek!gsarff

msp33327@uxa.cso.uiuc.edu (Michael S. Pereckas) (03/16/91)

In <00087@meph.UUCP> gsarff@meph.UUCP (Gary Sarff) writes:


>To restrict searches there are command line switches (I went into all this
>in a post a month or more ago), such as :exclude=<filespec>,
>:since=<date-time>, and :before=<date-time>.  In that posting I was merely
>disputing some poster's claim, taken as an axiom, that letting the utilities
>handle their own command lines would lead to instant chaos.  _ALL_ of the
>utilities that come with the OS that take file(s) as arguments take
>file-lists as arguments, which may include multiple wildcarded filespecs
>separated by commas, along with the above-mentioned switches and a few
>others.  And chaos has not ensued at this company or for our users.

I think what worries people is that removing globbing from the shell
makes chaos easier.  Too many of us have seen systems where every
program does it differently, and most of them do it poorly (mess-dos,
for example).  Certainly we don't have to have chaos.  With strong
norms about how to deal with command lines, there wouldn't be chaos.
The chaos comes when no one tells you what the standard way is.  The
unix shells send a strong message about how to do it...but there are
other ways to clue in the programmers.
--


Michael Pereckas               * InterNet: m-pereckas@uiuc.edu *
just another student...          (CI$: 72311,3246)
Jargon Dept.: Decoupled Architecture---sounds like the aftermath of a tornado

throopw@sheol.UUCP (Wayne Throop) (03/18/91)

> fetter@cos.com (Bob Fetter)
> My point of earlier is that it's too late to retrofit Unix software
> into doing ->context directed<- wildcard expansion.

I agree that it's too late to retrofit context directed wildcard
expansion into individual Unix programs (not that it is impossible,
just prohibitive...).

But, that doesn't mean that it can't be done to "Unix" as an
environment.  Just that it has to be done co-operatively with the shell,
not co-operatively with individual commands. 

For example, an rc-like file full of context descriptions of common
commands along with a default context for commands not explicitly
mentioned.
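
One hypothetical shape such a file might take (the syntax is invented purely
for illustration) is a line per command saying which arguments refer to
existing files:

  # which arguments of each command name existing files
  mv      glob glob           # both operands expanded against the filesystem
  grep    literal glob...     # first arg is a pattern, the rest are globbed
  rename  literal literal     # neither pattern is expanded by the shell
  *       glob...             # default context for commands not listed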

And while I realize that many object that in such a scheme one would
have to "figure out" what each command does globbing-wise to each
argument, I cannot agree that this is much of a problem, because 1) one
can always quote when in doubt, 2) presumably you know the meaning
of the arguments you are passing to the command (if not, what do you
think you are doing?), and thus you know perfectly well what will be
done globbing-wise to them, and 3) any doubt can be resolved by asking
the shell what IT thinks the context is.
--
Wayne Throop  ...!mcnc!dg-rtp!sheol!throopw

sef@kithrup.COM (Sean Eric Fagan) (03/18/91)

(Wrong newsgroup again!  I've posted this to c.o.m and redirected followups
there.  Again.  *sigh*)

In article <1406@sheol.UUCP> throopw@sheol.UUCP (Wayne Throop) writes:
>But, that doesn't mean that it can't be done to "Unix" as an
>environment.  Just that it has to be done co-operatively with the shell,
>not co-operatively with individual commands. 

Embos had a wonderful "feel" to it.  I really liked it.  Part of this was
accomplished by having an incredibly powerful parameterization interface,
which was (more or less) shared between both programming languages (well,
Pascal and C, at least 8-)) and the shell.  The shell had commands to set
types of parameters, and other qualifiers.  (Whether it was positional or
not, whether it was a file-list [in which case it got 'globbed'], etc.)

It would not be impossible to write an embos-like shell for unix, and then
create shell scripts using it.

Note, incidentally, that even though embos had the *ability* to do 'mv *.x
*.y', the rename utility did not do that, as it caused too many problems
(for people who came from unix, who were a large part of its intended
audience).

-- 
Sean Eric Fagan  | "I made the universe, but please don't blame me for it;
sef@kithrup.COM  |  I had a bellyache at the time."
-----------------+           -- The Turtle (Stephen King, _It_)
Any opinions expressed are my own, and generally unpopular with others.