[comp.os.misc] Globbing

peter@ficc.ferranti.com (Peter da Silva) (02/27/91)

In article <19217@cbmvax.commodore.com> daveh@cbmvax.commodore.com (Dave Haynie) writes:
> Unless all possible commands fit into the 

> 	command [flags] arg1..argN globbed_filesystem_arg

> model, you're pretty much in trouble if you only have shell globbing.

Actually, the model is less restrictive than that. It's more like:

	command [flags] args-possibly-globbed [flags] more-globbed-args ...
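
For a concrete instance, an ordinary compile already fits that shape:

	cc -O -I../include *.c -L../lib -lcurses -o prog

The shell expands *.c, and cc is perfectly happy to see more flags after
the file names.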
-- 
Peter da Silva.  `-_-'  peter@ferranti.com
+1 713 274 5180.  'U`  "Have you hugged your wolf today?"

peter@ficc.ferranti.com (Peter da Silva) (02/28/91)

Hey, would you please direct your followups out of comp.arch (like I'm doing
here)?

In article <19336@cbmvax.commodore.com> daveh@cbmvax.commodore.com (Dave Haynie) writes:
> Because it's obvious.  If I have, for example, two sets of globbed filesystem
> arguments, the program can't determine which of the two sets an arbitrary
> expanded file name belongs to.

foo -a this that *.c -b this that *.o -c this that *.1

> If the program does the globbing, dealing
> with a command of the form "foo A* B*" is trivial.  Shell globbing won't
> allow it.

Proof by existence, "find".
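
find takes as many patterns as you care to give it; you just quote them
so they get past the shell untouched:

	find . \( -name '*.c' -o -name '*.h' \) -print

Two patterns, neither expanded by the shell, both matched by the program...
under a shell that globs.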

> >Here are some disadvantages: 1. Programs (such as shell scripts) often
> >invoke other programs, even with (gasp) arguments. 

> Sure they do.  Works just great under AmigaDOS, where programs glob.

No, it doesn't. The usual AmigaDOS subshell environment might be so
screwed up that you didn't notice, but the magic I have on occasion had
to do to get the right arguments to the right programs in Browser (yes,
it's my own program... but it's proven moderately popular even for
people using 2.0 (which surprised me... I use the 2.0 workbench myself))
is sufficiently painful that "works great" is not an adequate description.

It works, but you pretty much have to be prepared to reverse-engineer
quoting and hope the program you call doesn't do something weird. Then you
get to the problem that command line options are indistinguishable from
filenames...

> And if you don't like the wildcard set, you can change them in one
> place, and every command line, dialog box, grep-style program, etc. gets
> the change.

Yes, shared libraries are great. And having a globbing library is great.
But that's another point.
-- 
Peter da Silva.  `-_-'  peter@ferranti.com
+1 713 274 5180.  'U`  "Have you hugged your wolf today?"

peter@ficc.ferranti.com (Peter da Silva) (03/01/91)

In article <43994@cos.com> fetter@cos.UUCP (Bob Fetter) writes:
>   Ok, so what is this?  Well, if the shell does globbing, ok, fine.
> If somebody decides to code

> 	execl("my_copy","my_copy","*.c","dest_dir/",NULL);

> then why not have 'my_copy' understand globbing?

What happens when "*.c" is the actual file name under consideration?

The biggest advantage to shell globbing for me is that I *know* that
each argument I pass in argv is damn well going to stay one argument
once it gets to the program I'm calling.
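
For instance, with shell globbing the pathological name is easy to handle,
because rm never re-expands what it's given:

	rm '*.c'	# removes exactly the file named "*.c", nothing else

If rm did its own globbing, that same command line could quietly take out
every C source file in the directory.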

>   Were software to be written in this manner, wouldn't this make the
> entire debate happening here moot?  Those folks who advocate having
> executables handle globbing are free (like the folks who wrote find)
> to put it in.

And the folks who expect programs to take arguments as they're handed,
damnit, will lose out. As we do on every operating system other than
UNIX. I've written this sort of code for MS-DOS, VAX/VMS, and AmigaDOS,
and I really really hate having to special-case all the quoting. Oh sure,
it's easy enough to get right in a script, but when my program is going
to be handed an arbitrary program name, and a list of file names...

Nope. Making programs glob command line arguments is like having them
handle erase and kill processing, or serial port interrupts, or display
refresh, or expose events.

Come on, UNIX started a revolution by making it easier for application
writers to get this sort of detail right. Let's not make huge steps
back into the days when everyone did it themselves and most of them got
it wrong!
-- 
Peter da Silva.  `-_-'  peter@ferranti.com
+1 713 274 5180.  'U`  "Have you hugged your wolf today?"

peter@ficc.ferranti.com (Peter da Silva) (03/03/91)

In article <19375@cbmvax.commodore.com> jesup@cbmvax.commodore.com (Randell Jesup) writes:
> 	This imposes a very specific syntax on commands.  What do you do
> in a keyword-oriented system?  Also, allowing options anywhere is far
> more regular and less confusing to users.

A straight keyword-oriented system is a loss anyway, because the chances of
file names colliding with options increase greatly. It's like languages with
no reserved words. To make it work you need to mark the keywords somehow.

> 	Tell me how I invoke multiple programs at once with the same
> argument list.

for i in prog1 prog2 ...; do $i arg list; done
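
For instance:

	for i in wc sum ls; do $i *.c; done

Each program in the list gets run with the same arguments, globbing included.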
-- 
Peter da Silva.  `-_-'  peter@ferranti.com
+1 713 274 5180.  'U`  "Have you hugged your wolf today?"

jesup@cbmvax.commodore.com (Randell Jesup) (03/05/91)

In article <X.U9MZ7@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
>> 	Tell me how I invoke multiple programs at once with the same
>> argument list.
>
>for i in prog1 prog2 ...; do $i arg list; done

	Like I said, you have to put it in a variable first (and in this case
it is generated by a command ("for"), which would do the globbing before
stuffing the variable).

-- 
Randell Jesup, Keeper of AmigaDos, Commodore Engineering.
{uunet|rutgers}!cbmvax!jesup, jesup@cbmvax.commodore.com  BIX: rjesup  
The compiler runs
Like a swift-flowing river
I wait in silence.  (From "The Zen of Programming")  ;-)

kenw@skyler.arc.ab.ca (Ken Wallewein) (03/07/91)

   Here is an excerpt from a side conversation via mail:

>...the fact that UNIX *programs* treat arguments beginning
> with "-" specially has nothing to do with globbing; globbing treats
> such arguments exactly like all others.
 
  Of course.  But the very fact that globbing takes place, coupled with the
possibility that files may have names starting with "-", means that by the
time a program gets its command line, it may have no way of knowing _for_
_sure_ whether an argument is a user-entered command line option or a
globbing-generated filename.  And that ambiguity is a result of the
logical interaction between globbing and argument parsing in an environment
where filenames can contain virtually any character.

  One needs, I think, some syntactical rules which are followed by both
globbing and parsing, so they can work properly together.  We have those
now, in fact; but I don't think they have been sufficiently developed.  As
has been pointed out before, a globbing mechanism that understood command
syntax could be very powerful -- although there are some difficulties with that
approach, too.

  Other people have mentioned that one can take advantage of Unix's file
specification semantics (the "./" trick) to avoid some ambiguities.
Certainly this will work -- in Unix -- but I'm trying to talk "globbing
theory" in as general a context as possible, without skirting issues or
presuming a file semantics context.
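
  For anyone who hasn't run into it, the trick is simply to prefix relative
paths, so that a file name beginning with "-" can never be mistaken for an
option:

	rm ./-oops		# "-oops" by itself would be taken as option flags
	grep main ./*.c		# every expanded name starts with "./", never "-"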

  Peter, you asked me to come up with other examples besides the mv/rename
one.  To be really meaningful, such a command would need to have multiple
separate wildcard file specifications, all of which need to be globbed by
the shell.  For there to be a point to this, there would probably need to
be some relationship between the files on the two lists.

  It wasn't easy; they don't exist now, as far as I know -- maybe
due to a chicken-and-egg precedence problem -- but here are a couple:

	difff [args] spec1 [args] spec2 [moreargs (why not?)]

  Such a program would compare all the files/objects referred to on "spec1"
to corresponding files in "spec2".  The args may contain expressions
controlling how the comparison should be done and how correspondence is
determined.  Now envision this occurring in an environment where filenames
may contain spaces, leading "-"s, and other pathological characters.

  Another example might be one wherein one set of files is used to process
another set of files -- a glorified 'make' or 'xch', perhaps.  The syntax
would probably be comparable; I don't plan to go into it in detail.  And I
don't plan to defend it from arguments that it could be done another way.
That's not my point.  My point is the difficulty in doing it _this_ way.

   What if globbing were turned off by default, and one escaped or quoted
an argument to make it glob?  I have the impression that that has been
tried and abandoned.  Does anyone know why?

  Parsing difficulties and the loss of command line information would be
greatly lessened if globbed arguments were passed as a list data object
rather than as in-line text.  There _are_, after all, syntaxes for
specifying arbitrary sets of characters in delimited strings; sed and vi do
it.  All we really need to do is choose one.
 
  Even though my experience is (surprise, surprise ;-)) primarily in non-
shell-globbing environments, I think I like shell globbing better.  One can
rely on it always being there, as opposed to environments where programs
may or may not have availed themselves of it.  And your point about being
able to hand a program a literal filename, and not have the program try to
expand it, is a good one.

  Some systems provide shell parsing and program globbing; some the
reverse.  Like socialism and free enterprise, each has its absolute
adherents.  Me, I want it all :-).

  But maybe it's pointless to debate the niceties of a command line
interface, when it may soon be that only software and programmers will use
them, and everyone else will use GUIs.

/kenw





--
/kenw

Ken Wallewein                                                     A L B E R T A
kenw@noah.arc.ab.ca  <-- replies (if mailed) here, please       R E S E A R C H
(403)297-2660                                                     C O U N C I L

peter@ficc.ferranti.com (Peter da Silva) (03/08/91)

In article <KENW.91Mar6231308@skyler.arc.ab.ca> kenw@skyler.arc.ab.ca (Ken Wallewein) writes:
>   Peter, you asked me to come up with other examples besides the mv/rename
> one. [...] It wasn't easy; they don't exist now, as far as I know -- maybe
> due to a chicken-and-egg precedence problem

Look for examples on VMS, DOS, AmigaDOS, etc. No catch-22 there.

> -- but here are a couple:

> 	difff [args] spec1 [args] spec2 [moreargs (why not?)]

Amazing. A second case. I'm not sure I'd want to do this, just out of concern
for the possibility of unmatched files in spec1 or extra files in spec2. It'd
be possible to get the equivalent functionality even without a more complex
command line, but you said you don't want to consider that. Fair enough.

>   Another example might be one wherein one set of files is used to process
> another set of files -- a glorified 'make' or 'xch', perhaps.

Now you're really pushing it. I think this, if anything, illustrates that
this particular case (two corresponding lists of file names) is a pretty
damn rare one, and shell globbing *can* deal with it... with a slightly
different syntax. Not within your rules for this contest, but a reasonable
solution.
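
The syntax I have in mind is just the ordinary loop idiom... a rough sketch,
assuming Bourne shell and the second set of files sitting in an "old"
directory:

	for f in *.c
	do
		diff "$f" "old/$f"
	done

One glob; the correspondence comes from the loop variable instead of from
a second pattern.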

>    What if globbing were turned off by default, and one escaped or quoted
> an argument to make it glob?  I have the impression that that has been
> tried and abandoned.  Does anyone know why?

The default action you want is to glob. For a current shell that does this,
try "tcl".

>   But maybe it's pointless to debate the niceties of a command line
> interface, when it may soon be that only software and programmers will use
> them, and everyone else will use GUIs.

I don't know. As soon as a command line oriented scripting language showed
up on the Mac it took the Mac world by storm. Here I'm talking about
Hypercard...
-- 
Peter da Silva.  `-_-'  peter@ferranti.com
+1 713 274 5180.  'U`  "Have you hugged your wolf today?"

jlg@lanl.gov (Jim Giles) (03/09/91)

From article <EIY96WH@xds13.ferranti.com>, by peter@ficc.ferranti.com (Peter da Silva):
> In article <KENW.91Mar6231308@skyler.arc.ab.ca> kenw@skyler.arc.ab.ca (Ken Wallewein) writes:
>>   Peter, you asked me to come up with other examples besides the mv/rename
>> one. [...] It wasn't easy; they don't exist now, as far as I know -- maybe
>> due to a chicken-and-egg precedence problem
> 
> Look for examples on VMS, DOS, AmigaDOS, etc. No catch-22 there.


It's not hard to come up with enormous numbers of examples where the
command language would be better if globbing were not done automatically
by the shell.  All you have to do is look at the assumptions that the
shell is making when it automatically globs and consider cases that
violate such assumptions.

1) The case everyone thought of first: you want globbing to produce
   _two_ distinct lists that correspond.  This is the mv/rename
   example that everyone has been discussing.

2) You want an argument globbed, but _not_ in the context of the
   local filespace.  For example, suppose you have an external
   mass storage device that you periodically save files from your
   local filesystem to or periodically load data from.  You might
   want the syntax to look like "save <pattern>" and "get <pattern>".
   But, you can't do this if the shell automatically globs because
   the <pattern> is always globbed in the _local_ filespace - but
   for the 'get' command, you want it to be globbed in the context
   of the external filesystem.  With the increasing importance of
   distributed processing, the ability to glob file arguments in
   _non-local_ contexts may be very valuable.

3) You want the argument globbed, but _not_ in the context of a
   filespace at all.  After all, globbing is just a pattern matching
   facility - there are any number of lists of data which could
   benefit from making such a pattern match available for them.
   For example, you could have a version of 'ls' which allowed
   globbing on the owner or group fields: "ls -l *.p -group c-*"
   would list all '.p' files which are owned by someone in 'c'
   division (according to our naming convention for groups).

It is no trick to come up with numerous examples of globbing that fit
each of these three categories - or more than one at a time!  For
example, you may want one argument globbed as an account number and
another argument to generate a corresponding list of (say) mail
addresses based on the matched part of the account - a mix of
categories (1) and (3) above.

Note also that quoting and escape characters are _not_ sufficient to
provide the functionality described above in the present 'automatic glob'
environments.  Quoting is supposed to prevent an argument from
being globbed - but you _want_ all these arguments to be globbed,
it's just that you want it done in a different context.

J. Giles

kenw@skyler.arc.ab.ca (Ken Wallewein) (03/09/91)

In article <EIY96WH@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:

   In article <KENW.91Mar6231308@skyler.arc.ab.ca> kenw@skyler.arc.ab.ca
	(Ken Wallewein) writes:
   >   Peter, you asked me to come up with other examples besides the mv/rename
   > one. [...] It wasn't easy; they don't exist now, as far as I know -- maybe
   > due to a chicken-and-egg precedence problem

   Look for examples on VMS, DOS, AmigaDOS, etc. No catch-22 there.

I'm familiar with VMS and AmigaDOS; I run one and own the other.  But I
can't think of any commands on either that use multiple separately globbed
arguments. Note: I'm not saying wildcard; I'm saying globbed.  In the case
of RENAME, for example, the second wildcard argument is an _output_
filespec, and as such is not globbed.  Which did you have in mind?

   > -- but here are a couple:

   > 	difff [args] spec1 [args] spec2 [moreargs (why not?)]

   Amazing. A second case. I'm not sure I'd want to do this, just out of concern
   for the possibility of unmatched files in spec1 or extra files in spec2. It'd
   be possible to get the equivalent functionality even without a more complex
   command line, but you said you don't want to consider that. Fair enough.

Hey, I design 'em.  You build 'em.  :-) 

I thought of that.  Such a program, to be useful, would need to be able to
handle unmatched files, as I implied in the description of the arguments.

   >   Another example might be one wherein one set of files is used to process
   > another set of files -- a glorified 'make' or 'xch', perhaps.

   Now you're really pushing it. I think this, if anything, illustrates that
   this particular case (two corresponding lists of file names) is a pretty
   damn rare one, and shell globbing *can* deal with it... with a slightly
   different syntax. Not within your rules for this contest, but a reasonable
   solution.

"Slightly" different?  _Existing_ shell globbing?  Certainly, a
sufficiently sophisticated globbing syntax environment could do it.  If
your solution is within the spirit of this discussion, I'm curious.

   >    What if globbing were turned off by default, and one escaped or quoted
   > an argument to make it glob?  I have the impression that that has been
   > tried and abandoned.  Does anyone know why?

   The default action you want is to glob. For a current shell that does this,
   try "tcl".

I thought tcl was sort of a compiled-in command language processor -- but I
confess far more curiosity than knowledge.  Please tell me more.
--
/kenw

Ken Wallewein                                                     A L B E R T A
kenw@noah.arc.ab.ca  <-- replies (if mailed) here, please       R E S E A R C H
(403)297-2660                                                     C O U N C I L

peter@ficc.ferranti.com (Peter da Silva) (03/11/91)

In article <00085@meph.UUCP> gsarff@meph.UUCP writes:
> WMCS:
>     wscan *.c include*.h
> UNIX:
>     grep include\*.h *.c

You mean "grep 'include.*.\h' *.c". You have to escape the . from grep.

> Which is easier now?  Oh, the UNIX way, I should have thought of that and
> used "find" or written a shell script on the command line and suffered the
> process creation overhead as the thing loaded and ran grep 24,000 times,
> silly me.  Which looks easier now?

ls (or find) | xargs ...

> find / -name \* -exec grep include\*.h \{\} \;

find / -print | xargs grep 'include.*\.h'

> Now which is easier?  Five backslashes for UNIX, the perfect environment for
> developers?  Bah! 

One backslash, and that having nothing to do with globbing.

> Oh, but programmers and users should be forced to remember which arguments
> need to be escaped

*all* arguments need to be escaped. Easy. The other way you have to remember
which commands' programmers remembered to include globbing.

> Every time I have asked which seems easier above, I meant for the user.

Me too.

> I for one have more important things to do, like improving the
> kernel and utilities, to spare time remembering what should be quoted and
> what should not.

Well then you're doing well with UNIX, where you don't have to remember any
such thing.
-- 
Peter da Silva.  `-_-'  peter@ferranti.com
+1 713 274 5180.  'U`  "Have you hugged your wolf today?"

sef@kithrup.COM (Sean Eric Fagan) (03/11/91)

In article <10803@dog.ee.lbl.gov> torek@elf.ee.lbl.gov (Chris Torek) writes:
>(Sigh, I had hoped my observation that `what works in one system is not
>necessarily appropriate/best for all' would end this, but...:)

It should also be redirected to the appropriate group (comp.os.misc).  But 
*noooo*, it keeps popping up in comp.arch.

>Actually:
>	find / -name '*.c' -exec grep 'include.*\.h' {} \;

I would suggest

	find / -name '*.c' -print | xargs grep 'include.*\.h' 

If one's unix has infinite space for the exec args, then just use the other
method Chris suggested.

-- 

Sean Eric Fagan, moderator, comp.std.unix.

peter@ficc.ferranti.com (Peter da Silva) (03/11/91)

In article <SF-95G3@xds13.ferranti.com> I said:
> You mean "grep 'include.*.\h' *.c". You have to escape the . from grep.

Of course, I mean "grep 'include.*\.h' *.c". :-<
-- 
Peter da Silva.  `-_-'  peter@ferranti.com
+1 713 274 5180.  'U`  "Have you hugged your wolf today?"

davidsen@crdos1.crd.ge.COM (Wm E Davidsen Jr) (03/11/91)

In article <125251@uunet.UU.NET> sef@kithrup.COM (Sean Eric Fagan) writes:

| It should also be redirected to the appropriate group (comp.os.misc).  But 
| *noooo*, it keeps popping up in comp.arch.

  It has gone by once in comp.unix.shell, which I thought was the right
place for it.
-- 
bill davidsen	(davidsen@crdos1.crd.GE.COM -or- uunet!crdgw1!crdos1!davidsen)
        "Most of the VAX instructions are in microcode,
         but halt and no-op are in hardware for efficiency"

peter@ficc.ferranti.com (Peter da Silva) (03/12/91)

In article <17097@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> 2) You want an argument globbed, but _not_ in the context of the
>    local filespace.
> 3) You want the argument globbed, but _not_ in the context of a
>    filespace at all.

No problem. *just quote the pattern that you're passing to the program*.
Since the quoting and shell globbing are uniform and consistent, the result
is clean and consistent. Because the globbing isn't in the local file space,
the program can't glob them using the "standard globbing function" anyway,
so it's going to have to be treated specially anyway.
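
Using the hypothetical save/get commands from Jim's own example, the shell
side would look something like:

	save 'proj*.dat'	# the pattern reaches save untouched
	get 'proj*.dat'		# get expands it against the remote store

One level of quoting, the same rule as for a grep pattern.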

> It is no trick to come up with numerous examples of globbing that fit
> each of these three categories - or more than one at a time!

Yes, but only category one is a case where you want globbing in the
local namespace under more than one context.  If you're using a different
language you have to deal with possible conflict with filenames, standard
command syntax, etc...

Let's deal with an imaginary machine where the command syntax is a set
of keywords and file names separated by spaces or slashes, with keywords
preceded by slashes, and you want to search for "accounts receivable" on
a remote system where filenames include slashes and spaces:

search /pat=accounts receivable/opt/inp=xenix::/usr/accounting/*.*.

oops:

search /pat="accounts receivable"/opt/inp="xenix::/usr/accounting/*.*".

You have to have quoting rules for all this anyway. In practice, you have
to assume that non-file-name arguments and foreign file name arguments
need to be quoted. So why get all bent out of shape over another reason
to do what you need to be doing anyway?
-- 
Peter da Silva.  `-_-'  peter@ferranti.com
+1 713 274 5180.  'U`  "Have you hugged your wolf today?"

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (03/12/91)

In article <00085@meph.UUCP> gsarff@meph.UUCP writes:
> >Name one thing that you could accomplish by moving globbing into
> >programs---that you couldn't accomplish at least as easily by modifying
> >the shell. After all, you're complaining about the user interface, and
> >the shell is the program responsible for that interface.
> Ok, one thing, modifying the shell to know about all the argument
> types/usages of all the  utilities you are going to run from it.

This has nothing to do with globbing. (The easiest way to do this under
current UNIXen is to have getopt() or parseargs() or your pet
argument-processing library recognize some switch, like -<ctrl-U>, to
report what it knows about the arguments recognized by the program. Then
the shell can do the rest. Even this would be simpler if the shell did
all argument processing to begin with, but it's too late for that
change.)

> >Here are some disadvantages: 1. Programs (such as shell scripts) often
> >invoke other programs, even with (gasp) arguments. As is, it suffices to
> >use an occasional -- to turn off all argument processing. With globbing
> >in every program, this would become much harder. 
> Really?

Yes, really. There are lots of examples of programs that exec other
programs, from /bin/nice on up, not to mention shell scripts. If they
don't glob their arguments, they're being inconsistent. If they do glob
their arguments, then they have to quote them again for the sub-program.
This is inefficient and IMNSFHO stupid.

> WMCS:
>     wscan *.c include*.h
> UNIX:
>     grep include\*.h *.c
>     Which is easier, or more intuitive?

*.c is globbed the same way in both examples; the difference between
wscan's include*.h and grep's 'include.*\.h' is just that grep has a
more powerful pattern-matching syntax. This pattern-matching has nothing
to do with globbing. Globbing is a certain type of pattern-matching
*upon existing files*.

> I have to remember to escape the *.h
>     field in UNIX.

Obviously if the pattern-matcher and globber recognize the same
characters, then you have to do *something* to say whether you're trying
to glob or to pattern-match. You may believe that it's better to pass
this information positionally than explicitly. In either case it's the
shell's problem.

>     And what about the case where there are a _LOT_ of files in the
>     directory.

I and many others have been pushing for utilities that understand
(null-terminated) lists of filenames passed through a descriptor. Then
as long as echo * (or echo0 *) works, you can pass arbitrarily many
filenames to any program. You can already do this with find, of course,
though its syntax is more powerful and hence less concise.

> Which is easier now?  Oh, the UNIX way, I should have thought of that and
> used "find" or written a shell script on the command line and suffered the
> process creation overhead as the thing loaded and ran grep 24,000 times,
> silly me.

It makes sense to me to say ``find every file in the current directory
and its subdirectories, and print the null-terminated list on output;
have the matcher read the null-terminated list from its input and search
for a pattern in each file in that list.''

  find . -print0 | match -i0 pattern

Hardly inefficient. Current systems don't have this, but xargs does the
job well enough.

You want a more concise syntax? Fine. Put it into your shell. That's
what shells are for. Different shells have different levels of support
for different types of globbing. In any case there is absolutely no
reason to stick the globbing logic into applications.

> find / -name \* -exec grep include\*.h \{\} \;

That seems an awfully complex way to write

  find / -exec grep 'include.h' '{}' \;

Oh, by the way: Should find glob its arguments or not? Well? Should it
pass the globbed arguments to grep or not? Should it quote the results
of its globbing?

> >3. Programmers shouldn't be forced to manually handle
> >standard conventions just to write a conventional program. Ever heard of
> >modularity?
> Oh, but programmers and users should be forced to remember which arguments
> need to be escaped and which don't,

They don't. You quote everything that you don't want your shell to
interpret. Done.

> and remember that they can't put too many files in one
> directory or all the unix utilities that use shell globbing will not work in
> that directory?

I agree that it is a problem that so many utilities refuse to take file
lists from a descriptor. This is a good reason to make those utilities
work better. This is not a reason to take globbing out of the shell.
echo * works perfectly in every csh I've seen and the newer sh's.
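
The standard workaround, assuming no whitespace or quote characters in the
file names:

	echo *.c | xargs grep 'include.*\.h'

echo is a builtin, so the expansion never goes through exec and never hits
the argument-size limit; xargs chops the list into chunks that do fit.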

(For many applications it would make even more sense to have one stream
encode not only the file names but their contents. This would solve
problems like grep'ing through compressed files without making a
specialized grep that understands compression. The streams could be in
tar or cpio format, but those formats are both too complex and too
restricted for general use. See my forthcoming article in
comp.unix.shell.)

> And this seems reasonable to you?

Yes.

> >4. The system is slow enough as is without every application  scanning its
> >arguments multiple times and opening up one directory after another.
> Either the shell scans the directory or the utility does, how can one be
> slower than the other?

Again consider the case of applications with a syntax like that of
/bin/nice. Do they scan their arguments or not?

---Dan

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (03/12/91)

In article <17097@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
> All you have to do is look at the assumptions that the
> shell is making when it automatically globs and consider cases that
> violate such assumptions.
> 1) The case everyone thought of first: you want globbing to produce
>    _two_ distinct lists that correspond.  This is the mv/rename
>    example that everyone has been discussing.

And that is not globbing.

> 2) You want an argument globbed, but _not_ in the context of the
>    local filespace.

That is not globbing.

> 3) You want the argument globbed, but _not_ in the context of a
>    filespace at all.

That is not globbing.

> After all, globbing is just a pattern matching
>    facility

It is not just *any* pattern-matching facility. It is a *specific*
pattern-matching facility, namely replacing a pattern with a (sorted)
list of filenames matching that pattern.

You have shown three cases where it might be useful to have a facility
more powerful than globbing. Fine, go ahead and write such a facility.
If people need it then they'll use it. (mvm has had reasonable success,
and people might like a shell with similar syntax.)

The current argument is whether it makes sense to put globbing into
separate applications rather than the shell. Your examples are
absolutely, totally irrelevant to that argument. They don't even support
your religious arguments against UNIX, as anyone who's used mvm can
attest.

---Dan

frank@grep.co.uk (Frank Wales) (03/13/91)

In article <125251@uunet.UU.NET> sef@kithrup.COM (Sean Eric Fagan) writes:
>It should also be redirected to the appropriate group (comp.os.misc).

Done.  :-)

>In article <10803@dog.ee.lbl.gov> torek@elf.ee.lbl.gov (Chris Torek) writes:
>>Actually:
>>	find / -name '*.c' -exec grep 'include.*\.h' {} \;
>
>I would suggest
>
>	find / -name '*.c' -print | xargs grep 'include.*\.h' 

So would I.  In addition, though (and I don't have the original article
around to find out if the filenames are desired, but if they are...),
I'd also pass /dev/null as an arg to grep to force it to emit them, in
the unlikely event that xargs invokes it with zero or one filenames.
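
That is:

	find / -name '*.c' -print | xargs grep 'include.*\.h' /dev/null

With /dev/null padding the argument list, grep always has at least two
files to look at, so it always prints the names.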
--
Frank Wales, Grep Limited,             [frank@grep.co.uk<->uunet!grep!frank]
Kirkfields Business Centre, Kirk Lane, LEEDS, UK, LS19 7LX. (+44) 532 500303

kenw@skyler.arc.ab.ca (Ken Wallewein) (03/13/91)

In article <5946:Mar1122:11:0691@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:

>    ...
>    This has nothing to do with globbing...
>    ...
>    This pattern-matching has nothing to do with globbing....
>    ...

  I thought this topic was about dead, but people are still making what to
me are new and very valid points -- including yours, Dan.

  But I still see people saying something "has nothing to do with globbing"
when they probably mean something more like "isn't a direct result of
globbing".  Please, folks, let's not forget that the very existence of
shell globbing in an environment has a lot of indirect repercussions which
cannot reasonably be ignored.

  And while we're at it, let's remember that we're not just talking about
Unix here.

  I'm beginning to think that what we _really_ need is the Object Oriented
Operating System.  It would certainly make this discussion obsolete.
Anybody want to play with that one?

--
/kenw

Ken Wallewein                                                     A L B E R T A
kenw@noah.arc.ab.ca  <-- replies (if mailed) here, please       R E S E A R C H
(403)297-2660                                                     C O U N C I L

mwm@pa.dec.com (Mike (My Watch Has Windows) Meyer) (03/13/91)

In article <W_-99S4@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:

   In article <17097@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
   > 2) You want an argument globbed, but _not_ in the context of the
   >    local filespace.
   > 3) You want the argument globbed, but _not_ in the context of a
   >    filespace at all.

   No problem. *just quote the pattern that you're passing to the program*.
   Since the quoting and shell globbing are uniform and consistent, the result
   is clean and consistent. Because the globbing isn't in the local file space,
   the program can't glob them using the "standard globbing function" anyway,
   so it's going to have to be treated specially anyway.

Actually, if your "standard globbing function" is designed correctly,
you _can_ use it in the latter two cases. Whether you call this use
"globbing" or not is immaterial.

The advantage of providing a single (preferably shared) library to
deal with regular expressions/globbing is that all tools that use the
library wind up handling the same regular expressions. If you insist
on having a shell that globs on a system that provides a standard
mechanism for globbing, it's still a win for the shell to use that
mechanism - that means the expressions expanded by the shell will
match the ones used by your other tools.

	<mike
--
When all our dreams lay deformed and dead		Mike Meyer
We'll be two radioactive dancers			mwm@pa.dec.com
Spinning in different directions			decwrl!mwm
And my love for you will be reduced to powder

jlg@cochiti.lanl.gov (Jim Giles) (03/13/91)

In article <W_-99S4@xds13.ferranti.com> peter@ficc.ferranti.com (Peter
da Silva) writes:
  In article <17097@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
   > 2) You want an argument globbed, but _not_ in the context of the
   >    local filespace.
   > 3) You want the argument globbed, but _not_ in the context of a
   >    filespace at all.

   No problem. *just quote the pattern that you're passing to the program*.
   Since the quoting and shell globbing are uniform and consistent, the result
   is clean and consistent. Because the globbing isn't in the local file space,
   the program can't glob them using the "standard globbing function" anyway,
   so it's going to have to be treated specially anyway.

Please read the original post again.  The mnemonic purpose of quoting
is to _prevent_ globbing.  But, in _both_ these cases, I _WANT_ globbing.
I just want it done in a different context.  Further mnemonic violence
occurs when I want to pass a special character through to the other
context - _your_ "solution" requires multiple quoting - _BLETCH!!_

Both the tool and the user _know_ what the arguments to a given tool
mean.  The shell _DOES_NOT_.  It is foolish, therefore, to have the
shell do the globbing.  Period.

J. Giles

peter@ficc.ferranti.com (Peter da Silva) (03/13/91)

In article <MWM.91Mar12130952@raven.pa.dec.com> mwm@pa.dec.com (Mike (My Watch Has Windows) Meyer) writes:
> The advantage of providing a single (preferably shared) library to
> deal with regular expressions/globbing is that all tools that use the
> library wind up handling the same regular expressions.

I agree, that's a good idea. How do you get the programmers to glob in
the places you want in the first place? Even on systems that support
globbing libraries, you run into this... and unlike shell globbing
you don't get the option of quoting them to make them do what you want.

Also:

% ls foo*
foo.new		foo.old
% diff foo*
-- 
Peter da Silva.  `-_-'  peter@ferranti.com
+1 713 274 5180.  'U`  "Have you hugged your wolf today?"

jesup@cbmvax.commodore.com (Randell Jesup) (03/14/91)

In article <10803@dog.ee.lbl.gov> torek@elf.ee.lbl.gov (Chris Torek) writes:
>(Sigh, I had hoped my observation that `what works in one system is not
>necessarily appropriate/best for all' would end this, but...:)

	Sigh.  Followups to comp.os.misc (again).

>I am not going to disagree with your general idea, but you really should
>get the details right when you post something like this:

>Grep-style regular expressions are more powerful than shell metacharacters,
>at the expense of more complexity. 

	Of course, things get worse with shell globbing if you want globbing
as powerful as RE's in grep (classes, negation, repetition, etc).  With
shell globbing, you start having to escape or quote even more than now.

>Eschew backslash: quotes are your friends....

	But they're still confusing to users, you just may not have to type
as many of them.

-- 
Randell Jesup, Keeper of AmigaDos, Commodore Engineering.
{uunet|rutgers}!cbmvax!jesup, jesup@cbmvax.commodore.com  BIX: rjesup  
The compiler runs
Like a swift-flowing river
I wait in silence.  (From "The Zen of Programming")  ;-)

fetter@cos.com (Bob Fetter) (03/14/91)

In article <5946:Mar1122:11:0691@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
>
>I and many others have been pushing for utilities that understand
>(null-terminated) lists of filenames passed through a descriptor. Then
>as long as echo * (or echo0 *) works, you can pass arbitrarily many
>filenames to any program. You can already do this with find, of course,
>though its syntax is more powerful and hence less concise.

  Independent of the other items raised in this thread, what is meant
by arguments "passed through a descriptor"?

  Having worked on/with systems with descriptor *paired* argument lists
(case in point Multics, where the argv[] pointer array was coupled with
a descriptor[] pointer array, the paired items giving the address of the
data along with the type), I can perhaps *intuit* what you mean here.

  Still, what do you mean here?

  If you are proposing having type information tied with argument lists
and having this put into the Unix environment, well, it would seem that
this would bust one heck of a lot MORE software than the innocuous
suggestion that globbing be handled by programs also.  I could go on
about what type binding would involve in argument parsing routines
and how it reaches the point where generic library routines are involved
(a-la the globbing "debate" going on), but it would be a digression
until I understand the point you're making here.

  ???

  -Bob-

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (03/14/91)

In article <44190@cos.com> fetter@cos.UUCP (Bob Fetter) writes:
> In article <5946:Mar1122:11:0691@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
> >I and many others have been pushing for utilities that understand
> >(null-terminated) lists of filenames passed through a descriptor. Then
> >as long as echo * (or echo0 *) works, you can pass arbitrarily many
> >filenames to any program. You can already do this with find, of course,
> >though its syntax is more powerful and hence less concise.
>   Independent of the other items raised in this thread, what is meant
> by arguments "passed through a descriptor"?

Exactly what happens to the filename arguments in, e.g.,

  find / -name core -print0 | xargs -0 rm

They are passed from find to xargs through a descriptor. Is that clear
enough?

>   Having worked on/with systems with descriptor *paired* argument lists
> (case in point Multics, where the argv[] pointer array was coupled with
> a descriptor[] pointer array,

No, no, no, no. That's exactly the wrong model for this sort of problem.
One part of the ``UNIX philosophy''---namely pipelines---says that you
should work with data bit by bit if possible, rather than collecting it
all together, storing it somewhere, and processing in stages.

I'm saying that this can be fruitfully applied to filename arguments.
Instead of collecting all the arguments together in one argv[] line,
pass them through a descriptor in some sensible format. This eliminates
the need for xargs and neatly solves various problems.

There are already many programs---notably various make versions---that
take file lists from input. Hopefully the trend will continue.
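
cpio, for one, already works this way---the list of pathnames arrives on
standard input rather than in argv:

	find . -name '*.c' -print | cpio -o > csrc.cpio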

---Dan

mwm@pa.dec.com (Mike (My Watch Has Windows) Meyer) (03/15/91)

   I agree, that's a good idea. How do you get the programmers to glob in
   the places you want in the first place?

You don't. Then again, shell globbing only solves one such case, and
makes dealing with the rest that much harder.

   Even on systems that support globbing libraries, you run into this...
   and unlike shell globbing you don't get the option of quoting them to
   make them do what you want.

Not true. A shell metacharacter that says "Please glob this argument
for me" is possible. In fact, it can be kludged into most modern unix
by tweaking ls to glob, and doing something like $(ls arg). In a
non-globbing shell, I'd suggest *(arg) as the "quote" to indicate
that this should be globbed. Better yet, *(arg,dir) to indicate
globbing against dir instead of the current directory.

   Also:

   % ls foo*
   foo.new		foo.old
   % diff foo*

Don't forget:

% diff foo*
diff: two filename arguments required
% ls foo*
foo.new		foo.old		foo.saved

This is a case where you want a different mechanism than globbing. To
wit:

% diff foo.{new,old}

This also means you don't depend on the file names to get the
arguments in the right order.

	<mike
--
Look at my hopes,					Mike Meyer
Look at my dreams.					mwm@pa.dec.com
The currency we've spent,				decwrl!mwm
I love you. You pay my rent.

peter@ficc.ferranti.com (Peter da Silva) (03/16/91)

In article <KENW.91Mar8155622@skyler.arc.ab.ca> kenw@skyler.arc.ab.ca (Ken Wallewein) writes:
> I'm familiar with VMS and AmigaDOS; I run one and own the other.  But I
> can't think of any commands on either that use multiple separately globbed
> arguments. Note: I'm not saying wildcard; I'm saying globbed.  In the case
> of RENAME, for example, the second wildcard argument is an _output_
> filespec, and as such is not globbed.  Which did you have in mind?

That's my point. You can't blame UNIX there... plenty of prior art and
you're looking at a damned rare case.

> "Slightly" different?  _Existing_ shell globbing?  Certainly, a
> sufficiently sophisticated globbing syntax environment could do it.  If
> your solution is within the spirit of this discussion, I'm curious.

My solution is don't use globbing for cases where you're doing something
more sophisticated than globbing.

>    The default action you want is to glob. For a current shell that does this,
>    try "tcl".

> I thought tcl was sort of a compiled-in command language processor -- but I
> confess far more curiosity than knowledge.  Please tell me more.

Well, the default program you compile it into is an interactive shell.
-- 
Peter da Silva.  `-_-'  peter@ferranti.com
+1 713 274 5180.  'U`  "Have you hugged your wolf today?"

peter@ficc.ferranti.com (Peter da Silva) (03/16/91)

In article <17602@lanl.gov> jlg@cochiti.lanl.gov (Jim Giles) writes:
> Please read the original post again.  The mnemonic purpose of quoting
> is to _prevent_ globbing.  But, in _both_ these cases, I _WANT_ globbing.

No, you want wildcarding in another context. That's not globbing.

> I just want it done in a different context.  Further mnemonic violence
> occurs when I want to pass a special character through to the other
> context - _your_ "solution" requires multiple quoting - _BLETCH!!_

As in "grep 'include.*\.h' *.c"? But your solution requires knowing the
behaviour of the program you're passing the argument to in the other context.
And when I'm calling it directly from another program (say, "mail") I
have to add a bunch of quoting I'm not putting in now... and *that* is a
security risk... one I've exploited in older HDB uucp to demonstrate the
problem.

Quote once and for all, with a consistent syntax.

> Both the tool and the user _know_ what the arguments to a given tool
> mean.  The shell _DOES_NOT_.  It is foolish, therefore, to have the
> shell do the globbing.  Period.

No, the tool does not always know what the arguments mean. Particularly when
the "user" is another program. Why should I have todepend on hundreds of
programmers writing hundreds of unique tools guessing what globbing is needed:
the USER should be the only entity specifying when globbing is going to be
done.

To paraphrase: it is foolish, therefore, to have each separate and unique tool
do the globbing. Period.
-- 
Peter da Silva.  `-_-'  peter@ferranti.com
+1 713 274 5180.  'U`  "Have you hugged your wolf today?"

peter@ficc.ferranti.com (Peter da Silva) (03/16/91)

In article <MWM.91Mar14114845@raven.pa.dec.com> mwm@pa.dec.com (Mike (My Watch Has Windows) Meyer) writes:
>    I agree, that's a good idea. How do you get the programmers to glob in
>    the places you want in the first place?

> You don't. Then again, shell globbing only solves one such case, and
> makes dealing with the rest that much harder.

It solves the 95% case, and doesn't make the 5% case much harder at all.

>    Even on systems that support globbing libraries, you run into this...
>    and unlike shell globbing you don't get the option of quoting them to
>    make them do what you want.

> Not true. A shell metacharacter that says "Please glob this argument
> for me" is possible.

Yes, you do that in tcl. "eval [concat ls [glob *.c]]".

And, of course, since the shell is doing the globbing you *can* implement
this in UNIX. You can't in AmigaOS, VMS, or other systems where the program
does the globbing.

Welcome to TclU 4.0
tcl> eval [concat ls [glob *.c]]
bozo.c
crashme.c
deproto.c
kill.c.l.c
tcl>

Hmmmm...

> % diff foo.{new,old}

But this *is* globbing. The implementation sucks, but it's similar to:

1> diff foo(new|old)
-- 
Peter da Silva.  `-_-'  peter@ferranti.com
+1 713 274 5180.  'U`  "Have you hugged your wolf today?"

sef@kithrup.COM (Sean Eric Fagan) (03/18/91)

(Wrong newsgroup again!  I've posted this to c.o.m and redirected followups
there.  Again.  *sigh*)

In article <1406@sheol.UUCP> throopw@sheol.UUCP (Wayne Throop) writes:
>But, that doesn't mean that it can't be done to "Unix" as an
>environment.  Just that it has to be done co-operatively with the shell,
>not co-operatively with individual commands. 

Embos had a wonderful "feel" to it.  I really liked it.  Part of this was
accomplished by having an incredibly powerful parameterization interface,
which was (more or less) shared between both programming languages (well,
Pascal and C, at least 8-)) and the shell.  The shell had commands to set
types of parameters, and other qualifiers.  (Whether it was positional or
not, whether it was a file-list [in which case it got 'globbed'], etc.)

It would not be impossible to write an embos-like shell to unix, and then
create shell scripts using it.

Note, incidentally, that, even though embos had the *ability* to do 'mv *.x
*.y', the rename utility did not do that, as it caused too many problems
(for people who came from unix, which was a large part of its intended
audience).

-- 
Sean Eric Fagan  | "I made the universe, but please don't blame me for it;
sef@kithrup.COM  |  I had a bellyache at the time."
-----------------+           -- The Turtle (Stephen King, _It_)
Any opinions expressed are my own, and generally unpopular with others.

jlg@cochiti.lanl.gov (Jim Giles) (03/19/91)

From article <WG0A148@xds13.ferranti.com>, by peter@ficc.ferranti.com
(Peter da Silva):
> In article <17602@lanl.gov> jlg@cochiti.lanl.gov (Jim Giles) writes:
>> Please read the original post again.  The mnemonic purpose of quoting
>> is to _prevent_ globbing.  But, in _both_ these cases, I _WANT_ globbing.
> 
> No, you want wildcarding in another context. That's not globbing.

Call it whatever you want.  The point is that quoting implies turning
it OFF!!!!  And I want it done!!!  And I don't want the shell to 
incorrectly do it (which is the only way the shell works - incorrectly).

>> [...]     _your_ "solution" requires multiple quoting - _BLETCH!!_
> 
> As in "grep 'include.*\.h' *.c"?  [...]

BLETCH!!!

> [...]                            But your solution requires knowing the
> behaviour of the program you're passing the argument to in the other context.

If I don't know what a program expects as arguments, I have no business
calling it at all.  Of _course_ I have to know what the program is going
to do with its arguments.

> And when I'm calling it directly from another program (say, "mail") I
> have to add a bunch of quoting I'm not putting in now... [...]

Only in the _rare_ case when the arguments contain wildcard characters
that you don't want expanded.  This almost never happens, but when it
does, it is preferable to have a consistent quoting convention for such
instances.  Your "solution" doesn't - it requires multiple quoting,
perhaps to several levels if the argument is to be passed through
several programs.

> [...]                                                    and *that* is a
> security risk... one I've exploited in older HDB uucp to demonstrate the
> problem.

It is only a security risk in systems where security is shoddy to begin
with.  Oh, that's right, we _are_ talking about UNIX.

> [...]
> Quote once and for all, with a consistent syntax.

Exactly what I just said: DON'T quote multiply - quote once and for
all!!  Quoting should mean what people intuitively think it means:
the quoted argument will be used literally.  No matter how many levels
deep it's passed and no matter where it goes, the end use of the string
will use the exact literal contents of the string.

And consistent syntax is achieved by programmer discipline, not
monstrous shell design.

> [...]
> No, the tool does not always know what the arguments mean. Particularly when
> the "user" is another program. Why should I have todepend on hundreds of
> programmers writing hundreds of unique tools guessing what globbing is
needed:
> the USER should be the only entity specifying when globbing is going to be
> done.

Fine - then not even the shell should ever do it automatically.
If _anything_ does it automatically, it ought to be the tool and
NOT the shell.  The shell should keep its grubby hands off the entire
command line.  In fact, I would prefer that the shell not even
tokenize the command line.

However, this bankrupt argument about consistency is often used by UNIX
supporters to object to useful features that UNIX _doesn't_ have.  The
fact that UNIX contains _MORE_ than its share of features which are
inconsistent is evidence that it is not _really_ considered to be an
important issue.  So, you're just clutching at any argument which will
defend your religiously chosen system.  The bottom line is that if
consistency _is_ important, it is best achieved by discipline among
systems programmers - something that UNIX has _never_ seen.

J. Giles

peter@ficc.ferranti.com (Peter da Silva) (03/19/91)

In article <18205@lanl.gov> jlg@cochiti.lanl.gov (Jim Giles) writes:
> > And when I'm calling it directly from another program (say, "mail") I
> > have to add a bunch of quoting I'm not putting in now... [...]

> Only in the _rare_ case when the arguments contain wildcard characters
> that you don't want expanded.

No, this is the common case, when you're passing a string to a program
that you got from somewhere else. You *have* to assume that it might be
expanded. It is a bug to do anything else.

> Your "solution" doesn't - it requires multiple quoting,
> perhaps to several levels if the argument is to be passed through
> several programs.

No, it requires assuming that the people who wrote the programs put in
the appropriate escapes. If they didn't, it's a bug.

> Exactly what I just said: DON'T quote multiply - quote once and for
> all!!

And that means, you quote in the shell.

> Quoting should mean what people intuitively think it means:
> the quoted argument will be used literally.  No matter how many levels
> deep it's passed and no matter where it goes, the end use of the string
> will use the exact literal contents of the string.

I see, so if you're writing a Fortran program you shouldn't have to quote
the file name in the following statement:

	OPEN(NAME='JGTEST.TXT',TYPE=UNKNOWN)

???

> And consistent syntax is achieved by programmer discipline, not
> monstrous shell design.

Why should the shell be different from any other programming language?

> fact that UNIX contains _MORE_ than its share of features which are
> inconsistent is evidence that it is not _really_ considered to be an
> important issue.  So, you're just clutching at any argument which will
> defend your religiously chosen system.

Ah, abuse.

No, my religiously chosen system is AmigaDOS. Which does the expansion in
the programs...
-- 
Peter da Silva.  `-_-'  peter@ferranti.com
+1 713 274 5180.  'U`  "Have you hugged your wolf today?"

mwm@pa.dec.com (Mike (My Watch Has Windows) Meyer) (03/19/91)

In article <5H0ABE8@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
   >    Even on systems that support globbing libraries, you run into this...
   >    and unlike shell globbing you don't get the option of quoting them to
   >    make them do what you want.

   > Not true. A shell metacharacter that says "Please glob this argument
   > for me" is possible.

   Yes, you do that in tcl. "eval [concat ls [glob *.c]]".

   And, of course, since the shell is doing the globbing you *can* implement
   this in UNIX. You can't in AmigaOS, VMS, or other systems where the program
   does the globbing.

Sorry, but that's not true. I've been doing it on AmigaDOS.  It's
primitive, because the globbing tool is kludged, but it does work.
All it takes are tools designed for handling multiple file arguments.

   > % diff foo.{new,old}

   But this *is* globbing. The implementation sucks, but it's similar to:

   1> diff foo(new|old)

Maybe we should define globbing. The discussion seems to imply that
it's "regular expression matching against the local file system name
space," with specific arguments that matching against other parts of
the file system aren't globbing. A simple test shows that the file
system name space (local or not) isn't involved in this operation.

? ls foo.*
No match.
? echo foo.{new,old}
foo.new foo.old

That csh {,} mechanism is text manipulation; no pattern matching
involved at all. You can tell what the results of this operator are
going to be _without_ reference to the local file system (this isn't
true for the AmigaDOS version); this makes it a different beast.

	<mike
--
I went down to the hiring fair,				Mike Meyer
For to sell my labor.					mwm@pa.dec.com
I noticed a maid in the very next row,			decwrl!mwm
I hoped she'd be my neighbor.

peter@ficc.ferranti.com (Peter da Silva) (03/19/91)

In article <MWM.91Mar18160044@raven.pa.dec.com> mwm@pa.dec.com (Mike (My Watch Has Windows) Meyer) writes:
>    And, of course, since the shell is doing the globbing you *can* implement
>    this in UNIX. You can't in AmigaOS, VMS, or other systems where the program
>    does the globbing.

> Sorry, but that's not true. I've been doing it on AmigaDOS.  It's
> primitive, because the globbing tool is kludged, but it does work.

Sigh. Teach me to not qualify my comments, will you? Take that! And that!

Look, the point is that you can't do it without creating all sorts of
problems, because (as you noted)...

> All it takes are tools designed for handling multiple file arguments.

... it requires tools that take multiple arguments. On AmigaDOS, because
the tools are written to do the globbing themselves, they don't.

> Maybe we should define globbing. The discussion seems to imply that
> it's "regular expression matching against the local file system name
> space," with specific arguments that matching against other parts of
> the file system aren't globbing. A simple test shows that the file
> system name space (local or not) isn't involved in this operation.

Yes, the implementation sucks. Didn't I just say that?
-- 
Peter da Silva.  `-_-'  peter@ferranti.com
+1 713 274 5180.  'U`  "Have you hugged your wolf today?"

jlg@cochiti.lanl.gov (Jim Giles) (03/20/91)

In article <A23AFH9@xds13.ferranti.com>, peter@ficc.ferranti.com (Peter
da Silva) writes:
|> In article <18205@lanl.gov> jlg@cochiti.lanl.gov (Jim Giles) writes:
|> [...]
|> > Only in the _rare_ case when the arguments contain wildcard characters
|> > that you don't want expanded.
|> 
|> No, this is the common case, when you're passing a string to a program
|> that you got from somewhere else. You *have* to assume that it might be
|> expanded. It is a bug to do anything else.

No, you don't.  If no tool but the ultimate consumer _ever_ expands
wildcards, then quoting need be used only if you have wildcard chars
that you don't want the ultimate consumer to expand.  No intermediate
tool, shell, processor, path, parser, or program has any business 
altering an argument that it's supposed to be just passing along.

|> [...]
|> > Exactly what I just said: DON'T quote multiply - quote once and for
|> > all!!
|> 
|> And that means, you quote in the shell.

No, it means that the shell doesn't do _anything_ to command line
arguments so that the quotes arrive at the  tool still intact.  If
the shell handles quoting, then complicated scripts have to count
how many times each argument will need to be escaped to prevent
being mangled.  I've seen scripts with 8 consecutive backslashes (\)
because the programmer wanted _one_ to be literally present in the
ultimate context of the argument - and it was only going through
2 intermediate commands.  I thought that was ample evidence of the
stupidity of shell-oriented argument parsing until someone told
me of a similar program with 64 consecutive backslashes!!  He had 
only 5 intermediate commands (the ultimate, being the sixth, reduced
the last pair to a single backslash).  This is what I _DON'T_ want
to have to do.  I _DON'T_ want to have to multiply quote an argument
that should be simply passed unchanged through the command language
until it reaches its ultimate consumer tool.
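
As a rough illustration of that doubling (a sketch, using eval in a
Bourne-style sh to stand in for intermediate commands that re-parse
their arguments): each extra level of parsing doubles the number of
backslashes needed to deliver one literal backslash.

$ echo \\
\
$ eval echo \\\\
\
$ eval eval echo \\\\\\\\
\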

|> [...]
|> > Quoting should mean what people intuitively think it means:
|> > the quoted argument will be used literally.  No matter how many levels
|> > deep it's passed and no matter where it goes, the end use of the string
|> > will use the exact literal contents of the string.
|> 
|> I see, so if you're writing a Fortran program you shouldn't have to quote
|> the file name in the following statement:
|> 
|> 	OPEN(NAME='JGTEST.TXT',TYPE=UNKNOWN)

Good example!  Note how the ultimate consumer of the string "JGTEST.TXT"
will get exactly that string - no matter how deep in the I/O library
this string is passed, it will never be globbed, sludged, mangled, or
trashed at any intermediate step.  Meanwhile, the variable UNKNOWN will 
be evaluated _once_ and not changed afterward either.  The proper context
for evaluating Fortran variables is in the routine to which they are
local - that is, in the caller as is done here.  The proper place to
evaluate wildcards in a command language is in the context of the 
ultimate consumer of the argument.

|> [...]
|> > And consistent syntax is achieved by programmer discipline, not
|> > monstrous shell design.
|> 
|> Why should the shell be different from any other programming language?

Because the shell has a different function and different problem domains.
Still, the difference between what I want and what other programming
languages do is trivial - I just pointed it out:  The arguments should
only be evaluated _once_ in either the command or the programming language.
The arguments should be evaluated in the context where the _meaning_
of the argument is known - by the _caller_ in a programming language,
by the ultimate destination in a command shell.  In _both_ cases, the 
argument passing mechanism doesn't know what the argument means and 
should not alter it _at_all_!  _YOUR_ recommendation is much further
from programming language design than mine.

J. Giles

peter@ficc.ferranti.com (Peter da Silva) (03/20/91)

In article <18365@lanl.gov> jlg@cochiti.lanl.gov (Jim Giles) writes:
> No, you don't.  If no tool but the ultimate consumer _ever_ expands
> wildcards, then quoting need be used only if you have wildcard chars
> that you don't want the ultimate consumer to expand.

But you don't *know* that only the ultimate consumer is going to expand
wildcards. We're talking about random programs written by random people
at random times for random purposes with random levels of debugging. At
least in UNIX you know that anything you get has been expanded already,
and there is no reason to do so again.

What you're saying makes perfect sense in an ideal world, but that's what
I've been saying all along. In the real world, you have to assume that
the program you're passing stuff to might decide to glob it.

> I've seen scripts with 8 consecutive backslashes (\)
> because the programmer wanted _one_ to be literally present in the
> ultimate context of the argument - and it was only going through
> 2 intermediate commands.

Sounds like the programmer screwed up somewhere. I've never had to nest
more than two quotes. Of course using backslashes instead of quotes to
quote the argument is probably a mistake.

> |> 	OPEN(NAME='JGTEST.TXT',TYPE=UNKNOWN)

> Good example!  Note how the ultimate consumer of the string "JGTEST.TXT"
> will get exactly that string

But that's not the string he started with. He started with "'JGTEST.TXT'".
He quoted it at the top level (the language) and it then went all the way
down with no further quotes because none of the levels in the way did any
globbing. Your argument that tools should glob is like expecting OPEN to
glob.

> Still, the difference between what I want and what other programming
> languages do is trivial - I just pointed it out:  The arguments should
> only be evaluated _once_ in either the command or the programming language.

And they are. And they are evaluated in a context where the meaning of
the argument is known: by the caller. The shell is still a programming
language, however much you deny it.
-- 
Peter da Silva.  `-_-'  peter@ferranti.com
+1 713 274 5180.  'U`  "Have you hugged your wolf today?"

mwm@pa.dec.com (Mike (My Watch Has Windows) Meyer) (03/20/91)

In article <DS3ABVD@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
   Sigh. Teach me to not qualify my comments, will you? Take that! And that!

   Look, the point is that you can't do it without creating all sorts of
   problems, because (as you noted)...

   > All it takes are tools designed for handling multiple file arguments.

   ... it requires tools that take multiple arguments. On AmigaDOS, because
   the tools are written to do the globbing themselves, they don't.

Fixed in 2.0. In any case, it doesn't create "all sorts of problems".
Either the command accepts multiple arguments, in which case it works,
or the command doesn't, in which case you get an error. It doesn't
matter whether the shell did the globbing or some tool did it for the
shell; it only depends on the command.

   > Maybe we should define globbing. The discussion seems to imply that
   > it's "regular expression matching against the local file system name
   > space," with specific arguments that matching against other parts of
   > the file system isn't globbing. A simple test shows that the file
   > system name space (local or not) isn't involved in this operation.

   Yes, the implementation sucks. Didn't I just say that?

Um - I think we disagree about what "sucks" is. The AmigaDOS version
sucks - it depends on the contents of the file system space. The csh version
is done right - it lets you deal with arbitrary text. For example, the
"group rename" hack is (slightly) easier with csh:

	ls *.x | sed 's/\(.*\)x$/mv \1{x,y}/' | /bin/csh
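
For concreteness, with just two files a.x and b.x in the directory
(a sketch of the intermediate stages):

% ls *.x
a.x
b.x
% ls *.x | sed 's/\(.*\)x$/mv \1{x,y}/'
mv a.{x,y}
mv b.{x,y}

The final csh stage then brace-expands each line into "mv a.x a.y" and
"mv b.x b.y" before running it.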

Further, since it doesn't depend on the filename space, I can use it in
different contexts - like machine names and process id's (places where
I do use it).

	<mike
--
Teddies good friend has his two o'clock feast		Mike Meyer
And he's making Teddies ex girl friend come		mwm@pa.dec.com
They mistook Teddies good trust				decwrl!mwm
Just for proof that Teddy was dumb.

kenw@skyler.arc.ab.ca (Ken Wallewein) (03/21/91)

In article <B.3A_=8@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:

   In article <18365@lanl.gov> jlg@cochiti.lanl.gov (Jim Giles) writes:
   > No, you don't.  If no tool but the ultimate consumer _ever_ expands
   > wildcards, then quoting need be used only if you have wildcard chars
   > that you don't want the ultimate consumer to expand.

   But you don't *know* that only the ultimate consumer is going to expand
   wildcards. We're talking about random programs written by random people
   at random times for random purposes with random levels of debugging. At
   least in UNIX you know that anything you get has been expanded already,
   and there is no reason to do so again.

   What you're saying makes perfect sense in an ideal world, but that's what
   I've been saying all along. In the real world, you have to assume that
   the program you're passing stuff to might decide to glob it.

So what you're saying is that it's better for programs not to glob, because
that way you can totally bypass the globbing mechanism if you want to.
That makes a lot of sense.  But it has a lot of limitations, too, as have
been well described in this discussion.

Unless we are just having a religious war about whether shell or program
globbing is better, let's be constructive.  What can be done, in both the
long and short term, to improve this situation?

It seems to me that we must consider shell globbing to be a tool somewhat
separate from the shell per se, which assists us in giving file names to
programs.  So here's my constructive thought:

The globbing mechanism should be recognized as the preprocessor it is, and
unbundled from the shell.  It should be made accessible, controllable, and
extendable like the C preprocessor 'cpp' (...poor though that preprocessor is).
Globbing needs to be more controllable; perhaps such a mechanism might
help.
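
A minimal sketch of what such an unbundled globber could look like, as
a Bourne shell script (the name "glob" and the behaviour shown here are
purely illustrative, not a description of any existing tool):

	#!/bin/sh
	# glob - expand each pattern argument against the file system
	# and print the matches, one per line.  A pattern that matches
	# nothing is passed through unchanged, as the shell would do.
	# ${1+"$@"} is the old-sh-safe spelling of "$@".
	for pat in ${1+"$@"}
	do
		for name in $pat	# unquoted: expansion happens here
		do
			echo "$name"
		done
	done

A command that wanted expansion could then ask for it explicitly, for
example via command substitution:  cc -o prog `glob '*.c'`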

   > I've seen scripts with 8 consecutive backslashes (\)
   > because the programmer wanted _one_ to be literally present in the
   > ultimate context of the argument - and it was only going through
   > 2 intermediate commands.

   Sounds like the programmer screwed up somewhere. I've never had to nest
   more than two quotes. Of course using backslashes instead of quotes to
   quote the argument is probably a mistake.

I agree that quoting/backslashing can be a royal pain -- especially when
one is trying to define aliases.  However, I think that it is a broader
issue than globbing; rather, that it is a shell design issue which affects
the use of metacharacters in general, not just those used for globbing.
Once that is removed, the problem of when expansion occurs is greatly
reduced, although not eliminated.

   > |> 	OPEN(NAME='JGTEST.TXT',TYPE=UNKNOWN)

   > Good example!  Note how the ultimate consumer of the string "JGTEST.TXT"
   > will get exactly that string

   But that's not the string he started with. He started with "'JGTEST.TXT'".
   He quoted it at the top level (the language) and it then went all the way
   down with no further quotes because none of the levels in the way did any
   globbing. Your argument that tools should glob is like expecting OPEN to
   glob.

What's wrong with that?  As far as I'm concerned, the file system calls
_should_ be able to expand expressions, "~", variables, etc., the same way
they handle soft links and (in VMS and AmigaDOS) logical names.

What seems to be missing in most environments is a well-planned syntax
which allows one to say clearly and unambiguously what one means -- to say
what is a wildcard expression, what is a command option, and what is
literal data -- and be sure both that a program gets exactly the command we
want it to get, and that it interprets that command the way we want.

--
/kenw

Ken Wallewein                                                     A L B E R T A
kenw@noah.arc.ab.ca  <-- replies (if mailed) here, please       R E S E A R C H
(403)297-2660                                                     C O U N C I L

peter@ficc.ferranti.com (Peter da Silva) (03/21/91)

In article <MWM.91Mar19164959@raven.pa.dec.com>, mwm@pa.dec.com (Mike (My Watch Has Windows) Meyer) writes:
> Either the command accepts multiple arguments, in which case it works,
> or the command doesn't, in which case you get an error.

Right. Or the command treats multiple arguments in an unexpected way, and
something bad happens.

Inconsistency is a definite problem.

> Um - I think we disagree about what "sucks" is.

Yes, I mean "overloading globbing and text manipulation on a single tool".
-- 
Peter da Silva.  `-_-'  peter@ferranti.com
+1 713 274 5180.  'U`  "Have you hugged your wolf today?"

jlg@cochiti.lanl.gov (Jim Giles) (03/21/91)

In article <B.3A_=8@xds13.ferranti.com>, peter@ficc.ferranti.com (Peter
da Silva) writes:
|> [...]
|> What you're saying makes perfect sense in an ideal world, but that's what
|> I've been saying all along. In the real world, you have to assume that
|> the program you're passing stuff to might decide to glob it.

If I don't know what a program is going to do with its arguments,
I ain't a-goin' to use the program!!  Period.

|> In article <18365@lanl.gov> jlg@cochiti.lanl.gov (Jim Giles) writes:
|> > [... lots of nested quotes or repeated escape symbols ...]
|> 
|> Sounds like the programmer screwed up somewhere. I've never had to nest
|> more than two quotes. [...]

Then, you've never sent arguments to shell scripts which in turn
invoke shell scripts, which in turn invoke shell scripts, ....  This
is a common practice on UNIX systems.  Often, systems 'programs' are
implemented just that way.

|> [...]
|> > |> 	OPEN(NAME='JGTEST.TXT',TYPE=UNKNOWN)
|> 
|> > Good example!  Note how the ultimate consumer of the string "JGTEST.TXT"
|> > will get exactly that string
|> 
|> But that's not the string he started with. He started with "'JGTEST.TXT'".

The string he started with was ->  JGTEST.TXT  <- with no markers on it
at all.  The apostrophes in the open statement were there to clearly
denote that fact.  The apostrophes are _not_ part of the value - and
are only there to prevent the Fortran compiler from incorrectly trying
to evaluate the string as a variable name or expression.  As I said
before, Fortran does evaluation in the most meaningful context - the
caller.  In a command language, the most meaningful context for argument
evaluation is in the recipient.  In _neither_ language should the 
parameter passing mechanism do the argument evaluation.

|> [...]
|> He quoted it at the top level (the language) and it then went all the way
|> down with no further quotes because none of the levels in the way did any
|> globbing. Your argument that tools should glob is like expecting OPEN to
|> glob.

A good place for it.  If _any_ component of the Fortran programming
environment were to be given the ability to match wildcards against
file names, the I/O library would be the correct place to do it. OPEN
doesn't glob because nothing in a standard Fortran environment does.

What _you_ are recommending is like expecting the LOADER to glob (or
insert globbing code) when it links procedures together.

|> [...]
|> > Still, the difference between what I want and what other programming
|> > languages do is trivial - I just pointed it out:  The arguments should
|> > only be evaluated _once_ in either the command or the programming
language.
|> 
|> And they are. And they are evaluated in a context where the meaning of
|> the argument is known: by the caller. [...]

They are evaluated by the _shell_ NOT the _caller_.  The _shell_ has
no business doing so.  The _shell_ does NOT do it only once, it globs
every time an argument gets passed to it - which in a UNIX environment
may be very often indeed.  The _shell_ hasn't got the slightest idea
what the argument means, but _assumes_ that it's a file name.  So, your
last sentence above is completely false.

|> [...]                                 The shell is still a programming
|> language, however much you deny it.

I didn't deny it.  But I will.  The shell scripting syntax and semantics
do indeed constitute a language.  But, the shell itself is an intermediary.
It is nothing more than a particularly poorly informed interpreter.  If
the shell scripting language allowed arguments to be given _types_, then
the problem would solve itself: the shell could glob (once) any argument
which had the data type <list-of-files>.  Any other argument it would
leave alone.  This would be a workable solution.  _BUT_ it would require
that all commands be declared to the shell so that it would know the
types of the command's arguments.  Various people in this discussion have
asserted that this is an unacceptable solution.*  The second best is to have
each command evaluate its own arguments - the command knows what they
mean.  The third best solution is to have the user explicitly evaluate
the arguments himself - he also knows what they mean (or he has no 
business using the command), but it's just a lot of unnecessary work
to force the user to glob manually.  The _worst_ solution is to have
the _shell_ do the argument evaluation _blindly_ - which is what you
are advocating.
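
(To make the <list-of-files> idea concrete, a sketch of what such
declarations might look like - this is invented syntax for a
hypothetical shell, not a feature of any existing one:

	declare rm    args: <list-of-files>
	declare diff  args: <file> <file>
	declare grep  args: <literal-pattern> <list-of-files>

Given declarations like these, the shell would glob the arguments
declared <list-of-files> exactly once, and pass everything else
through untouched.)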

J. Giles

Footnote[*]:  I have no idea why people think a shell which required 
all command names to be declared is unacceptable.  It is standard CS
dogma that all things must be declared in a language.  Yet this is
resisted in the command language.  These days, it's even common for
languages to require declarations of external procedures (right down
to the types of the arguments) - this is just the sort of model for
command languages I have described above.

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (03/21/91)

In article <18511@lanl.gov> jlg@cochiti.lanl.gov (Jim Giles) writes:
> In article <B.3A_=8@xds13.ferranti.com>, peter@ficc.ferranti.com (Peter
> da Silva) writes:
> |> What you're saying makes perfect sense in an ideal world, but that's what
> |> I've been saying all along. In the real world, you have to assume that
> |> the program you're passing stuff to might decide to glob it.
> If I don't know what a program is going to do with its arguments,
> I ain't a-goin' to use the program!!  Period.

The problem is that there are lots of programs (from ``nice'' and
``find'' on up) that are documented to pass some of their arguments to
other programs. Do you expect them to whip out the man pages for those
other programs and figure out whether they should quote or glob or what?
Perhaps you also want these programs to ftp the documentation if they
can't find it locally. And fix your coding mistakes. And fetch your
newspaper in the morning.

> |> In article <18365@lanl.gov> jlg@cochiti.lanl.gov (Jim Giles) writes:
> |> > [... lots of nested quotes or repeated escape symbols ...]
> |> Sounds like the programmer screwed up somewhere. I've never had to nest
> |> more than two quotes. [...]
> Then, you've never sent arguments to shell scripts which in turn
> invoke shell scripts, which in turn invoke shell scripts,

``Peter, didn't you know that in an ideal world, there would only be ONE
level of parsing for EVERYTHING?'' ``Wow, Jim, really?'' ``That's right,
Peter, there's just ONE parser that handles key-up and key-down signals
from your keyboard, tty handling inside your computer, command
execution, argument processing, globbing, editing modes, compiling, and
screen output formatting!'' ``But, Jim, that seems a bit monolithic.
Wouldn't it be better to have each logically separate parser isolated
inside a program of its own, the way that globbing is isolated inside
the shell?'' ``No, Peter, NO! That is NOT how the world works! I don't
want there to be multiple parsing levels, even when that makes perfect
sense!'' ``But, Jim, if you want just one parsing level, then why are
you saying that the parsing should be handled by every single separate
program, instead of by just one program the way that globbing is done by
the shell?'' ``Uh, um. Uh, um. Um. Uh. Um.''

> ....  This
> is a common practice on UNIX systems.  Often, systems 'programs' are
> implemented just that way.

Good scripts do not glob or otherwise parse their arguments any more
than good programs do. Just because you haven't used UNIX long enough to
know how to use "$@" (really ${1+"$@"}---the price you pay for backward
compatibility) is no reason for you to shout religiously that the system
is flawed.
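
For anyone who hasn't met the idiom, a minimal sketch of a wrapper
script that passes its arguments along untouched ("realprog" is just a
placeholder name):

	#!/bin/sh
	# hand every argument to the real program exactly as received;
	# the quoting keeps the shell from re-splitting or re-globbing
	exec realprog ${1+"$@"}

The ${1+"$@"} spelling is only there because old Bourne shells turned
a bare "$@" with no arguments into one empty argument.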

> In a command language, the most meaningful context for argument
> evaluation is in the recipient.  In _neither_ language should the 
> parameter passing mechanism do the argument evaluation.

Jim is somehow saying that the shell's globbing is part of its
``parameter passing mechanism.'' That distinction is indefensible.

> They are evaluated by the _shell_ NOT the _caller_.  The _shell_ has
> no business doing so.  The _shell_ does NOT do it only once, it globs
> every time an argument gets passed to it

You are confused. The user has one shell, which does one level of
globbing (which he can easily turn off). He may invoke programs which
happen to be scripts written in sh or csh; if they glob their arguments
(except those documented to be filename patterns), they are FLAWED, just
like C programs that glob their arguments.

> - which in a UNIX environment
> may be very often indeed.

So your colleagues write buggy shell scripts and tell you about them.
Big deal.

> |> [...]                                 The shell is still a programming
> |> language, however much you deny it.
> I didn't deny it.  But I will.  The shell scripting syntax and semantics
> do indeed constitute a language.  But, the shell itself is an intermediary.
> It is nothing more than a particularly poorly informed interpreter.  If
> the shell scripting language allowed arguments to be given _types_, then
> the problem would solve itself: the shell could glob (once) any argument
> which had the data type <list-of-files>.  Any other argument it would
> leave alone.  This would be a workable solution.

Aha! So, after all this complaining, you admit that globbing should be
in the shell anyway. You just want a shell with certain data types. Why
didn't you say so in the first place?

---Dan

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (03/21/91)

In article <KENW.91Mar20162655@skyler.arc.ab.ca> kenw@skyler.arc.ab.ca (Ken Wallewein) writes:
> So what you're saying is that it's better for programs not to glob, because
> that way you can totally bypass the globbing mechanism if you want to.
> That makes a lot of sense.  But it has a lot of limitations, too, as have
> been well described in this discussion.

No, the limitations have *not* been well described. People repeatedly
give the example of renaming multiple files; once they learn that
mvm "*.c" "=1.c.bak" does the job, they shut up.

> What seems to be missing in most environments is a well-planned syntax
> which allows one to say clearly and unambiguously what one means -- to say
> what is a wildcard expression, what is a command option, and what is
> literal data -- and be sure both that a program gets exactly the command we
> want it to get, and that it interprets that that command the way we want.

/*/*.c is a wildcard expression. -x is a command option. "$var" is
literal data.

---Dan

kenw@skyler.arc.ab.ca (Ken Wallewein) (03/21/91)

In article <18511@lanl.gov> jlg@cochiti.lanl.gov (Jim Giles) writes:

   ...
   I didn't deny it.  But I will.  The shell scripting syntax and semantics
   do indeed constitute a language.  But, the shell itself is an intermediary.
   It is nothing more than a particularly poorly informed interpreter.  If
   the shell scripting language allowed arguments to be given _types_, then
   the problem would solve itself: the shell could glob (once) any argument
   which had the data type <list-of-files>.  Any other argument it would
   leave alone.  This would be a workable solution.  _BUT_ it would require
   that all commands be declared to the shell so that it would know the
   types of the command's arguments.  Various people in this discussion have
   asserted that this is an unacceptable solution.*  The second best is to have
   each command evaluate its own arguments - the command knows what they
   mean.  The third best solution is to have the user explicitly evaluate
   the arguments himself - he also knows what they mean (or he has no 
   business using the command), but it's just a lot of unnecessary work
   to force the user to glob manually.  The _worst_ solution is to have
   the _shell_ do the argument evaluation _blindly_ - which is what you
   are advocating.

Nice summary. 

   J. Giles

   Footnote[*]:  I have no idea why people think a shell which required 
                                                               ^^^^^^^^
   all command names to be declared is unacceptable.  It is standard CS
   dogma that all things must be declared in a language.  Yet this is
   resisted in the command language.  These days, it's even common for
   languages to require declarations of external procedures (right down
   to the types of the arguments) - this is just the sort of model for
   command languages I have described above.

I think there have been some reasonable concerns expressed about this
approach, although many of them show a lack of vision or familiarity with
working examples.

There's a compromise solution which might be more acceptable.  If we
changed that word "required" to "allowed", it would be a lot more practical
to implement in current environments.  It would still, however, be subject
to other limitations of shell syntax.

--
/kenw

Ken Wallewein                                                     A L B E R T A
kenw@noah.arc.ab.ca  <-- replies (if mailed) here, please       R E S E A R C H
(403)297-2660                                                     C O U N C I L

peter@ficc.ferranti.com (Peter da Silva) (03/21/91)

In article <18511@lanl.gov> jlg@cochiti.lanl.gov (Jim Giles) writes:
> |> Sounds like the programmer screwed up somewhere. I've never had to nest
> |> more than two quotes. [...]

> Then, you've never sent arguments to shell scripts which in turn
> invoke shell scripts, which in turn invoke shell scripts, ....  This
> is a common practice on UNIX systems.  Often, systems 'programs' are
> implemented just that way.

Sure I have.

But one nice thing about shell scripts: you can fix the bloody things when
they have bugs in them. And not properly quoting variables is a bug. This
is just as bad as a program in VMS or AmigaDOS not globbing when it should,
but with a major difference: it's fixable.

> |> But that's not the string he started with. He started with "'JGTEST.TXT'".

> The string he started with was ->  JGTEST.TXT  <- with no markers on it
> at all.

So how does this differ from:

find . -name '*.c' ....

> The apostrophes in the open statement were there to clearly
> denote that fact.  The apostrophes are _not_ part of the value - and
> are only there to prevent the Fortran compiler from incorrectly trying
> to evaluate the string as a variable name or expression.

The apostrophes in the find command were there to clearly denote that
the string is "*.c", and are only there to prevent the shell from
incorrectly trying to evaluate the string as a filename.

> As I said
> before, Fortran does evaluation in the most meaningful context - the
> caller.

As I said before, the Shell does evaluation in the most meaningful
context - the caller.

> In a command language, the most meaningful context for argument
> evaluation is in the recipient.

In a command language, the most meaningful context for argument evaluation
is the user, but in the absence of AI the caller is the best remaining
choice.

> In _neither_ language should the 
> parameter passing mechanism do the argument evaluation.

In neither language does the parameter passing mechanism do the argument
evaluation.

> They are evaluated by the _shell_ NOT the _caller_.

The shell is the caller.

> The _shell_ does NOT do it only once, it globs
> every time an argument gets passed to it - which in a UNIX environment
> may be very often indeed.

An argument is passed to the shell only once. If the arguments are ever
passed unquoted that is a bug in the script, just as much as any other
bug in any language... with the difference that you can fix it.
-- 
Peter da Silva.  `-_-'  peter@ferranti.com
+1 713 274 5180.  'U`  "Have you hugged your wolf today?"

peter@ficc.ferranti.com (Peter da Silva) (03/22/91)

In article <KENW.91Mar20162655@skyler.arc.ab.ca> kenw@skyler.arc.ab.ca (Ken Wallewein) writes:
> So what you're saying is that it's better for programs not to glob, because
> that way you can totally bypass the globbing mechanism if you want to.

Exactly.

> That makes a lot of sense.  But it has a lot of limitations, too, as have
> been well described in this discussion.

It means that you have to be able to tell the shell not to glob, for the
relatively uncommon case where file globbing is not what you want.
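
(Existing shells already provide the wholesale form of that switch,
e.g. sh's "set -f" and csh's "set noglob":

$ set -f; echo *.c
*.c
% set noglob; echo *.c
*.c

though that turns expansion off for everything, not per argument.)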

> It seems to me that we must consider shell globbing to be a tool somewhat
> separate from the shell per se, which assists us in giving file names to
> programs.

A standard library routine for performing globbing is a must, but I don't
see that it changes the arguments in favor of doing shell globbing.

> I agree that quoting/backslashing can be a royal pain -- especially when
> one is trying to define aliases.  However, I think that it is a broader
> issue than globbing; rather, that it is a shell design issue which affects
> the use of metacharacters in general, not just those used for globbing.

Agreed. Globbing is a side issue: it changes the magnitude of the "problem",
but it hasn't created it.

> Once that is removed, the problem of when expansion occurs is greatly
> reduced, although not eliminated.

Once what is removed? The use of metacharacters in shell syntax? I don't
think you can do away with that without abandoning the idea of having
the shell as a programming language altogether. I am not prepared to do
that.

You can always abandon shell programming, and have (as you say) a separate
preprocessor that does the preprocessing and calls the "real shell" to do
the work, but I would say the resulting pair of programs will continue to
be used as the shell. The "new shell" will be little but a loop calling
execv... and you already have that tool available.

> What's wrong with that?  As far as I'm concerned, the file system calls
> _should_ be able to expand expressions, "~", variables, etc., the same way
> thay handle soft links and (in VMS and AmigaDOS) logical names.

But the semantics of file system calls are different. "Open" returns a single
file token (handle, LUN, whatever). Which of the 47 matching files does it
open for "*.c"?

I've used a system (on CP/M) where the runtime did this very thing, and
the resulting program behaviour was confusing to say the least.

As for a complete syntax for command arguments, see "parseargs", recently
posted to comp.sources.misc.
-- 
Peter da Silva.  `-_-'  peter@ferranti.com
+1 713 274 5180.  'U`  "Have you hugged your wolf today?"

mwm@pa.dec.com (Mike (My Watch Has Windows) Meyer) (03/22/91)

In article <QN4AH6A@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
   > Um - I think we disagree about what "sucks" is.

   Yes, I mean "overloading globbing and text manipulation on a single tool".

Yet another reason the shell shouldn't glob. After all, manipulating
text is a shell's primary function.

	<mike
--
The road is full of dangerous curves			Mike Meyer
And we don't want to go too fast			mwm@pa.dec.com
We may not make it first				decwrl!mwm
But I know we're going to make it last.

new@ee.udel.edu (Darren New) (03/22/91)

In article <WG5ABTG@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
>But one nice thing about shell scripts: you can fix the bloody things when
>they have bugs in them. And not properly quoting variables is a bug. This
>is just as bad as a program in VMS or AmigaDOS not globbing when it should,
>but with a major difference: it's fixable.

Not really. If your program only handles one argument at a time then
having the shell expand that argument is a bug you cannot fix. Most
UNIX commands don't do this only because they know the shell globs.

The only reason that shell globbing is fixable and command globbing
isn't is because the shell is an interpreter and commands are usually
compiled (or they would be fixable also).  I don't think saying "the
shell is interpreted and therefore you always have source" has anything
to do with the correctness of shell globbing.

	 -- Darren
-- 
--- Darren New --- Grad Student --- CIS --- Univ. of Delaware ---
----- Network Protocols, Graphics, Programming Languages, 
      Formal Description Techniques (esp. Estelle), Coffee, Amigas -----
  +=+=+ My time is very valuable, but unfortunately only to me +=+=+

peter@ficc.ferranti.com (Peter da Silva) (03/23/91)

As an exercise I am in the process of writing a shell for UNIX that only
performs globbing when requested. It is not going to be anything of the
complexity of the regular UNIX shells... sort of a baby shell for novices.
So far the total length of the shell is 472 lines, and it already parses
statements and executes them. No globbing is as yet implemented.

+ echo 'The only quoting character is single quotes'
The only quoting character is single quotes
+ echo 'Unclosed quotes are automatically and silently closed
Unclosed quotes are automatically and silently closed
+ echo 'You continue a line\
- by escaping it with a backslash'
You continue a line
by escaping it with a backslash
+ echo 'You quote a quote by doubling it'''
You quote a quote by doubling it'

And...

+ echo 'I''m planning on doing globbing like so: [*.c]'
I'm planning on doing globbing like so: split.c getline.c bsh.c domagic.c

Any suggestions? I was thinking of continuing lines if there were unclosed
quotes, but that has proven a source of confusion on UNIX with the bourne
shell.

And, of course, any interest?
-- 
Peter da Silva.  `-_-'  peter@ferranti.com
+1 713 274 5180.  'U`  "Have you hugged your wolf today?"

jesup@cbmvax.commodore.com (Randell Jesup) (03/23/91)

In article <DS3ABVD@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
>> All it takes are tools designed for handling multiple file arguments.
>
>... it requires tools that take multiple arguments. On AmigaDOS, because
>the tools are written to do the globbing themselves, they don't.

	Ah, but under 2.0 most of them do, just as they accept wildcards
now.  For example, even MakeDir accepts multiple arguments (though wildcard-
expanding the arguments to MakeDir would be rather silly...).  There are one
or two exceptions; most should go away under 2.1.

-- 
Randell Jesup, Keeper of AmigaDos, Commodore Engineering.
{uunet|rutgers}!cbmvax!jesup, jesup@cbmvax.commodore.com  BIX: rjesup  
The compiler runs
Like a swift-flowing river
I wait in silence.  (From "The Zen of Programming")  ;-)

jesup@cbmvax.commodore.com (Randell Jesup) (03/23/91)

In article <B.3A_=8@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
>But you don't *know* that only the ultimate consumer is going to expand
>wildcards. We're talking about random programs written by random people
>at random times for random purposes with random levels of debugging. At
>least in UNIX you know that anything you get has been expanded already,
>and there is no reason to do so again.

	If people break the rules, then the rules are broken, and bad things
result (at least inconsistency).  The same thing happens in Unix, though it's
often ignored.  If you're designing a system (not Unix) you can design it
such that when a wildcarded argument is expanded/processed, all the expanded
results are quoted such that the expander won't expand them again.
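
	A toy sketch of that idea in sh terms (quoting each expansion
result so that a second pass through the same expander leaves it
alone; names that themselves contain the quote character would need
more care):

	globq() {
		# expand one pattern and emit each match single-quoted,
		# so a later unquoted re-parse will not expand it again
		for f in $1
		do
			echo "'$f'"
		done
	}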

-- 
Randell Jesup, Keeper of AmigaDos, Commodore Engineering.
{uunet|rutgers}!cbmvax!jesup, jesup@cbmvax.commodore.com  BIX: rjesup  
The compiler runs
Like a swift-flowing river
I wait in silence.  (From "The Zen of Programming")  ;-)

jlg@cochiti.lanl.gov (Jim Giles) (03/24/91)

In article <TB6AKE6@xds13.ferranti.com>, peter@ficc.ferranti.com (Peter
da Silva) writes:
|> [...]
|> + echo 'I''m planning on doing globbing like so: [*.c]'
|> I'm planning on doing globbing like so: split.c getline.c bsh.c domagic.c

Shouldn't that be:

+ echo 'I''m planning on doing globbing like so: '[*.c]
I'm planning on doing globbing like so: split.c getline.c bsh.c domagic.c

The globbing syntax _within_ the apostrophes should be passed
through unchanged - that's what the apostrophes mean (at least,
that's what they intuitively mean to the user).

|> [...]
|> And, of course, any interest?

Go for it.  It's better than what currently available UNIX shells
do now.

J. Giles

jlg@cochiti.lanl.gov (Jim Giles) (03/24/91)

In article <TB6AKE6@xds13.ferranti.com>, peter@ficc.ferranti.com (Peter
da Silva) writes:
|> [...]
|> + echo 'I''m planning on doing globbing like so: [*.c]'
|> I'm planning on doing globbing like so: split.c getline.c bsh.c domagic.c

Whoops!  Forgot to mention that this syntax (even if you adopt my
previous suggestion) still doesn't allow me to write a tool which
uses the _same_ wildcard syntax for other tools that the shell
uses for filenames.  At the very least, the characters around
the wildcard pattern (or some other distinguishing feature) would
have to be different.  If consistency in the user interface were
_really_ important to you, you wouldn't force this inconsistency
into your shell design.

J. Giles

jesup@cbmvax.commodore.com (Randell Jesup) (03/25/91)

In article <KENW.91Mar21140655@skyler.arc.ab.ca> kenw@skyler.arc.ab.ca (Ken Wallewein) writes:

[ proposal that shells only deal with commands they know the syntax of ]

>There's a compromise solution which might be more acceptable.  If we
>changed that word "required" to "allowed", it would be a lot more practical
>to implement in current environments.  In would still, however, be subject
>to other limitations of shell syntax.

	There are 4 main classes of shell/globbing interaction:  1) shell
globs unless argument is quoted or metacharacter is escaped; 2) shell
doesn't glob unless told to via a metacharacter; 3) either 1 or 2 plus
the ability to tell the shell the syntax of a command and override the
default behavior of 1 or 2; 4) shell does no globbing.  (Unix is type 1
under existing shells, AmigaDos default shell is type 4, though some 3rd-
party shells have implemented type 1 and type 3.)
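
	As transcripts, types 1 and 2 look roughly like this (the type-2
lines borrow the [...] notation from Peter's experimental shell; any
explicit-glob marker would do):

	$ rm *.o		# type 1: expanded by the shell unless quoted
	$ grep 'a*b' file	# type 1: the quotes suppress expansion
	+ rm [*.o]		# type 2: expanded only where asked for
	+ grep a*b file		# type 2: passed through untouched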

	I have written shells of types 1, 3 (over 1 & 2) and 4.  All solutions
except 1 more-or-less require that programs do globbing (though some may not
require it of all programs, since all solutions except 4 provide a way to
get globbing).  Personally, my current leaning is for 2, though if you're
implementing it in an existing system (particularly one that has been type 1)
you'll probably need to use 3 to make things livable.

	BTW, I feel that ALL shells (1-4) should allow interactive completion
and expansion of arguments ala tcsh (tcsh completion plus the ability to tell
it to insert all matches).  It can be quite handy to see the results of
a globbing before it actually runs the command in some cases.

-- 
Randell Jesup, Keeper of AmigaDos, Commodore Engineering.
{uunet|rutgers}!cbmvax!jesup, jesup@cbmvax.commodore.com  BIX: rjesup  
Thus spake the Master Ninjei: "To program a million-line operating system
is easy, to change a man's temperament is more difficult."
(From "The Zen of Programming")  ;-)

peter@ficc.ferranti.com (Peter da Silva) (03/26/91)

In article <20056@cbmvax.commodore.com> jesup@cbmvax.commodore.com (Randell Jesup) writes:
> In article <DS3ABVD@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
> >> All it takes are tools designed for handling multiple file arguments.

> >... it requires tools that take multiple arguments. On AmigaDOS, because
> >the tools are written to do the globbing themselves, they don't.

> Ah, but under 2.0 most of them do, just as they accept wildcards now.

OK, so after 5 (6?) years we can expect the standard tools will finally
work right. Of course the short command line length limits are still there
(right?), and all the third-party tools are still waiting...

Nope, I stand by my assertion that you really can't expect to change the
shell semantics on systems where things like wildcarding are handled by
the applications.
-- 
Peter da Silva.  `-_-'  peter@ferranti.com
+1 713 274 5180.  'U`  "Have you hugged your wolf today?"

peter@ficc.ferranti.com (Peter da Silva) (03/26/91)

In article <20057@cbmvax.commodore.com> jesup@cbmvax.commodore.com (Randell Jesup) writes:
> 	If people break the rules, then the rules are broken, and bad things
> result (at least inconsistency).  The same thing happens in Unix, though it's
> often ignored.

Or fixed, since it's usually in a shell script.

> If you're designing a system (not Unix) you can design it
> such that when a wildcarded argument is expanded/processed, all the expanded
> results are quoted such that the expander won't expand them again.

Can't be done. Both Lattice and Aztec compilers strip quotes from the
command line before passing argv/argc to the program, so when it runs it
will expand them again willy-nilly.
-- 
Peter da Silva.  `-_-'  peter@ferranti.com
+1 713 274 5180.  'U`  "Have you hugged your wolf today?"

peter@ficc.ferranti.com (Peter da Silva) (03/26/91)

In article <18920@lanl.gov> jlg@cochiti.lanl.gov (Jim Giles) writes:
> |> + echo 'I''m planning on doing globbing like so: [*.c]'
> |> I'm planning on doing globbing like so: split.c getline.c bsh.c domagic.c

> Shouldn't that be:

> + echo 'I''m planning on doing globbing like so: '[*.c]
> I'm planning on doing globbing like so: split.c getline.c bsh.c domagic.c

No, those two have different meanings. The former passes a single argument
to echo:

I'm planning on doing globbing like so: split.c getline.c bsh.c domagic.c

The latter passes 4 arguments:

I'm planning on doing globbing like so: split.c
getline.c
bsh.c
domagic.c

The only thing quotes are for in this shell is for argument grouping.
-- 
Peter da Silva.  `-_-'  peter@ferranti.com
+1 713 274 5180.  'U`  "Have you hugged your wolf today?"

kenw@skyler.arc.ab.ca (Ken Wallewein) (03/26/91)

In article <TB6AKE6@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:

   As an exercise I am in the process of writing a shell for UNIX that only
   performs globbing when requested. It is not going to be anything of the
   complexity of the regular UNIX shells... sort of a baby shell for novices.
   So far the total length of the shell is 472 lines, and it already parses
   statements and executes them. No globbing is as yet implemented.

   + echo 'The only quoting character is single quotes'
   The only quoting character is single quotes

Ok.

   + echo 'Unclosed quotes are automatically and silently closed
   Unclosed quotes are automatically and silently closed

Good.

   + echo 'You continue a line\
   - by escaping it with a backslash'
   You continue a line
   by escaping it with a backslash

Good -- except how to continue without including newline?  Here's a
suggestion: allow escaping _anything_.  It bugs me that csh doesn't respect
escaping quotes or spaces.

   + echo 'You quote a quote by doubling it'''
   You quote a quote by doubling it'

Good.

   And...

   + echo 'I''m planning on doing globbing like so: [*.c]'
   I'm planning on doing globbing like so: split.c getline.c bsh.c domagic.c

  I don't think I like the use of [] though.  No major reasons, just little
things like it uses two characters instead of one, and isn't
space-terminated.  I'd prefer a sort of reverse-escape approach that worked
on a space-delimited string instead of a character, e.g. +*.o*, where "+"
is the reverse escape.

   Any suggestions? I was thinking of continuing lines if there were unclosed
   quotes, but that has proven a source of confusion on UNIX with the bourne
   shell.

Agreed.

I'd like to see some thought given to how you would handle the result of
globbing on a directory with file names containing "*", leading "-",
spaces, etc... 

  Explicit globbing would take care of "*"; maybe the globber should escape
"-", spaces, etc., automatically?  What would happen if this were passed
through multiple handlers?

  Ambiguity can be powerful, but sometimes it is dangerous.  Sometimes we
need to be able to say "THAT is PRECISELY what I mean".  It seems to be
rather difficult in most shells that support globbing.

   And, of course, any interest?

Yes!  I'd like a copy when/if it's available.

   -- 
   Peter da Silva.  `-_-'  peter@ferranti.com
   +1 713 274 5180.  'U`  "Have you hugged your wolf today?"

--
/kenw

Ken Wallewein                                                     A L B E R T A
kenw@noah.arc.ab.ca  <-- replies (if mailed) here, please       R E S E A R C H
(403)297-2660                                                     C O U N C I L

fetter@cos.com (Bob Fetter) (03/26/91)

In article <TB6AKE6@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
>As an exercise I am in the process of writing a shell for UNIX that only
>performs globbing when requested. It is not going to be anything of the
>complexity of the regular UNIX shells... sort of a baby shell for novices.
>So far the total length of the shell is 472 lines, and it already parses
>statements and executes them. No globbing is as yet implemented.
>
>+ echo 'The only quoting character is single quotes'
>The only quoting character is single quotes
>+ echo 'Unclosed quotes are automatically and silently closed
>Unclosed quotes are automatically and silently closed
>+ echo 'You continue a line\
>- by escaping it with a backslash'
>You continue a line
>by escaping it with a backslash
>+ echo 'You quote a quote by doubling it'''
>You quote a quote by doubling it'
>    And...
>+ echo 'I''m planning on doing globbing like so: [*.c]'
>I'm planning on doing globbing like so: split.c getline.c bsh.c domagic.c
>
>Any suggestions? I was thinking of continuing lines if there were unclosed
>quotes, but that has proven a source of confusion on UNIX with the bourne
>shell.
>  And, of course, any interest?


  Clean and simple.  To continue in this model, might I suggest that
you use the [ ... ] to enclose *any* command, the results of which are
replaced in the command line (a-la ` ... ` in other shells).


  This would then have one employ the old glob(1) command to expand
wildcards.


  And, as far as (human) simplicity goes, well, many folks have the
impression that all of the characters ' (single quote), " (double
quote) and ` (grave) *all* are "quoting characters".  Why not allow
them all, with the only rule being that they are paired?  BTW--with
the [..] change and extending quotation characters to all three, it
would be effectively a remake of the Multics command_processor_ parser
logic.  So, quoting a quote, say, would be:

	"	'"'
	'	"'"
	`	'`'
or other combinations.

  You might consider, if you haven't already, constructs like:

	my_command argument1" with more" argument2 "argument "three

  argv[]
	"argument1 with more"
	"argument2"
	"argument three"

  Forms like this I've used on other shells (Multics, some Honeywell
minis, IBM, etc.), and they're really (to me) intuitive and useful in
scripts with argument/parameter substitution, etc.


  BTW - this parsing, with the slight (?) changes/suggestions above,
effectively duplicates the Multics command_processor_ parsing logic
(minus the command iteration construct using parentheses).


  Um, with the small size and all, does that include support for
multiple commands and piping?  If so, it sounds like tight code!


  Yeah, I'm interested in this... it might make an interesting net-project
to find out what a collective effort/discourse would result in.  Maybe,
after all the flames and rhetoric about "simple is best", this would
result in a minimalist shell with intelligent per-user extension capabilities
(something I've mulled on lately, with dynamic linking finally becoming more 
pervasive).

  Just my $.02 here.

  -Bob-

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (03/28/91)

In article <KENW.91Mar25203803@skyler.arc.ab.ca> kenw@skyler.arc.ab.ca (Ken Wallewein) writes:
>   Explicit globbing would take care of "*"; maybe the globber should escape
> "-", spaces, etc., automatically?  What would happen if this were passed
> through multiple handlers?

You want globbing to put all relative filenames into ./ format? Fine.

>   Ambiguity can be powerful, but sometimes it is dangerous.  Somethimes we
> need to be able to say "THAT is PRECISELY what I mean".  It seems to be
> rather difficult in most shells that support globbing.

Oh? Do you have an example of where you haven't been able to tell the
shell precisely what you mean because it supports globbing?

---Dan

peter@ficc.ferranti.com (Peter da Silva) (03/28/91)

In article <KENW.91Mar25203803@skyler.arc.ab.ca> kenw@skyler.arc.ab.ca (Ken Wallewein) writes:
>    + echo 'You continue a line\
>    - by escaping it with a backslash'
>    You continue a line
>    by escaping it with a backslash

> Good -- except how to continue without including newline?

+ echo Unquoted newlines\
- are simply whitespace.
Unquoted newlines are simply whitespace.

> Here's a
> suggestion: allow escaping _anything_.  It bugs me that csh doesn't respect
> escaping quotes or spaces.

% ls a\ b
a b: No such file or directory
% echo \"\'
"'

I'm not sure this is a good idea, as it does combine two mechanisms. I'd
rather avoid conflicts between quoting and escaping.

>    + echo 'I''m planning on doing globbing like so: [*.c]'
>    I'm planning on doing globbing like so: split.c getline.c bsh.c domagic.c

>   I don't think I like the use of [] though.  No major reasons, just little
> things like it uses two characters instead of one, and isn't
> space-terminated.  I'd prefer a sort of reverse-escape approach that worked
> on a space-delimited string instead of a character, e.g. +*.o*, where "+"
> is the reverse escape.

But that uses two characters too (+ and space) and prevents you from globbing
unambiguously in some contexts.

> I'd like to see some thought given to how you would handle the result of
> globbing on a directory with file names containing "*", leading "-",
> spaces, etc... 

As for special handling of -, I think not. Of course, I could glob to "./...".
-- 
Peter da Silva.  `-_-'  peter@ferranti.com
+1 713 274 5180.  'U`  "Have you hugged your wolf today?"

peter@ficc.ferranti.com (Peter da Silva) (03/28/91)

In article <44381@cos.com> fetter@cos.UUCP (Bob Fetter) writes:
>   Um, with the small size and all, does that include support for
> multiple commands and piping?  If so, it sounds like tight code!

Nope. No backgrounding either.

I like the quote nesting business, and the [glob *.c] stuff. I could
make glob a builtin for efficiency, too.
-- 
Peter da Silva.  `-_-'  peter@ferranti.com
+1 713 274 5180.  'U`  "Have you hugged your wolf today?"

fetter@cos.com (Bob Fetter) (03/30/91)

In article <ESAANQ5@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
>In article <44381@cos.com> fetter@cos.UUCP (Bob Fetter) writes:
>>   Um, with the small size and all, does that include support for
>> multiple commands and piping?  If so, it sounds like tight code!
>
>Nope. No backgrounding either.
>
>I like the quote nesting business, and the [glob *.c] stuff. I could
>make glob a builtin for efficiency, too.
>-- 
>Peter da Silva.  `-_-'  peter@ferranti.com
>+1 713 274 5180.  'U`  "Have you hugged your wolf today?"


  Well, one thing I've always liked from other systems is inline iteration.
This is a simple, yet powerful construct: one that seems both simpler
yet more powerful than equivalent looping/cycling constructs I've seen
in Unix shells.  (I'm sure there's powerful stuff out there... it's just
that I've not yet seen it all.)

  Iteration uses a list, this list contained within parentheses.  For each
member of the list, the construct within which the list is embedded is
iterated upon and invoked.

  To take as an example, using the [..] syntax:

     touch ([glob *]).backup

will, once [glob *] is evaluated (say, into file1.c file1.o makefile readme),

result in the execution of

     touch file1.c.backup
     touch file1.o.backup
     touch makefile.backup
     touch readme.backup

  Or, as another example, lets say wanting to put a single file in multiple
archives, all named with a '.archive' suffix:

     ar a ([glob *.archive]) the_file

  Of course, variable expansion should be supported:

     set MYLIST="fred joe bill wally"
     mail ($MYLIST) -f mailfile

		  (a silly example, but used as an example).


  There are a few other rules on this form I've seen used, for
example, on Multics, where paired iteration lists would cycle in
tandem and where nesting of lists would be supported.

  But, I think I've presented the basic idea.  It would provide the
equivalent csh "foreach FOO () ... end" form as an instream syntatic
construct.  The examples I've given are quick, off-the-cuff.  More
powerful uses exist.

  One of the better uses of this form is to use 'globbing' against
commands which are constructed so as to only take *ONE* argument in a
particular syntactic position.  It repeats the command as required,
keeping the user happy and keeping the command from complaining.  Of
course, the list isn't limited to just 'globbing' expansion, but
could itself contain control arguments/flags or actual command names:
the use of a list of command names resulting in multiple commands
being run with identical argument lists.

=====

  OH, one example of paired lists *does* come to mind (quickly, he
said, before he submitted the article):

     diff ([glob *.c]) ([glob *.c.orig])

which, if we had a directory of .c files against which patch has been
run and we wanted to (for whatever reason) look at the results, would
do it for us.
  Lets say we had:
	file1.c
	file1.c.orig
	file2.c
	file2.c.orig
	file3.c
	file3.c.orig
	file4.c
	file4.c.orig

running the above diff command would, after [glob ..] processing, result
with:

	diff (file1.c file2.c file3.c file4.c) (file1.c.orig file2.c.orig file3.c.orig file4.c.orig)

which would cause the execution of:

	diff file1.c file1.c.orig
	diff file2.c file2.c.orig
	diff file3.c file3.c.orig
	diff file4.c file4.c.orig

=====

  Thoughts?  I hope this isn't too baroque, 'cause it's really a
pleasure to use when it's available.

  -Bob-

schwartz@groucho.cs.psu.edu (Scott Schwartz) (03/30/91)

In article <44431@cos.com> fetter@cos.com (Bob Fetter) writes:
|   Well, one thing I've always liked from other systems is inline iteration.

It interacts badly with command redirection.  I had many unhappy
experiences with Primos for just that reason.

fetter@cos.com (Bob Fetter) (04/02/91)

In article <=baGj#dc1@cs.psu.edu> schwartz@groucho.cs.psu.edu (Scott Schwartz) writes:
>
>In article <44431@cos.com> fetter@cos.com (Bob Fetter) writes:
>|   Well, one thing I've always liked from other systems is inline iteration.
>
>It interacts badly with command redirection.  I had many unhappy
>experiences with Primos for just that reason.


  What was Primos doing?  For example, with something like

   diff ([glob *.c]) ([glob *.c.orig]) >diffs_list

would it reopen diffs_list for each iterated command, truncating earlier
output?  If so, then this helps define the evaluation precedence into
something like:
	1 - handle redirection/piping for command, having it apply
	    across any iterated 'derived commands';
	2 - execute [...] commands, replacing with returned results;
	3 - iterate, using redirection/piping from 1.

  Step 1 would have the redirection done up front, with the commands
exec-ed using the redirection (like a subshell).

  Am I on the right track with what was happening with Primos that
was upsetting you?  Are there other things (good/bad) from that
environment worthy of discussion?

  -Bob-

schwartz@groucho.cs.psu.edu (Scott Schwartz) (04/02/91)

In article <44480@cos.com> fetter@cos.com (Bob Fetter) writes:

|   What was Primos doing? ...  Am I on the right track with what was
| happening with Primos that was upsetting you?

Yes, although I should clarify that Primos' command processor doesn't
do redirection itself.  I was doing that by hand inside each command.
So each time the command ran it truncated the earlier output.
Arguably that was simply a bug.  One fix would arrange to write in
append mode, but that's ugly in the cases when you do want to
overwrite.  Fixing the shell to do the redirection and keep the files
open between iterations might have worked, assuming I'd had sources
and was allowed to do anything with them. :-) Under Unix, of course,
the whole thing is much easier.

| Are there other things (good/bad) from that environment worthy of
| discussion?

More than I can count.  :-)

One nasty thing was that io to ttys was different than to files.
(Different system calls, from what I remember.) That made the whole
redirection idea much stickier.  Multics, apparently, had some way to
do redirection ("utterly trivial", cf DMR, 1984), so it might actually
have been possible in Primos.  But any details of such things were
safely hidden from anyone who would be interested in knowing about
them.

A nicer thing was that (in some cases) you could mark executables so
that globbing/command iteration would not be done on them.

--
sparc ld: triple word score

blarson@blars (04/03/91)

In article <=baGj#dc1@cs.psu.edu> schwartz@groucho.cs.psu.edu (Scott Schwartz) writes:
>
>In article <44431@cos.com> fetter@cos.com (Bob Fetter) writes:
>|   Well, one thing I've always liked from other systems is inline iteration.
>
>It interacts badly with command redirection.  I had many unhappy
>experiences with Primos for just that reason.

Primos has plenty of problems with command redirection (like all Primos
I/O, it's all special cases and no generalization), but I don't see how
any of them relate to command line iteration.  (I'm not a newcomer to
Primos; I've used 18.2 and 23.0 beta and most releases in between.)

-- 
blarson@usc.edu
		C news and rn for os9/68k!
-- 
Bob Larson (blars)	blarson@usc.edu			usc!blarson
	Hiding differences does not make them go away.
	Accepting differences makes them unimportant.

blarson@blars (04/03/91)

In article <?55G-t_e1@cs.psu.edu> schwartz@groucho.cs.psu.edu (Scott Schwartz) writes:
>Yes, although I should clarify that Primos' command processor doesn't
>do redirection itself.

Untrue; it can redirect input from a file (the comi command) and redirect
output to a file, in addition to or instead of the terminal (the como command).

>  I was doing that by hand inside each command.

And you didn't like how your own code worked.  Not something to blame on Primos.

>| Are there other things (good/bad) from that environment worthy of
>| discussion?

Yes.

>One nasty thing was that io to ttys was different than to files.

and from tape, printer, assigned async lines, sync lines, card readers/punches,
etc.  I/O under Primos is terrible from a programmer's perspective.
(And, like segments on an 8086, the mess is usually visible from the
user's seat as well.)

>(Different system calls, from what I remember.)
Yup.  Completely different calling conventions, too.  Not simply a
matter of passing similar arguments to different calls.

>A nicer thing was that (in some cases) you could mark executables so
>that globbing/command iteration would not be done on them.
Via commands at link time, unless you insisted on using the old linker,
which always left them enabled.

The whole globbing (aka wildcarding) and command line iteration
question was handled nicely in Primos.  (It was a relatively recent
addition, from the early eighties.)

Wildcard expansion and command line iteration are done by executing
the command multiple times with different arguments.  This way old
commands did not need to be rewritten.  It also makes the results of
wildcard renames easier to understand :-)

Besides the wildcard characters (+ matches any single character, @
matches 0 or more non-period characters, @@ matches 0 or more
characters), it could also create another filename based on the
wildcard expansion.

copy @.(c,h) ==.+orig

would copy foo.c to foo.c.orig, bar.c to bar.c.orig, and foo.h to
foo.h.orig.
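
The nearest plain Bourne-shell rendering of that command, for
comparison (a sketch; it loses Primos' derived-name syntax and just
appends .orig by hand):

	# one cp per file, much as Primos iterates the command itself
	for f in *.c *.h
	do
		cp "$f" "$f.orig"
	done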


-- 
blarson@usc.edu
		C news and rn for os9/68k!
-- 
Bob Larson (blars)	blarson@usc.edu			usc!blarson
	Hiding differences does not make them go away.
	Accepting differences makes them unimportant.

schwartz@groucho.cs.psu.edu (Scott Schwartz) (04/03/91)

In article <186@blars> blarson@blars writes:

| Scott Schwartz writes:
| >Yes, although I should clarify that Primos' command processor doesn't
| >do redirection itself.
| 
| Untrue, it can redirect input from a file (comi command) and redirect
| output to a file (in adition to or instead of the terminal) (como command).

You are correct about comi, but (at least as of rev20, when I last
used Primos) como had the unfortunate property of printing both input
and output, with no way to distinguish them after the fact.  If it
had just printed output, everything would have been great.  (If there
was a way to do that in rev19 or 20, I wish you'd have let me know in
1985!  :-)

| And didn't like how your own code worked.  Not something to blame on Primos.

I'm not blaming Primos, just commenting that iteration had a
particular side effect.  And, as others have said, it points out one
of the things to watch out for if you implement iteration in a Unix
shell and want it to interact properly with redirection.

coren@osf.org (Robert Coren) (04/03/91)

In article <?55G-t_e1@cs.psu.edu>, schwartz@groucho.cs.psu.edu (Scott Schwartz) writes:
|> Multics, apparently, had some way to
|> do redirection ("utterly trivial", cf DMR, 1984), so it might actually
|> have been possible in Primos.

Multics's "utterly trivial" redirection, at least for output, was done
outside of the command processor (= shell), which avoided some of the
complexities inherent in the UNIX approach. The "file_output" command
would redirect (the equivalent of) stdout to the specified file until
restored by the use of "revert_output". Instances of "file_output"
could be "stacked". Simple example:

	file_output listfile; list; revert_output

Of course, you could have as many commands, of arbitrary complexity,
with as much iteration, "globbing", etc., as you liked, between the
"file_output" and "revert_output" commands.

Late in its life (around 1987), Multics added UNIX-style redirection
and piping; I don't remember the details, since by that time I was
working on something else. (There were difficulties coming up with an
acceptable syntax, as I recall, since Multics used > as the directory
separator in pathnames. barmar, do you remember any of the details?)
	Robert

barmar@think.com (Barry Margolin) (04/04/91)

In article <20618@paperboy.OSF.ORG> coren@osf.org (Robert Coren) writes:
>In article <?55G-t_e1@cs.psu.edu>, schwartz@groucho.cs.psu.edu (Scott Schwartz) writes:
>|> Multics, apparently, had some way to
>|> do redirection ("utterly trivial", cf DMR, 1984), so it might actually
>|> have been possible in Primos.
>
>Multics's "utterly trivial" redirection, at least for output, was done
>outside of the command processor (= shell), which avoided some of the
>complexities inherent in the UNIX approach. The "file_output" command
>would redirect (the equivalent of) stdout to the specified file until
>restored by the use of "revert_output". Instances of "file_output"
>could be "stacked". Simple example:
>
>	file_output listfile; list; revert_output

There is also the syn_output command, which is analogous to Bourne shell's
">&" syntax for linking two file descriptors together.  Since Multics
doesn't have devices in the file system, this is useful along with the more
JCLish syntax for attaching arbitrary devices:

	io attach tape_stream tape_ansi_ <options>
	io open tape_stream output
	syn_output tape_stream; list; revert_output
	io (close detach) tape_stream
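
The Bourne-shell ">&" idiom being referred to, for comparison (a
sketch; some_report is a made-up command name):

	# point stderr at whatever stdout is attached to, so both
	# streams end up in listfile
	some_report > listfile 2>&1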

>Late in its life (around 1987), Multics added UNIX-style redirection

Damn right it was late in its life -- it was already canceled at the time!

>and piping; I don't remember the details, since by that time I was
>working on something else. (There were difficulties coming up with an
>acceptable syntax, as I recall, since Multics used > as the directory
>separator in pathnames. barmar, do you remember any of the details?)

A little.  I think the token we chose was ";|".  Both of these characters
are used by the command processor, but there is no useful time when they
would be adjacent.  This sequence is used for both piping and redirection,
based on context.  You can write

	<source> ;| <command1> ;| <command2> ;| <destination>

If <source> or <destination> is multiple tokens, it is either a command
to be executed or an I/O attach description; the two can be distinguished by
looking for appropriate entrypoints in the executable file (commands have
entrypoints named the same as the command, I/O modules have entrypoints
named <modulename>attach).  If it's a single token, then it could also be a
data file pathname; I think it either looks at the access mode to see
whether the user has execute permission, or perhaps it looks at the
contents of the file to see whether it is in executable format, and treats
it as data if the test is false.
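
In Unix shell terms, the single-token test described above amounts to
roughly this sketch (run_stage and redirect_to are made-up stand-ins
for whatever the command processor actually does in each case):

	if [ -x "$token" ]
	then
		run_stage "$token"	# executable: treat it as a command
	else
		redirect_to "$token"	# otherwise: treat it as a data file
	fi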

Yes, it's a kludge.  Also, since Multics doesn't really support
multitasking within a login session (there's a lightweight process library,
but programs have to be pretty well behaved to work with it well), pipes
don't work as well as they do on Unix (<command2> won't run until
<command1> has completed).

I think we also added something analogous to Unix `...`.  Multics already
had [<command>] for inline substitution, but this syntax requires the
command to recognize that it is being called as a function and return the
substitution string as its value.  I think we added the syntax [|<command>]
(or something similar), to substitute the standard output.
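
The Unix `...` being referred to, for comparison; the shell always
splices in the command's standard output, with no cooperation needed
from the command:

	# the output of ls is substituted into the echo command line
	echo The C sources here are: `ls *.c`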

Finally, I think we added a syntax for collecting output into a temporary
file and then substituting the name of the file into the command line
(since most Multics software is designed to operate on named files, not
standard input).  I'm drawing a complete blank on the syntax we added (it's
possible that the syntax I described in the preceding paragraph was
actually used for this and I don't remember that syntax).

--
Barry Margolin, Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar