barmar@think.com (Barry Margolin) (01/15/91)
[I've followed someone else's lead and directed followups to comp.os.misc.]

In article <BSON.91Jan14170904@rice-chex.ai.mit.edu> bson@ai.mit.edu (Jan Brittenson) writes:
>In article <1991Jan14.170115.17178@Think.COM>
> barmar@think.com (Barry Margolin) writes:
> > I've used several systems (Multics, ITS, TOPS-20) where the commands
> > invoked the globbing routine, ... Filename arguments that don't make
> > sense to be wildcards (e.g. the last argument to mv) are scanned for
> > wildcard characters, and generate an error if any are seen
>
> The Unix approach has its advantages sometimes. Assume you have
>the files "L19901200.axx772" and "L19910100.bxx19974". If you wish to
>overwrite the second file with the first it makes sense to use
>
>	mv L*72 L*74
>
>if that's unique enough. Another case is cd. When you have only one
>file, and it's a directory, "cd *" makes sense. If I have a list of
>files and would like to see if L19901200.axx772 has been backed up, it
>makes sense to use the command

I admit that I make use of wildcards as a convenient way to abbreviate
filenames, and when I used Multics I sometimes even wished that its "cd"
equivalent allowed this. However, I don't think this is a good excuse for
such a poor user interface design. The "cd" command can warn if the
wildcard expands into multiple filenames, since it only allows one
argument; however, in the "mv" case, such a mistake can result in
completely unexpected behavior with no warning.

In my opinion, an interactive mechanism is a much better basis for an
abbreviation design. For instance, you could type "L*74" and then hit a
control or function key that would cause the string to be replaced by the
matching filenames (or maybe it would beep if the wildcard matches
multiple files).

> grep -l L*772 L*backup
>
>to get a list of what backup file lists it appears in. This
>orthogonality is quite a strength of Unix. Granted, there's an almost
>infinite potential for error, with no recovery.
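[Ed: Jan's mv example can be made concrete. The following sketch uses the
filenames from the post in a scratch directory; the comments note the
failure mode Barry describes.]

```shell
# Jan's example, reproduced literally in a scratch directory:
mkdir /tmp/mvdemo$$ && cd /tmp/mvdemo$$
touch L19901200.axx772 L19910100.bxx19974
mv L*72 L*74       # each pattern matches exactly one file: the first
                   # file silently overwrites the second, as intended
ls                 # only L19910100.bxx19974 remains
# Had a third file also matched L*74, the very same command line would
# have handed mv three arguments, and mv would have demanded that the
# last one be a directory -- a completely different operation, with no
# warning that the intent had changed.
```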
The fact that you can find a use for a weird behavior doesn't mean that
behavior is a good feature. Which is more useful:

	grep -l L*772 L*backup

or

	grep #define.*EXT *.c *.h

> > Putting the work in the utility allows useful syntaxes such as:
> >
> >	mv * =.old
>
> This is something I quite miss. But I'd rather see it solved in the
>shell than in the tools, perhaps as a variation of the {...} syntax. Say
>that {:re} were made a regex quote, then one could write something
>like:
>
>	mv {:\(^.+\)$ \1.old}

This violates your point that just changing the shell would be enough:
mv(1) doesn't accept multiple old/new filename pairs. The argument
syntaxes of most Unix commands are designed based on the kinds of
expansions that Unix shells are expected to provide; since the shells are
known not to provide the above expansion, no commands are prepared to
accept such arguments.

> I guess every Unix programmer has at some point wished there was a
>sh-compatible glob(3) for some very specific purposes, though.

Yes! Many programs allow filenames to be specified interactively; if
they want to allow these filenames to be wildcards and be sure of
sh-compatibility, they have to fork a subshell, pipe the output of echo,
and parse the result (and this won't work if any of the matching files
have spaces in their names). Also, one might want to perform wildcard
matching on names of other things besides files.

> > If there's a reasonable library routine available, the hardest part
> > should be deciding which arguments should be processed as wildcards.
>
> It really doesn't make much difference whether it's in a library or
>in a shell, really, as long as there are sufficient hooks for
>redefining the syntax. I mean, instead of escaping don't-matches, you
>end up escaping do-matches to tell the globber you do want a specific
>argument globbed regardless of what the tool thinks is appropriate.

True.
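[Ed: the fork-a-subshell-and-echo workaround Barry describes, and its
spaces-in-names failure, can be sketched in a few lines:]

```shell
# A scratch directory with one awkwardly-named file:
mkdir /tmp/echodemo$$ && cd /tmp/echodemo$$
touch a.c b.c 'has space.c'
# The program forks a subshell and parses echo's output:
expanded=`sh -c 'echo *.c'`
echo "$expanded"    # prints: a.c b.c has space.c
# The result is a single space-separated string, so "has space.c" is
# now indistinguishable from two files named "has" and "space.c".
```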
For instance, in your above grep command, you could do:

	grep -l `ls L*772` L*backup

> In Unix hooks really aren't necessary, as you can switch shells
>easily, or implement a different syntax in your application and be
>sure nothing is globbed following exec(2).

You can't really switch shells all that easily. On many Unix systems,
you can only make a shell your default shell if it is in your system's
/etc/shells.

> > I agree, it isn't cat's job [to glob]. However, it would be the job
> > of a wildcard_match() library routine, which would take a wildcard
> > argument and return an array of filenames. It would be cat's job to
> > call this for its filename arguments.
>
> I'm not sure how feasible this is. In order to handle `command`
>constructs you would need to include an entire shell, more or less. Or
>you could start a subshell, in which case we're back where this
>discussion started. (Should the subshell call wildcard_match()?)

I don't consider `command` the same as globbing. It's part of the shell
syntax, as are alias and variable expansion. The difference between
these and wildcard expansion is that the latter is much more
context-dependent, since it is specific to filename arguments.
--
Barry Margolin, Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar
barmar@think.com (Barry Margolin) (01/15/91)
[Note: followups directed to comp.os.misc.]

In article <5371@idunno.Princeton.EDU> subbarao@phoenix.Princeton.EDU (Kartik Subbarao) writes:
>In article <11390@lanl.gov> jlg@lanl.gov (Jim Giles) writes:

[In response to one of my posts about commands controlling wildcard
expansion, rather than the shell blindly expanding all wildcards.]

>>I usually don't post to these wild flame-fest discussions (even those
>>I start!) except on weekends. But the above is the first intelligent
>>comment about wildcard 'globbing' that has been posted. I just wanted
>>to say "Bravo!"
>Yo, Jim. Ever heard of the ' and " characters? They _do_ come in handy
>when you don't want to glob part or all of a command line.

Why should the user have to type extra characters to tell the computer
not to do something, when it should know better all by itself? The
computer is supposed to *save* the user from having to worry about
trivial details like this.
--
Barry Margolin, Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar
kenw@skyler.calarc.ARC.AB.CA (Ken Wallewein) (01/15/91)
It is quite practical to have the shell do the parsing, BTW. All one
needs is a way to tell the shell about the syntax of the command. Once
the mechanism is in place, one gets a major improvement in consistency
of command syntax, and a powerful tool for managing it.

But it's not the Unix Way, and therefore heathen.
--
/kenw

Ken Wallewein                     A L B E R T A
kenw@noah.arc.ab.ca               R E S E A R C H
(403)297-2660                     C O U N C I L
throop@aurs01.UUCP (Wayne Throop) (01/16/91)
> barmar@think.com (Barry Margolin)
>> pfalstad@phoenix.Princeton.EDU (Paul John Falstad)
>>> jlg@lanl.gov (Jim Giles)
>>> [...] the shells that are bundled with versions of UNIX [...]
>>> trash [...] arguments [.. ie, expand wildcards ..]
>>> [...] This is a choice that _should_ be left to the discretion of the
>>> utility writer.
>> This forces the user to remember, for each individual command, whether
>> or not it processes wildcards.
> [..But..] There's a simple rule: if an argument is a filename, and it
> makes sense in the syntax of the command to refer to multiple files at
> once, then it is processed for wildcards.

Actually, a generalization of this notion is quite attractive. Suppose,
by analogy with C prototypes and coercions, that the commands the shell
knows about have arguments of specific types. This allows provision for
coercion of things of type string (which the shell naturally deals in)
to things of type (say) filename, or filename list, or whatnot (which
would naturally involve wildcard expansion).

Or, to put it another way, NEITHER the shell NOR the command should be
responsible for wildcard expansion... it ought to be the responsibility
of coercion code associated with a type. (Given the performance
constraints of encapsulating this code in a process, it is likely that
the only way to make this work well is with dynamic linking of some
sort. Also note that "types" should be coin-able on the fly, just as
commands should be.)

Note that such a scheme of "prototypes" and "coercions" supplied by the
notion of shell types can easily solve problems (like a wildcard
intended as the last argument to mv expanding multiply, and thus going
awry, or like expansion-on-the-fly in a GUI or other WIMP-ish interface)
which placing coercions in the shell proper OR in the command proper
cannot address well.
Further, the notion of "flags" can be incorporated as keyword arguments
in such a scheme, leading to a much cleaner and simpler framework with
as much or more expressive power than the current "getopt" notions. (I
am prejudiced in this, of course, since I worked on/with a command
processor that implemented all these notions and more.)

Wayne Throop	...!mcnc!aurgate!throop
root@lingua.cltr.uq.OZ.AU (Hulk Hogan) (01/16/91)
barmar@think.com (Barry Margolin) writes:
>[Note: followups directed to comp.os.misc.]
>In article <5371@idunno.Princeton.EDU> subbarao@phoenix.Princeton.EDU (Kartik Subbarao) writes:
>>In article <11390@lanl.gov> jlg@lanl.gov (Jim Giles) writes:
>[In response to one of my posts about commands controlling wildcard
>expansion, rather than the shell blindly expanding all wildcards.]
>>>I usually don't post to these wild flame-fest discussions (even those
>>>I start!) except on weekends. But the above is the first intelligent
>>>comment about wildcard 'globbing' that has been posted. I just wanted
>>>to say "Bravo!"
>>Yo, Jim. Ever heard of the ' and " characters? They _do_ come in handy
>>when you don't want to glob part or all of a command line.
>Why should the user have to type extra characters to tell the computer
>not to do something, when it should know better all by itself. The
>computer is supposed to *save* the user from having to worry about
>trivial details like this.

I disagree. Although it *sounds* right when someone says that a program
should glob its arguments, so that it only globs filename args, it won't
work in practice. Whilst I'm all for a glob(3) call to aid programmers
with filenames received interactively, making globbing arguments the
responsibility of the program means that wildcarding won't be consistent
across all programs, which I feel is much worse than being bitten very
occasionally by unexpected wildcarding.

As has already been pointed out, you'd have different amounts of
globbing in different programs: some none, some just "?", some "*", some
"[chars]", some "{string,string}" .... It would be a nightmare. You'd
have to remember which programs had which globbing... Yuk.

The only possibility would be a glob(3) routine used by all commands for
consistent globbing, which all vendors would have to keep functionally
identical (no value adding!) for the sake of the poor users.
Every program ever written would have to be modified to use it, and if a
new glob(3) was added on a machine which added the ability to negate
single characters (with "[^chars]" as tcsh does), then all programs
would have to be recompiled to access the new glob(3)... Maybe you could
pop it into a shared library (on machines which support them), but it
all seems like infinite effort for infinitesimal gain.

I prefer to know that "I'm using sh, so I can do ? and * and [chars]"
and "I'm under tcsh, I can do ?, *, [chars], [^chars],
{string,string...} to ALL programs". When a new tcsh comes out with some
new wildcard, I can use it on all programs then too. It's consistent. It
works.

Besides, aren't you overstating the importance of this? In my
experience, the only times I've ever had to worry about this are with
the "-name" option to find (find -name pattern .....), wildcards as
arguments to the grep(1) family, and not being able to use "-?" as a
help option. This is, at worst, a minor annoyance, and may be relieved
by the ability to tell the shell which commands you don't want globbed,
as is done in "zsh" by Paul Falstad (pfalstad@phoenix.Princeton.EDU).

Maybe you have filenames with wildcard characters in them, or programs
which require you to be able to send in "?", "*" or "[" characters on
the command line? I don't.

/\ndy
--
Andrew M. Jones, Systems Programmer  	Internet: andy@lingua.cltr.uq.oz.au
Centre for Lang. Teaching & Research 	Phone (Australia): (07) 365 6915
University of Queensland, St. Lucia  	Phone (World): +61 7 365 6915
Brisbane, Qld. AUSTRALIA 4072        	Fax: +61 7 365 7077

"No matter what hits the fan, it's never distributed evenly....."
det@hawkmoon.MN.ORG (Derek E. Terveer) (01/16/91)
barmar@think.com (Barry Margolin) writes:
>In my opinion, an interactive mechanism is a much better basis for an
>abbreviation design. For instance, you could type "L*74" and then hit a
>control or function key that would cause the string to be replaced by the
>matching filenames (or maybe it would beep if the wildcard matches multiple
>files).

You can do this in ksh. Typing a "*" or "=" on top of the "L*74" will
provide either expansion or a list of what the expansion would look
like.
--
Derek "Tigger" Terveer	det@hawkmoon.MN.ORG - MNFHA, NCS - UMN Women's Lax, MWD

I am the way and the truth and the light, I know all the answers; don't
need your advice.  -- "I am the way and the truth and the light" -- The
Legendary Pink Dots
barmar@think.com (Barry Margolin) (01/16/91)
In article <1991Jan15.222146.9697@lingua.cltr.uq.OZ.AU> root@lingua.cltr.uq.OZ.AU (Hulk Hogan) writes:
>Besides, aren't you overstating the importance of this? In my experience,
>the only times I've ever had to worry about this is with the "-name" option
>to find (find -name pattern .....), wildcards as arguments to the grep(1)
>family and not being able to use "-?" as a help option. This is at worst,
>a minor annoyance, and may be relieved by the ability to tell the shell
>which commands you don't want globbed, as is done in "zsh" by Paul Falstad
>(pfalstad@phoenix.Princeton.EDU).

Please don't construe my remarks as implying that this is a major flaw
in Unix, or that I am suggesting that any attempt be made to fix this
aspect of Unix. The current scheme works well enough, and it would be
impossible to change it because it would affect nearly all Unix
programs. No, I am merely pointing out that I think this part of Unix
was a poor design choice, and that a glob(3) library would have been
better (I wouldn't even think of suggesting that commands do their own
globbing without aid from a library that does the hard part).

The most annoying thing about it is that it has such pervasive effects
that it can't possibly be redesigned. Compare this with the original
design that didn't provide a library interface to directories; when the
directory library came out years later, programs could slowly migrate to
it, but programs based on the old design didn't immediately break (but
eventually they did, when things like NFS came out). Had Unix been
originally designed with all commands calling glob(3), and then someone
decided that it would be better to move globbing into the shells, such a
change would have been possible. One possibility would be to replace
glob(3) with a version that simply returns its argument as the only
match, on the assumption that all shells would glob.
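[Ed: the stub Barry describes -- a glob(3) that returns its argument as
the only match -- parallels what sh itself already does when a pattern
matches nothing, which is easy to observe:]

```shell
mkdir /tmp/stubdemo$$ && cd /tmp/stubdemo$$
touch real.txt
echo real.*     # a match exists: prints real.txt
echo nosuch.*   # no match: sh passes the pattern through literally,
                # exactly the behavior the stub glob(3) would have
```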
Another possibility would be to have a quoting convention that glob(3)
recognizes, and have the globbing shells quote the resulting pathnames
so that glob(3) wouldn't reglob them. This second design would allow an
easy migration to globbing shells, as users of non-globbing shells
wouldn't be scrod immediately (but as new programs that expect the shell
to glob come out they would be).

Unix is riddled with designs that would be impossible to change because
so many programs are dependent on them. Things like pathname syntax
(lots of programs know about the "/" delimiter), the special filenames
"." and "..", the pathnames of terminal devices, etc. It would at least
have been nice if all these special strings were defined in standard
header files. On Multics, very few programs need to know anything about
pathname syntax, and the user-visible pathname syntax isn't even the
same as the kernel's pathname syntax (they're similar, but the kernel
doesn't accept relative pathnames, and requires the directory name to be
separated from the entry name for some reason); there's a library
routine that takes a pathname in user syntax and parses it into the
kernel's syntax (many users, myself included, used dynamic linking to
access customized versions of the pathname parser, to extend the syntax
in various ways).

By the way, to be fair, we didn't get all this right in Multics, either.
Not all useful pathname operations were embodied in library routines, so
some programs did have to manipulate them directly. And there were some
magic numbers that found their way into many programs. But Unix, being
the progeny of Multics, should have done *better*, not worse. I can
understand the corners that were cut to allow Unix to run on smaller
boxes than Multics, but some of these are straightforward software
engineering principles that have nothing to do with the runtime
horsepower.
--
Barry Margolin, Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar
barmar@think.com (Barry Margolin) (01/17/91)
In article <1991Jan16.063253.2834681@locus.com> dana@locus.com (Dana H. Myers) writes:
> Ok. Once you said "All tools should work together in a well
>thought out way". Then you say that applications should be forced to
>somehow tell the shell whether to glob or not. What this means is that
>some applications would glob while others wouldn't. How would one know
>what a given app does for sure? Hmmm..

I'm getting tired of repeatedly seeing this argument. How do you know
that *any* program follows common conventions? For instance, how would
one know that a given app reads stdin and writes stdout, so that it can
be used as a filter? You know because someone once proclaimed this
convention, and everyone simply follows it (except when they don't, such
as in the "passwd" command, or curses applications, etc.).

> Also, how would you modify the way the shell works in order to
>allow applications to control whether globbing is done on args? Please
>keep in mind the way the shell works; it fork()s (that is, Jim, makes
>another process which is a running copy of the shell) and then exec()s
>the application (that is, replaces the running copy of the shell with
>the application). At this point, the arguments have been processed
>and the shell no longer has any control.

There are lots of ways to do this. The simplest way is for the shell not
to glob, but for the application to call a library routine that does it.
If you want the shell to glob before it invokes the application, it
could maintain a database specifying the syntax of each command. Or it
could open the executable and read the argument syntax specification
from the header before globbing.

None of these changes are feasible in current Unix, but they are ideas
for future OSes.
--
Barry Margolin, Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar
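[Ed: Barry's "database specifying the syntax of each command" can be
caricatured in a few lines of sh. The command table and policy below are
invented purely for illustration: the shell's own globbing is switched
off, and a wrapper re-expands arguments only for commands the table
marks as taking filename lists.]

```shell
set -f                        # the shell itself no longer globs
wants_glob() {
    case "$1" in
        echo|cat|ls) return 0 ;;   # table says: filename-list args
        *)           return 1 ;;   # table says: leave args alone
    esac
}
run() {
    cmd=$1; shift
    if wants_glob "$cmd"; then
        set +f; set -- $@; set -f  # re-scan the args with globbing on
    fi
    "$cmd" "$@"
}
mkdir /tmp/tbldemo$$ && cd /tmp/tbldemo$$
touch x.c y.c
run echo '*.c'            # in the table: prints x.c y.c
run printf '%s\n' '*.c'   # not in the table: prints *.c literally
set +f
```

A real shell would consult such a table before building the argv it
passes to exec(), rather than re-scanning strings after the fact; the
wrapper only shows that the per-command decision is mechanically simple.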
mwm@raven.relay.pa.dec.com (Mike (My Watch Has Windows) Meyer) (01/17/91)
In article <1991Jan15.222146.9697@lingua.cltr.uq.OZ.AU> root@lingua.cltr.uq.OZ.AU (Hulk Hogan) writes:
> I disagree. Although it *sounds* right when someone says that a program
> should glob its arguments, so that it only globs filename args, it won't
> work in practice.
Gee - I feel that way about having the shell do globbing. It makes
writing shell scripts that want to glob correctly - especially when
they pass different arguments to different internal commands, some of
which should be globbed, and some of which shouldn't - nearly
impossible. But it can be done.
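[Ed: one way it "can be done" in a script, sketched under plain sh: the
global `set -f` switch disables expansion, so a script can pass one
argument through literally and expand another deliberately.]

```shell
mkdir /tmp/sfdemo$$ && cd /tmp/sfdemo$$
touch a.c b.c
pat='*.c'
set -f                   # globbing off: the pattern survives intact
echo "pattern: $pat"     # prints: pattern: *.c
set +f                   # globbing back on: expand it deliberately
echo matches: $pat       # prints: matches: a.c b.c
```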
Of course, both arguments fall down when one examines the real world -
Unix proves that having the shell glob works in practice. TOPS20
(among a large number of others) proves that having commands use a
system-supplied glob routine works in practice.
> The only possibility would be a glob(3) routine used by all commands for
> consistent globbing, which all vendors would have to keep functionally
> identical (no value adding!) for the sake of the poor users.
Sorry, wrong answer. The routine is in a shared library. Each vendor
can add whatever value they want. Every command on the system gets
that added value. Users don't have to worry about it being
inconsistent unless they go to a different version of the OS - but you
have that problem for _all_ value added features.
> Maybe you have filenames with wildcard characters in them, or programs
> which require you to be able to send in "?", "*" or "[" characters on
> the command line? I don't.
You don't use grep, sed or awk? I find myself typing regular
expressions (which look like file globbing expressions, only they're
different) on the command line regularly. Trying to use those
expressions as arguments in a shell script is always an interesting
experience.
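[Ed: the clash Mike describes is easy to reproduce. The filenames here
are invented; the point is that an unquoted regex which happens to match
a file is silently rewritten before grep ever sees it.]

```shell
mkdir /tmp/redemo$$ && cd /tmp/redemo$$
printf 'abc\n' > abc.txt
touch a.xc                  # an innocent bystander that happens to
                            # match the *glob* reading of a.*c
grep 'a.*c' abc.txt         # quoted: grep sees the regex, finds abc
grep a.*c abc.txt || echo "grep found nothing"
# unquoted: the shell rewrites a.*c into the filename a.xc, so grep
# searches for the literal string "a.xc" and silently finds nothing
```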
<mike
--
Tell me how d'you get to be Mike Meyer
As beautiful as that? mwm@pa.dec.com
How did you get your mind decwrl!mwm
To tilt like your hat?
dana@locus.com (Dana H. Myers) (01/17/91)
In article <1991Jan16.182106.1758@Think.COM> barmar@think.com (Barry Margolin) writes:
>In article <1991Jan16.063253.2834681@locus.com> dana@locus.com (Dana H. Myers) writes:
>> Ok. Once you said "All tools should work together in a well
>>thought out way". Then you say that applications should be forced to
>>somehow tell the shell whether to glob or not. What this means is that
>>some applications would glob while others wouldn't. How would one know
>>what a given app does for sure? Hmmm..
>
>I'm getting tired of repeatedly seeing this argument. How do you know that
>*any* program follows common conventions? For instance, how would one know
>that a given app reads stdin and writes stdout, so that it can be used as a
>filter? You know because someone once proclaimed this convention, and
>everyone simply follows it (except when they don't, such as in the "passwd"
>command, or curses applications, etc.).

Well, if the shell always globs (or always doesn't glob), and the
program is not left to decide, then you always know how applications
behave with respect to globbing. In this case, the common convention is
enforced by the shell, which was the root of this entire discussion. The
shell does not enforce any conventions on how a program deals with I/O.
This, as you point out, leads to ambiguity. I don't, however, see how
this speaks against my original argument (which you are tired of
seeing). My point is that enforcing conventions in one spot does reduce
the total 'ambiguity level' of the system, at the possible expense of
convenience or features.

>> Also, how would you modify the way the shell works in order to
>>allow applications to control whether globbing is done on args? Please
>>keep in mind the way the shell works; it fork()s (that is, Jim, makes
>>another process which is a running copy of the shell) and then exec()s
>>the application (that is, replaces the running copy of the shell with
>>the application).
>>At this point, the arguments have been processed
>>and the shell no longer has any control.
>
>There are lots of ways to do this. The simplest way is for the shell not
>to glob, but for the application to call a library routine that does it.
>If you want the shell to glob before it invokes the application, it could
>maintain a database specifying the syntax of each command. Or it could
>open the executable and read the argument syntax specification from the
>header before globbing.
>
>None of these changes are feasible in current Unix, but they are ideas for
>future OSes.

One change which is feasible in current Unix is to use a permission bit
to specify glob/noglob (along with set-uid and the sticky bit). I would
argue this is the simplest; no change is required to the application (so
you get backward compatibility) and a minor change is required in the
shells. I don't seriously think this should be done, but it certainly is
feasible in the current context.
--
* Dana H. Myers KK6JQ 	| Views expressed here are	*
* (213) 337-5136 	| mine and do not necessarily	*
* dana@locus.com	| reflect those of my employer	*
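[Ed: Dana's permission-bit idea can be sketched with existing tools. The
choice of the sticky bit here is purely illustrative -- it is the one
per-file mode bit a shell could repurpose and test today with `[ -k ]`.]

```shell
mkdir /tmp/bitdemo$$ && cd /tmp/bitdemo$$
printf '#!/bin/sh\necho ran\n' > prog
chmod +x prog
chmod +t prog                # the (repurposed) sticky bit: "glob me"
if [ -k prog ]; then         # the shell's check before exec
    echo "shell would glob this command's arguments"
else
    echo "shell would pass arguments untouched"
fi
```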
barmar@think.com (Barry Margolin) (01/17/91)
In article <1991Jan16.222155.2836960@locus.com> dana@locus.com (Dana H. Myers) writes:
>In article <1991Jan16.182106.1758@Think.COM> barmar@think.com (Barry Margolin) writes:
>>I'm getting tired of repeatedly seeing this argument. How do you know that
>>*any* program follows common conventions? For instance, how would one know
>>that a given app reads stdin and writes stdout
> The shell does not enforce any conventions on how a program deals
>with I/O. This, as you point out, leads to ambiguity. I don't, however,
>see how this speaks against my original argument

My point was that there is a claim that users would be confused and
would have to memorize which commands behave in which way. Does the
current situation with regard to I/O confuse users? Have you memorized
which commands can be filters and which can't? I suspect the answer is
"no" to both questions.

I can only think of two commands that don't behave in the obvious way:
passwd(1) reads from /dev/tty rather than stdin, for security reasons
(although now that many systems have ptys, this reasoning is bogus, and
it should probably be fixed); and stty(1) operates on stdin in System V,
but stdout in BSD (System V's behavior seems more "obvious", as it
prints its output to stdout, while BSD prints the non-error output to
stdout, but it would probably make things much easier if they also
accepted filename arguments on the command line).
--
Barry Margolin, Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar
barmar@think.com (Barry Margolin) (01/19/91)
In article <59451@aurs01.UUCP> throop@aurs01.UUCP (Wayne Throop) writes:
>> barmar@think.com (Barry Margolin)
>> None of these changes are feasible in current Unix,
>> but they are ideas for future OSes.
>I mildly disagree.
>If, for example, one developed a shell where argument types and
>relevant coercions could be specified INDEPENDENTLY of commands,
>fitting "old" commands into a more intelligent shell would not
>be prohibitive. It is only when the command has to do its own
>coercions that things become unworkable.
>
>Unless of course Barry sees some difficulty I don't?

Well, I'm not a big fan of command processors where the syntax
descriptions are separate from the commands themselves, so I often
inadvertently forget about them when making statements like the above
one.

The problem with a separate syntax DB is that it may be hard to update
or find the DB entry. If there's a single, central database then it's
hard for users to put their personal commands into it. If the database
is keyed off command names then links with different names have trouble.
If there is a per-directory DB then links across directories don't work.
Probably the best place to put the syntax description is in the
executable file's header, in some portion that is normally ignored by
the exec*() system calls but can be examined by the shell. And for shell
scripts it could be put in specially-marked comments right there in the
script.

Portability is also a problem. Commands that expect the shell to parse
arguments in the new way might not be easy to run on old-style systems.
When getopt() was first getting popular, the solution to the problem
that not all systems have getopt() was that software packages often
contained their own copy of the PD getopt() implementation, for use on
systems that don't already have it, or the installation instructions
contained information on getting the PD getopt().
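[Ed: the "specially-marked comments" idea for shell scripts might look
like this; the #ARGSYNTAX marker and its value are invented for
illustration. An old shell sees only a comment, while a new-style shell
could read the declaration before deciding how to expand arguments.]

```shell
mkdir /tmp/syndemo$$ && cd /tmp/syndemo$$
cat > demo.sh <<'EOF'
#!/bin/sh
#ARGSYNTAX: filename-list
for f in "$@"; do echo "arg: $f"; done
EOF
chmod +x demo.sh
# A syntax-aware shell could extract the declaration like this:
sed -n 's/^#ARGSYNTAX: //p' demo.sh    # prints: filename-list
```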
During the transition period, would these new shells have to be
distributed along with the software that depends on them? There are also
quite a few different shells, and users would need a new-style version
of every one. This is why I prefer library-based approaches rather than
shell-based approaches.
--
Barry Margolin, Thinking Machines Corp.

barmar@think.com
{uunet,harvard}!think!barmar
throop@aurs01.UUCP (Wayne Throop) (01/22/91)
>,>>> barmar@think.com (Barry Margolin)
>> throop@aurs01.UUCP (Wayne Throop)
>>> None of these changes are feasible in current Unix,
>>I mildly disagree.
> Well, I'm not a big fan of command processors where the syntax descriptions
> are separate from the commands themselves, so I often inadvertently forget
> about them when making statements like the above one.
> [.. well-presented details of some of the design tradeoffs
> involved, omitted for brevity ..]
> This is why I prefer library-based approaches rather than shell-based
> approaches.

Good points, but I'd like to briefly add one tradeoff which weighs in
the opposite direction, and is perhaps the primary reason I end up
preferring... well, not so much "shell based" as "non-library based"
approaches.

If the mechanism is a library (or, a bit more precisely, if the
implementor interface to the mechanism is a library), it (IMHO)
inordinately encourages an imperative or procedural description of the
argument interface to a command. But if one has to spell it out in some
command-description-ese, I find it encourages readability and
reusability both. And (going along with the readability) a standard
command-description-ese can make an excellent presentation in the user
documentation, while being very precise.

I sadly admit that these criteria are somewhat more vague than the ones
Barry outlined. Sigh.

Wayne Throop	...!mcnc!aurgate!throop