throopw@sheol.UUCP (Wayne Throop) (04/02/91)
> peter@ficc.ferranti.com (Peter da Silva)
> As an exercise I am in the process of writing a shell for UNIX that only
> performs globbing when requested. It is not going to be anything of the
> complexity of the regular UNIX shells... sort of a baby shell for novices.
> So far the total length of the shell is 472 lines, and it already parses
> statements and executes them. No globbing is as yet implemented.

What I find missing is what one might call the philosophical underpinnings
upon which to evaluate the proposed features of the shell.  Throwing
things in by intuition is OK if the shell is for your own amusement or
pure exploration, but if it's intended to reflect some need or strategy or
whatnot, that need or strategy (or whatnot) needs to be set out right from
the start.  Of course, one can explore around and revise this initial
statement based on things found while implementing, but unless it is put
down at least fairly explicitly, the result isn't likely to be coherent.
Also, if you don't state the rules, you won't know what you've learned
when you are done with the exercise.

I suggest these "planks" in your proposed platform.

1) Lexical parsimony.  The shell should avoid "using up" special
characters, so that quotes are rarely necessary.  This rule implies things
like: there should be only one quotation convention, and any variations
shouldn't use up more special characters.  Also, things like comment
conventions, line continuation, argument keyword (flag) introduction, and
the like should be chosen to avoid conflict with common argument text
(eg: both - and / are rotten choices for keyword introducers (but since
the '-' is interpreted in the commands instead of the shell, alternatives
are at best complicated to introduce at this time... (but it's still a
goal worth approximating))).  Example question: should "variables" and
"commands" use (and thus "consume" in the lex-space) completely different
special characters to insert values on the command line?

2) Lexical universality.  The shell should group arguments in the
"canonical way".  That is, while quotation can specify any string as any
argument, things like () and [] and "" should group the contained text.
Note that the *interpretation* of the string containing the (), [], or
whatnot is NOT done by the shell, nor are the grouping characters
"recognized" or "processed" in any way.  That would be up to some bit of
code that knows the meaning of the argument.  Things would group in the
"obvious way", so for example, (a + b) or func( foo, bar, bletch ) or
array[i + 1] would each be a single argument.

3) Lexical locality.  The whole distinction between $foo and "$foo"
should simply not occur in the way it now does.  The blanks inside
argument expansions and "backquoted" expansions should not be
unconditionally interpreted.  To put it another way, characters
interpreted by the shell should be apparent at the top level.  Now
clearly, an "eval" operation is necessary so that one can construct a
command out of bits and pieces, including syntactically significant
characters, and sic the shell "reader" on it, but this should only occur
explicitly.

4) Syntactic simplicity.  Syntax for grouping commands, binding values
to names, and so on, should follow simple regular rules.  No abominations
like if..fi, case..esac, do..done (done? you say DONE? ghak! (which
illustrates an implication of this rule: your syntax shouldn't overlap
your command namespace if you can help it)).  Also, introducing a special
syntax simply to bind values is the Wrong Thing, especially if you also
have to execute a command to make this binding "take" in the usual case.
Or to put it another way, add functionality by adding semantics, not by
adding syntax (where possible).

5) Semantic explicitness.  (Note: this is only partly a shell concern.)
Implicit functionality should always be utterable explicitly.  For
example, if specifying something as an empty string is done by not
specifying anything at all, it ought to also work to specify "", or
$empty-string, or whatnot.  For another example, it ought to be possible
to state explicitly that argument-so-and-so is NOT given.  (This makes it
much easier to compose operations.)

6) Orthogonality.  If you can "foo" something, you better be able to
(at least) "un-foo" that something.  If you can mumble-east something,
you better think about adding mumble-(south|west|north) operations on
that same something.  And so on.  (This is largely an issue for built-in
commands, like setenv or rehash or the like.)

There are other likely planks, but perhaps this'll start a discussion.
--
Wayne Throop   ...!mcnc!dg-rtp!sheol!throopw
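[A minimal sketch, in C, of the grouping rule in plank 2 above: whitespace
splits arguments only at bracket depth zero and outside "" quotes, and the
grouped text passes through the shell untouched.  The code and names are
illustrative only, not from any poster.]

    #include <stdio.h>

    /* Copy the next argument from *sp into buf; advance *sp past it.
     * Balanced (), [], and "" extend the argument across blanks, and
     * the grouping characters are kept verbatim, uninterpreted.
     * Returns 1 if an argument was found, 0 at end of input. */
    static int next_arg(const char **sp, char *buf, size_t bufsz)
    {
        const char *s = *sp;
        size_t n = 0;
        int depth = 0, quoted = 0;

        while (*s == ' ' || *s == '\t')
            s++;                              /* skip leading blanks */
        if (*s == '\0') { *sp = s; return 0; }

        for (; *s; s++) {
            if (!quoted && depth == 0 && (*s == ' ' || *s == '\t'))
                break;                        /* argument boundary */
            if (*s == '"') quoted = !quoted;
            else if (!quoted && (*s == '(' || *s == '[')) depth++;
            else if (!quoted && (*s == ')' || *s == ']')) depth--;
            if (n < bufsz - 1) buf[n++] = *s; /* keep text verbatim */
        }
        buf[n] = '\0';
        *sp = s;
        return 1;
    }

    int main(void)
    {
        const char *line = "cmd (a + b) array[i + 1] \"two words\"";
        char arg[256];

        while (next_arg(&line, arg, sizeof arg))
            printf("arg: <%s>\n", arg);       /* four arguments */
        return 0;
    }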
peter@ficc.ferranti.com (Peter da Silva) (04/03/91)
In article <1563@sheol.UUCP> throopw@sheol.UUCP (Wayne Throop) writes:
> I suggest these "planks" in your proposed platform.

> 1) Lexical parsimony.  The shell should avoid "using up" special
> characters, so that quotes are rarely necessary.

Yes, I like this.

> Example question: should "variables" and "commands" use (and thus
> "consume" in the lex-space) completely different special characters
> to insert values on the command line?

No.  I'd only use *one* metasequence, [...].  The [...[...]...] would
nest, but that's that.  [set name] would expand to a variable (taken
from TCL).

> 2) Lexical universality.  The shell should group arguments in the
> "canonical way".  That is, while quotation can specify any string
> as any argument, things like () and [] and "" should group the
> contained text.

Wouldn't that conflict with [glob *.c], which expands into multiple
tokens?

> 3) Lexical locality.  The whole distinction between $foo and "$foo"
> should simply not occur in the way it now does.

I'm not sure this is a good idea.  I would want [glob *.c] to mean
something different than '[glob *.c]'.  But, also, I want [glob *] to
generate the equivalent of 'baz' 'da foo bar' 'fido' if those are the
names of the three files.  Perhaps [glob] will have to be written to
quote its output properly for the shell?  This might take some thinking
about... but it's not desirable otherwise.

I was thinking of defining three explicit phases of translation:

1) Statement extraction: the first statement is parsed out of the input
stream.  Continuations and statement separators are handled here.  The
only sequence interpreted at this point is backslash-newline.  This is
properly handled, so that backslash-backslash-newline is passed through.

2) Function evaluation: functions are evaluated and included literally
in the text.  Any quotes in the inclusion are doubled or escaped.  The
only sequences interpreted here are [...] and \[.

3) Argument list generation: the result is broken up into arguments.
\x is replaced by x for all x, and not otherwise interpreted.

This means that both \' and '' will quote a quote: this is not
desirable...  I wanted it to be strictly '', but it appears that would
cause an unfortunate asymmetry.  The other alternative is to interpret
only a particular set of escapes, maybe just \[, \\, and \<nl>.  I
think, though, I need \<space> as well to handle globbing properly...
which quickly leads to excessive complexity.  I don't want to abandon
the phases-of-translation model, for a variety of reasons, the first of
which, of course, is that it makes it easy to describe what the
language does.

> 4) Syntactic simplicity.  Syntax for grouping commands, binding values
> to names, and so on, should follow simple regular rules.

For sure!  But I wasn't going to implement much of this stuff... if you
think you need that, may I direct you to TCL, which already follows
most of your guidelines... if not all... and has the advantage that it
already exists.  If you want TCL, you have TCL.  This sucker is not
intended to be a complete command language with loops and stuff, OK?

> 5) Semantic explicitness.  (Note: this is only partly a shell concern.)

Hmmm...

> 6) Orthogonality.  If you can "foo" something, you better be able to
> (at least) "un-foo" that something.  If you can mumble-east something,
> you better think about adding mumble-(south|west|north) operations on
> that same something.  And so on.  (This is largely an issue for
> built-in commands, like setenv or rehash or the like.)

Hmmm...
--
Peter da Silva.  `-_-'
peter@ferranti.com  +1 713 274 5180.  'U`
"Have you hugged your wolf today?"
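[A small sketch, in C, of phase 1 as described in the post above: only
backslash-newline is interpreted, and backslash-backslash-newline passes
through.  Hypothetical code; the post specifies the rule, not an
implementation.]

    #include <stdio.h>

    /* Phase 1 sketch: read one statement (newline-terminated) into
     * buf, splicing out \<newline> continuations.  \\<newline> passes
     * both backslashes through and ends the statement, and any other
     * \x is left alone for the later phases.  Returns 0 at EOF. */
    static int read_statement(FILE *in, char *buf, size_t bufsz)
    {
        size_t n = 0;
        int c = EOF;

        while ((c = getc(in)) != EOF) {
            if (c == '\\') {
                int d = getc(in);
                if (d == '\n')
                    continue;               /* \<newline>: continuation */
                if (n < bufsz - 1)
                    buf[n++] = '\\';        /* keep the backslash */
                if (d == EOF)
                    break;
                c = d;                      /* handle what followed it */
            }
            if (c == '\n')
                break;                      /* statement separator */
            if (n < bufsz - 1)
                buf[n++] = (char)c;
        }
        buf[n] = '\0';
        return n > 0 || c != EOF;
    }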
throopw@sheol.UUCP (Wayne Throop) (04/08/91)
> peter@ficc.ferranti.com (Peter da Silva)
>> throopw@sheol.UUCP (Wayne Throop)

>> 1) Lexical parsimony.
> Yes, I like this.
> [...] I'd only use *one* metasequence, [...].  The [...[...]...] would
> nest, but that's that.  [set name] would expand to a variable (taken
> from TCL).

And I like *this*.  (Though, see some reservations below.)

>> 2) Lexical universality.
> Wouldn't that conflict with [glob *.c], which expands into multiple
> tokens?

Yes... *if* each token must map onto an argument.  More on this in the
"lexical locality" issue, below.

>> 3) Lexical locality.  The whole distinction between $foo and "$foo"
>> should simply not occur in the way it now does.
> I'm not sure this is a good idea.  I would want [glob *.c] to mean
> something different than '[glob *.c]'.

But now you run into the question of just how you would utter the string
"[glob *.c]".  The quotation convention shouldn't have such simplistic
exceptions, because this is the road to multiple conflicting quotation
conventions, some of which mean *really* quote, and others of which mean
quote *this* but not *that*, and so on and on.  Yuck.

There are many ways to solve this problem.  The way I was thinking of
was to have an intermediate rule-based or hook-based layer for mapping
tokens onto argument lists, but perhaps that's a bit too complicated a
mechanism for this context.  There's also saying "foo [glob *]" vs
"eval foo [glob *]" for this distinction, but that's perhaps a bit too
verbose for this context.  In which case (I think) the "right" way to
proceed is to have some qualifier in the [] syntax to state whether the
result should be rescanned for argument breaks or not.  The default (in
view of the normal use of globbing and argument expansion) would be to
rescan, but (say) [=...] (as opposed to [...]) would be presented as a
single token (that is, would "be born quoted").  In fact, special
treatment of various kinds could be tied to special [] subsequences...
one could even do line continuation and the like.  But the wisdom of
this latter overloading is getting questionable.

Finally, in a user-command-oriented shell, using [] for invocation might
be alright... but while (as I said above) I like it, I'm bound to point
out that it conflicts with the globbing syntax [a-z], and with regular
expression syntax.  Perhaps {} or `[] might serve better?

> I want [glob *] to generate
> the equivalent of 'baz' 'da foo bar' 'fido'.

I presume this means that the *output* of glob should be specified in
this way for reparsing, not that argument boundaries found in any old
[]-invoked thing are to be frozen by some heuristic... that is, that
glob should supply this smarts, not the shell.  This seems very good.

> I was thinking of defining three explicit phases of translation:

This is related to what I'd like to avoid by keeping to the rule of
lexical locality.  Multiple passes of lexical examination become
confusing, and each stage so often consumes some special characters or
requires an escape or quote convention.  It is, I think, possible to
interweave function evaluation with statement and token boundary
processing in such a way as to avoid distinct passes and the problems I
see related to them.  If there's interest along these lines, I can go
into further detail.  I realize that Peter leans the other way on this
issue of translation phases, for other reasons, so unless there's
interest, I'll not pursue it.

>> 4) Syntactic simplicity.
> For sure!  But I wasn't going to implement much of this stuff...
> [..if you want TCL, you know where to find it..]

Ok.

>> 5) Semantic explicitness.
> Hmmm...
>> 6) Orthogonality.
> Hmmm...

Hmmm... yourself :-)  I hope those "hmmm"s were indicative that what I
had to say was interesting, and not that it gave you reason to question
my sanity...
--
Wayne Throop   ...!mcnc!dg-rtp!sheol!throopw
peter@ficc.ferranti.com (Peter da Silva) (04/10/91)
In article <1639@sheol.UUCP> throopw@sheol.UUCP (Wayne Throop) writes:
> >> 3) Lexical locality.  The whole distinction between $foo and "$foo"
> >> should simply not occur in the way it now does.
> > I'm not sure this is a good idea.  I would want [glob *.c] to mean
> > something different than '[glob *.c]'.
>
> But now you run into the question of just how you would utter the
> string "[glob *.c]".

\[glob *.c\]

The thing is, it's not just "[glob *.c]".  How about this:

    fgrep '[cat /etc/foo /etc/bar]' /tmp/database

You need to be able to specify expansion into multiple args (with glob)
and expansion into a single arg (as in the example above).  As for
creating a bunch of special characters that mean something after "[",
that's getting back to the original problem I'm attacking here.

> Finally, in a user-command-oriented shell, using [] for invocation
> might be alright... but while (as I said above) I like it, I'm bound
> to point out that it conflicts with the globbing syntax [a-z], and
> with regular expression syntax.  Perhaps {} or `[] might serve
> better?

But in regular expressions [] is always paired, and since [] nests and
is passed on, you could easily do:

    [glob [a-z]*.c]

Perhaps you're right, though.  I don't like things like `[...], but how
about {...}?

> I presume this means that the *output* of glob should be specified
> in this way for reparsing, not that argument boundaries found in
> any old []-invoked thing are to be frozen by some heuristic... that
> is, that glob should supply this smarts, not the shell.  This seems
> very good.

Yes, but on second thought I think I'll make line-feed the terminator
for splitting lists apart into arguments.  Then glob just has to
generate a line-feed-separated list of args.  This would break for
files with line feeds in their names, but I think that's a minor
problem.

> It is, I think, possible to interweave function evaluation with
> statement and token boundary processing in such a way as to
> avoid distinct passes and the problems I see related to them.

This is possibly true, but I see the rules becoming quite complex that
way.  Perhaps you could donate a parser for this?  It would take a
string as input and generate two outputs: a parsed command in argv
form, if there's enough text to specify one, and a pointer to the rest
of the string.  If the command is incomplete, it'd return a pointer to
the end of the string and a null.  I've run into time constraints in
the real world, so it's no problem for me to hold off until you have
this.

> I hope those "hmmm"s were indicative that what I had to say was
> interesting, and not that it gave you reason to question my sanity...

Interesting, but I'm not sure what to say.  I mean it about that parser.
--
Peter da Silva.  `-_-'
peter@ferranti.com  +1 713 274 5180.  'U`
"Have you hugged your wolf today?"
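[A toy rendering, in C, of the parser interface requested above.  The
function name and the newline-means-complete rule are stand-ins of mine;
only the calling contract (argv out, rest-of-string out, null on an
incomplete command) comes from the post.]

    #include <stdio.h>
    #include <string.h>

    #define MAXARGS 64

    /* Parse the first complete command out of `input`.  Returns a
     * NULL-terminated argv and sets *rest to the unconsumed text; if
     * no complete command is present yet, returns NULL and sets *rest
     * to the end of the string.  Toy rules: a command is complete at
     * a newline, words split on blanks, no quoting or [] handling. */
    static char **parse_command(char *input, char **rest)
    {
        static char *argv[MAXARGS + 1];
        char *nl = strchr(input, '\n');
        char *w;
        int argc = 0;

        if (nl == NULL) {                   /* incomplete command */
            *rest = input + strlen(input);
            return NULL;
        }
        *nl = '\0';                         /* terminate the statement */
        for (w = strtok(input, " \t"); w != NULL && argc < MAXARGS;
             w = strtok(NULL, " \t"))
            argv[argc++] = w;
        argv[argc] = NULL;
        *rest = nl + 1;                     /* unconsumed remainder */
        return argv;
    }

    int main(void)
    {
        char line[] = "echo hello world\nwc -l\npartial";
        char *p = line, **av;
        int i;

        while ((av = parse_command(p, &p)) != NULL)
            for (i = 0; av[i] != NULL; i++)
                printf("argv[%d] = %s\n", i, av[i]);
        /* p now points at the end of the incomplete tail "partial" */
        return 0;
    }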
throopw@sheol.UUCP (Wayne Throop) (04/15/91)
> peter@ficc.ferranti.com (Peter da Silva)
>> But now you run into the question of just how you would utter the
>> string "[glob *.c]".
> \[glob *.c\]

Note that the escape is "merely" an odd-looking quotation convention.
So this usage complicates (one could say "pollutes") the simple
quotation convention we started out with.  The escape notation could be
rolled up into the eval notation if there were some way to have the
results of eval substitution be "born quoted".  This drastically
improves the lexical simplicity of the whole thing.  For example,
[asc 012] for newline, mnemonic instead of numeric versions of the same
thing, and so on.  Then the backslash could be used in a more "natural"
way inside regular expressions, and in other commands that give it a
meaning, rather than having to double them up, or quote them (with the
Politically Correct quote), or whatnot.

> You need to be able to specify expansion into multiple args (with glob)
> and expansion into a single arg (as in the example above).

Yes, I agree.  But these notations need not be *introduced*
independently; they might more economically (in terms of "how many
characters are 'special' in a vanilla context") be variants of a single
special-case notation.

> As for creating a bunch of special characters that mean something after
> "[", that's getting back to the original problem I'm attacking here.

Well then, let the extra notation be regular, eg

    [enquote [somecommand]]

Thus, the "usual" case would be to rescan, but special cases can be had
relatively cleanly, and even user-extensibly.  (Though there are some
problems with making the enquotation come later... perhaps that ought
to be [enquote somecommand] to resolve these little glitches...)

> But in regular expressions [] is always paired, and since [] nests and
> is passed on, you could easily do:
> [glob [a-z]*.c]

Hmmmmm.  I'd have thought the inner [] would be recursively expanded,
so that cases like [wc -l [glob *.c]] could be used easily and
naturally to get line counts as arguments to something.  Thus, the
inner [a-z] above would conflict.  When you said the notation "nests"
in the past, I'd assumed you meant recursive expansion.  If you didn't
want recursive expansion, how would the wc case above be uttered?
Something like

    somecommand [agsh -c 'wc -l [glob *.c]']

to make the recursion explicit?  Doesn't seem natural somehow.

> `[...], but how about {...}?

Seems OK.  Does {foo;bar;bletch} do the "obvious" thing?

> Perhaps you could donate a parser for this

I'm interested, but my time may be more limited than you'd like.  I'm
also still trying to get straight what you intend the exact semantics
of things to be... you might not like the semantics of any contribution
I could give :-).  I'll think about it some more and let you know via
email.
--
Wayne Throop   ...!mcnc!dg-rtp!sheol!throopw
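[For concreteness, a toy sketch in C of the recursive expansion under
discussion: innermost [] first, with the command's output spliced back
into the enclosing text, so [wc -l [glob *.c]] works naturally.
run_capture() is a hypothetical helper (run a command, return its output
as a malloc'd string), error handling is elided, and none of this is
either poster's actual code.]

    #include <stdlib.h>
    #include <string.h>

    extern char *run_capture(const char *cmd);  /* hypothetical helper */

    /* Return a malloc'd copy of s with each [cmd] replaced by the
     * output of cmd, expanding the innermost brackets first. */
    static char *expand(const char *s)
    {
        const char *open = strchr(s, '[');
        const char *p;
        char *inner, *cmd, *out, *tail, *res;
        size_t ilen;
        int depth;

        if (open == NULL)
            return strdup(s);               /* nothing to expand */

        depth = 1;                          /* find the matching ] */
        for (p = open + 1; *p != '\0' && depth > 0; p++) {
            if (*p == '[') depth++;
            else if (*p == ']') depth--;
        }
        if (depth != 0)
            return NULL;                    /* unbalanced brackets */

        ilen = (size_t)(p - open) - 2;      /* text between [ and ] */
        inner = malloc(ilen + 1);
        memcpy(inner, open + 1, ilen);
        inner[ilen] = '\0';

        cmd  = expand(inner);               /* inner brackets first */
        out  = run_capture(cmd);
        tail = expand(p);                   /* rest of the line */

        res = malloc((size_t)(open - s) + strlen(out) + strlen(tail) + 1);
        memcpy(res, s, (size_t)(open - s));
        strcpy(res + (open - s), out);      /* splice output in place */
        strcat(res, tail);

        free(inner); free(cmd); free(out); free(tail);
        return res;
    }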
new@ee.udel.edu (Darren New) (04/16/91)
>> peter@ficc.ferranti.com (Peter da Silva)
>> You need to be able to specify expansion into multiple args (with glob)
>> and expansion into a single arg (as in the example above).

The real problem is that you are doing this under UNIX, where stdin and
stdout are unstructured streams of bytes instead of something more
sophisticated.  If you had a system where the glob program could return
an <argc, argv> tuple, you wouldn't have this problem.  It would be an
argument to glob, rather than to the shell, which determined whether it
gets quoted.  UNIX... sigh.            -- Darren
--
--- Darren New --- Grad Student --- CIS --- Univ. of Delaware ---
----- Network Protocols, Graphics, Programming Languages, FDTs -----
+=+=+ My time is very valuable, but unfortunately only to me +=+=+
+=+ Nails work better than screws, when both are driven with screwdrivers +=+
schwartz@groucho.cs.psu.edu (Scott Schwartz) (04/16/91)
In article <50837@nigel.ee.udel.edu> new@ee.udel.edu (Darren New) writes:
> The real problem is that you are doing this under UNIX, where stdin and
> stdout are unstructured streams of bytes instead of something more
> sophisticated.

In this case, where you are dealing with pathnames, you can use '\0'
characters as record separators.
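[A small illustration, in C, of this '\0'-as-record-separator scheme:
read NUL-terminated pathnames from stdin and echo each one per line.  A
sketch of mine, assuming a producer that writes name '\0' name '\0' ...;
this is not code from the thread.]

    #include <stdio.h>

    int main(void)
    {
        char buf[4096];
        size_t n = 0;
        int c;

        while ((c = getchar()) != EOF) {
            if (c == '\0') {                /* end of one record */
                buf[n] = '\0';
                printf("record: %s\n", buf);
                n = 0;
            } else if (n < sizeof buf - 1) {
                buf[n++] = (char)c;
            }
        }
        return 0;
    }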
peter@ficc.ferranti.com (Peter da Silva) (04/17/91)
In article <50837@nigel.ee.udel.edu> new@ee.udel.edu (Darren New) writes:
> >> peter@ficc.ferranti.com (Peter da Silva)
> >> You need to be able to specify expansion into multiple args (with glob)
> >> and expansion into a single arg (as in the example above).
> The real problem is that you are doing this under UNIX, where stdin and
> stdout are unstructured streams of bytes instead of something more
> sophisticated.

This has nothing to do with UNIX.  This is purely a matter of syntax:
whether the syntax

    '[...]'

and the syntax

    [...]

should both expand into a single argument, or whether the latter expands
into multiple arguments.  Actually passing an argv back from glob is not
the problem: I can print a null-separated string if I want.  You still
have to ask the question: what do you do with it when you get it?
--
Peter da Silva.  `-_-'
peter@ferranti.com  +1 713 274 5180.  'U`
"Have you hugged your wolf today?"
peter@ficc.ferranti.com (Peter da Silva) (04/17/91)
In article <1685@sheol.UUCP> throopw@sheol.UUCP (Wayne Throop) writes:
> > peter@ficc.ferranti.com (Peter da Silva)
> >> But now you run into the question of just how you would utter the
> >> string "[glob *.c]".
> > \[glob *.c\]

Yes, that's the general idea.

> Note that the escape is "merely" an odd-looking quotation convention.

It's more than a bit messy...

> So this usage complicates (one could say "pollutes") the simple
> quotation convention we started out with.

I would say "pollutes", myself.

> Hmmmmm.  I'd have thought the inner [] would be recursively expanded,
> so that cases like [wc -l [glob *.c]] could be used easily and
> naturally to get line counts as arguments to something.  Thus, the
> inner [a-z] above would conflict.

Yes, of course you're right.  Never mind.

> > `[...], but how about {...}?
> Seems OK.  Does {foo;bar;bletch} do the "obvious" thing?

Run foo, bar, and bletch and incorporate the result in the command?
Yes.  More and more I like using newline separators only for splitting
these up.  I still like having quotes only affect argument parsing, and
using escapes to escape quotes.

> > Perhaps you could donate a parser for this
> I'm interested, but my time may be more limited than you'd like.

Hey, *my* time is more limited than I like.

> I'm also still trying to get straight what you intend the exact
> semantics of things to be... you might not like the semantics of
> any contribution I could give :-).

Sure, but it'd give us something to work on.  Maybe when I see it in
action I'll like it more than my own ideas.
--
Peter da Silva.  `-_-'
peter@ferranti.com  +1 713 274 5180.  'U`
"Have you hugged your wolf today?"
peter@ficc.ferranti.com (Peter da Silva) (04/17/91)
What do we call this thing?  "dash"?
--
Peter da Silva.  `-_-'
peter@ferranti.com  +1 713 274 5180.  'U`
"Have you hugged your wolf today?"
new@ee.udel.edu (Darren New) (04/17/91)
In article <UYRAN=4@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
>Actually passing an argv back from glob is not the problem:
>I can print a null-separated string if I want. You still have to ask the
>question: what do you do with it when you get it?

So why not expand glob slightly, as follows:

    [glob -1 *.c]  -- returns all *.c files as one argument
    [glob *.c]     -- returns all *.c files, broken up

Then 'glob' handles breaking up the file names, or not.  You could do
things like

    [blah | tr '\012' ' ']

to get effects similar to glob's -1 parameter for other programs.
--
--- Darren New --- Grad Student --- CIS --- Univ. of Delaware ---
----- Network Protocols, Graphics, Programming Languages, FDTs -----
+=+=+ My time is very valuable, but unfortunately only to me +=+=+
+=+ Nails work better than screws, when both are driven with screwdrivers +=+
peter@ficc.ferranti.com (Peter da Silva) (04/18/91)
In article <50982@nigel.ee.udel.edu> new@ee.udel.edu (Darren New) writes:
> [glob -1 *.c]  -- returns all *.c files as one argument
> [glob *.c]     -- returns all *.c files, broken up

OK, so what does this mean:

    '[glob -1 *.c]'

????  It seems to me to be going to a lot of work to avoid using the
intuitive distinction between:

    [...]

and

    '[...]'
--
Peter da Silva.  `-_-'
peter@ferranti.com  +1 713 274 5180.  'U`
"Have you hugged your wolf today?"
new@ee.udel.edu (Darren New) (04/24/91)
In article <CUSA91C@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
>OK, so what does this mean:
> '[glob -1 *.c]'

I don't know.  It's your shell. :-)  I'm just trying to give some
possibilities.  Personally, I would make '[glob -1 *.c]' unevaluated.
I haven't really been following the discussion too closely.  -- Darren
--
--- Darren New --- Grad Student --- CIS --- Univ. of Delaware ---
----- Network Protocols, Graphics, Programming Languages, FDTs -----
+=+=+ My time is very valuable, but unfortunately only to me +=+=+
+=+ Nails work better than screws, when both are driven with screwdrivers +=+