[comp.os.misc] shell with da Silva lining

throopw@sheol.UUCP (Wayne Throop) (04/02/91)

> peter@ficc.ferranti.com (Peter da Silva)
> As an exercise I am in the process of writing a shell for UNIX that only
> performs globbing when requested. It is not going to be anything of the
> complexity of the regular UNIX shells... sort of a baby shell for novices.
> So far the total length of the shell is 472 lines, and it already parses
> statements and executes them. No globbing is as yet implemented.

What I find missing is what one might call the philosophical
underpinnings upon which to evaluate the proposed features of the shell.
Throwing things into it by intuition is OK if it's supposed to be for
your own amusement or pure exploration, but if it's intended to reflect
some need or strategy or whatnot, this need or strategy (or whatnot)
needs to be set out right from the start.  Of course, one can explore
around and change this initial statement based on things found while
implementing, but unless it is put down at least fairly explicitly, the
result isn't likely to be coherent.  Also, if you don't state the rules,
you won't know what you've learned when you are done with the exercise.

I suggest these "planks" in your proposed platform.

1) Lexical parsimony.  The shell should avoid "using up" special
characters, so that quotes are rarely necessary.  This rule implies
things like: there should be only one quotation convention, and any
variations shouldn't use more special characters.  Also, things like
comment conventions, line continuation, argument keyword (flag)
introduction and the like should be chosen to avoid conflict with
common argument text contents (eg: both - and / are rotten choices
for keyword introducers (but since the '-' is interpreted in
the commands instead of the shell, alternatives are at best
complicated to introduce at this time... (but it's still a goal
worth approximating)))  Example question: should "variables" and
"commands" use (and thus "consume" in the lex-space) completely
different special characters to insert values on the command line?

2) Lexical universality.  The shell should group arguments in the
"canonical way".  That is, while quotation can specify any string
as any argument, things like () and [] and "" should group the
contained text.  Note that the *interpretation* of the string
containing the (), [], or whatnot is NOT done by the shell, nor are
the grouping characters "recognized" or "processed" in any way.
That would be up to some bit of code that knows the meaning of the
argument.  Things would group in the "obvious way", so for example, 
           (a + b)
or         func( foo, bar, bletch )
or         array[i + 1]

would all be single arguments.
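The grouping rule above can be sketched concretely (Python used purely
for illustration; the exact bracket set and the unnested-quote handling
are assumptions, not part of the proposal):

```python
# Illustrative lexer for "lexical universality": blanks split arguments,
# but balanced (), [], and "..." keep their contents attached to the
# surrounding token.  The shell finds the extent of a group; it does NOT
# interpret or strip the grouping characters.
def split_args(line):
    pairs = {'(': ')', '[': ']'}
    args, tok, stack = [], '', []
    i = 0
    while i < len(line):
        c = line[i]
        if c == '"':                      # quoted span: scan to the closing quote
            j = line.index('"', i + 1)
            tok += line[i:j + 1]
            i = j + 1
            continue
        if c in pairs:
            stack.append(pairs[c])
        elif stack and c == stack[-1]:
            stack.pop()
        if c.isspace() and not stack:     # only top-level blanks split
            if tok:
                args.append(tok)
            tok = ''
        else:
            tok += c
        i += 1
    if tok:
        args.append(tok)
    return args
```

Under this rule `func( foo, bar, bletch )` and `array[i + 1]` each come
back as a single argument, while ordinary command lines split as usual.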

3) Lexical locality.  The whole distinction between $foo and "$foo"
should simply not occur in the way it now does.  The blanks inside
argument expansions and "backquoted" expansions should not be
unconditionally interpreted.  To put it another way, characters
interpreted by the shell should be apparent at the top level.  Now
clearly, an "eval" operation is necessary so that one can construct a
command out of bits and pieces including syntactically significant
characters and sic the shell "reader" on it, but this should only occur
explicitly. 

4) Syntactic simplicity.  Syntax for grouping commands, binding values
to names, and so on, should follow simple regular rules.  No
abominations like if..fi, case..esac, do..done (done? you say DONE?
ghak! (which illustrates an implication of this rule: your syntax
shouldn't overlap your command namespace if you can help it)).  Also,
introduction of a special syntax simply to bind values is also the Wrong
Thing, especially if you also have to execute a command to make this
binding "take" for the usual case.  Or to put it another way, add
functionality by adding semantics, not by adding syntax (where
possible). 

5) Semantic explicitness.  (Note: this is only partly a shell concern.)
Implicit functionality should always be utterable explicitly.  For
example, if specifying something as an empty string is done by not
specifying anything at all, it ought to also work to specify "", or
$empty-string or whatnot.  For another example, it ought to be possible
to state explicitly that argument-so-and-so is NOT given.  (This makes
it much easier to compose operations.)

6) Orthogonality.  If you can "foo" something, you better be able to
(at least) "un-foo" that something.  If you can mumble-east something,
you better think about adding mumble-(south|west|north) operations on
that same something.  And so on.  (This is largely an issue for built-in
commands, like setenv or rehash or the like.)

There are other likely planks, but perhaps this'll start a discussion.
--
Wayne Throop  ...!mcnc!dg-rtp!sheol!throopw

peter@ficc.ferranti.com (Peter da Silva) (04/03/91)

In article <1563@sheol.UUCP> throopw@sheol.UUCP (Wayne Throop) writes:
> I suggest these "planks" in your proposed platform.

> 1) Lexical parsimony.  The shell should avoid "using up" special
> characters, so that quotes are rarely necessary.

Yes, I like this.

> Example question: should "variables" and
> "commands" use (and thus "consume" in the lex-space) completely
> different special characters to insert values on the command line?

No. I'd only use *one* metasequence [...]. The [...[...]...] would nest,
but that's that. [set name] would expand to a variable (taken from TCL).

> 2) Lexical universality.  The shell should group arguments in the
> "canonical way".  That is, while quotation can specify any string
> as any argument, things like () and [] and "" should group the
> contained text.

Wouldn't that conflict with [glob *.c], which expands into multiple tokens?

> 3) Lexical locality.  The whole distinction between $foo and "$foo"
> should simply not occur in the way it now does.

I'm not sure this is a good idea. I would want [glob *.c] to mean something
different than '[glob *.c]'. But, also, I want [glob *] to generate
the equivalent of 'baz' 'da foo bar' 'fido' if those are the names of
the three files. Perhaps [glob] will have to be written to properly quote
stuff for the shell? This might take some thinking about... but it's not
desirable otherwise. I was thinking of defining three explicit phases of
translation:

	1) Statement extraction: the first statement is parsed
	   out of the input stream. Continuations and statement
	   separators are handled here. The only sequence interpreted
	   at this point is backslash-newline. This is properly
	   handled, so that backslash-backslash-newline is passed
	   through.

	2) Function evaluation. Functions are evaluated and included
	   literally in the text. Any quotes in the inclusion are
	   doubled or escaped. The only sequences interpreted here are [...]
	   and \[.

	3) Argument list generation. The result is broken up into
	   arguments. \x is replaced by x for all x, and not
	   interpreted. This means that both \' and '' will quote
	   a quote: this is not desirable... I wanted it to be
	   strictly '', but it appears that would cause an unfortunate
	   asymmetry. The other alternative is to interpret only
	   a particular set of escapes, maybe just \[, \\, and \<nl>.
	   I think, though, I need \<space> as well to properly
	   handle globbing... which quickly leads to excessive
	   complexity.

I don't want to abandon the phases of translation model, for a variety
of reasons. First of which, of course, being that it makes it easy to
describe what the language does.
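The three phases read as a small pipeline; here is a toy rendering
(Python, purely illustrative — the [...] evaluator is a non-nesting
stub, and the escape rules follow the "replace \x by x" variant):

```python
# Toy rendering of the three translation phases described above.

def extract_statement(text):
    """Phase 1: pull the first statement off the input stream.
    backslash-newline is a continuation; backslash-backslash is passed
    through untouched, so backslash-backslash-newline survives."""
    out, i = '', 0
    while i < len(text):
        if text[i] == '\\' and i + 1 < len(text):
            if text[i + 1] == '\n':        # continuation: drop both chars
                i += 2
                continue
            out += text[i:i + 2]           # other escapes pass through
            i += 2
            continue
        if text[i] == '\n':                # unescaped newline ends statement
            return out, text[i + 1:]
        out += text[i]
        i += 1
    return out, ''

def evaluate_functions(stmt, funcs):
    """Phase 2: replace a non-nested [name args...] with its result.
    (A stub: real nesting and quote-doubling are not handled here.)"""
    while '[' in stmt:
        a = stmt.index('[')
        b = stmt.index(']', a)
        name, *args = stmt[a + 1:b].split()
        stmt = stmt[:a] + funcs[name](*args) + stmt[b + 1:]
    return stmt

def split_arguments(stmt):
    """Phase 3: break into argv; \\x is replaced by x for all x."""
    argv, tok, i = [], '', 0
    while i < len(stmt):
        c = stmt[i]
        if c == '\\' and i + 1 < len(stmt):
            tok += stmt[i + 1]
            i += 2
        elif c == ' ':
            if tok:
                argv.append(tok)
            tok = ''
            i += 1
        else:
            tok += c
            i += 1
    if tok:
        argv.append(tok)
    return argv
```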

> 4) Syntactic simplicity.  Syntax for grouping commands, binding values
> to names, and so on, should follow simple regular rules.

For sure! But I wasn't going to implement much of this stuff... if you think
you need that, may I direct you to TCL which already follows most of your
guidelines... if not all... and has the advantage that it already exists.

If you want TCL, you have TCL. This sucker is not intended to be a complete
command language with loops and stuff, OK?

> 5) Semantic explicitness.  (Note: this is only partly a shell concern.)

Hmmm...

> 6) Orthogonality.  If you can "foo" something, you better be able to
> (at least) "un-foo" that something.  If you can mumble-east something,
> you better think about adding mumble-(south|west|north) operations on
> that same something.  And so on.  (This is largely an issue for built-in
> commands, like setenv or rehash or the like.)

Hmmm...
-- 
Peter da Silva.  `-_-'  peter@ferranti.com
+1 713 274 5180.  'U`  "Have you hugged your wolf today?"

throopw@sheol.UUCP (Wayne Throop) (04/08/91)

> peter@ficc.ferranti.com (Peter da Silva)
>> throopw@sheol.UUCP (Wayne Throop)
>> 1) Lexical parsimony.
> Yes, I like this.
> [...] I'd only use *one* metasequence [...]. The [...[...]...] would nest,
> but that's that. [set name] would expand to a variable (taken from TCL).

And I like *this*.  (Though, see some reservations below.)

>> 2) Lexical universality.
> Wouldn't that conflict with [glob *.c], which expands into multiple tokens?

Yes... *if* each token must map onto an argument.  More on this in
the "lexical locality" issue, below.

>> 3) Lexical locality.  The whole distinction between $foo and "$foo"
>> should simply not occur in the way it now does.
> I'm not sure this is a good idea. I would want [glob *.c] to mean something
> different than '[glob *.c]'.

But now you run into the question of just how you would utter the
string "[glob *.c]".  The quotation convention shouldn't have such
simplistic exceptions, because this is the road to multiple conflicting
quotation conventions, some of which mean *really* quote, and others
mean quote *this* but not *that*, and so on and on.  Yuck.

There are many ways to solve this problem.  The way I was thinking of
was to have an intermediate rule-based or hook-based layer for mapping
tokens into argument lists, but perhaps that's a bit too complicated a
mechanism for this context.  There's also saying "foo [glob *]" vs "eval
foo [glob *]" for this distinction, but that's perhaps a bit too verbose
for this context. 

In which case (I think) the "right" way to proceed is to have some
qualifier in the [] syntax to state whether the result should be
rescanned for argument breaks or not.  The default (in view of the
normal use of globbing and argument expansion) would be to rescan,
but (say) [=...] (as opposed to [...]) would be presented as a single
token (that is, would "be born quoted").
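The proposed qualifier amounts to a single branch at substitution time
(an illustrative Python fragment; the [=...] spelling is hypothetical,
as is the naive blank-splitting used for the rescan case):

```python
# [...]  -> rescan the result for argument breaks (the globbing default)
# [=...] -> present the result as a single token, "born quoted"
def substitute(result, rescan):
    return result.split() if rescan else [result]
```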

In fact, special treatment of various kinds could be tied into
special [] subsequences... one could even do line continuation and
the like.  But the wisdom of this latter overloading is getting
questionable.

Finally, in a user-command-oriented shell, using [] for invocation
might be alright... but while (as I said above) I like it, I'm bound
to point out that it conflicts with globbing syntax of [a-z], and
with regular expression syntax.  Perhaps {} or `[] might serve
better?

> I want [glob *] to generate
> the equivalent of 'baz' 'da foo bar' 'fido'.

I presume this means that the *output* of glob should be specified
in this way for reparsing, not that argument boundaries found in
any old []-invoked thing are to be frozen by some heuristic... that
is, that glob should supply this smarts, not the shell.  This seems
very good.

> I was thinking of defining three explicit phases of translation:

This is related to what I'd like to avoid by keeping to the
rule of lexical locality.  Multiple passes of lexical examination
become confusing, and each stage so often consumes some special
characters or requires an escape or quote convention.

It is, I think, possible to interweave function evaluation with
statement and token boundary processing in such a way as to
avoid distinct passes and the problems I see related to them.

If there's interest along these lines, I can go into further detail.  I
realize that Peter leans the other way on this issue of translation
phases, for other reasons, so unless there's interest, I'll not pursue
it. 

>> 4) Syntactic simplicity.
> For sure! But I wasn't going to implement much of this stuff...
> [..if you want TCL, you know where to find it..]
Ok.

>> 5) Semantic explicitness.
> Hmmm...
>> 6) Orthogonality.
> Hmmm...

Hmmm... yourself :-)

I hope those "hmmmm"s were indicative that what I had to say was
interesting, and not that it gave you reason to question my sanity... 
--
Wayne Throop  ...!mcnc!dg-rtp!sheol!throopw

peter@ficc.ferranti.com (Peter da Silva) (04/10/91)

In article <1639@sheol.UUCP> throopw@sheol.UUCP (Wayne Throop) writes:
> >> 3) Lexical locality.  The whole distinction between $foo and "$foo"
> >> should simply not occur in the way it now does.
> > I'm not sure this is a good idea. I would want [glob *.c] to mean something
> > different than '[glob *.c]'.
> 
> But now you run into the question of just how you would utter the
> string "[glob *.c]".

	\[glob *.c\]

The thing is, it's not just "[glob *.c]" How about this:

	fgrep '[cat /etc/foo /etc/bar]' /tmp/database

You need to be able to specify expansion into multiple args (with glob)
and expansion into a single arg (as in the example above).

As for creating a bunch of special characters that mean something after
"[", that's getting back to the original problem I'm attacking here.

> Finally, in a user-command-oriented shell, using [] for invocation
> might be alright... but while (as I said above) I like it, I'm bound
> to point out that it conflicts with globbing syntax of [a-z], and
> with regular expression syntax.  Perhaps {} or `[] might serve
> better?

But in regular expressions [] is always paired, and since [] nests and
is passed on, you could easily do:

	[glob [a-z]*.c]

Perhaps you're right, though. I don't like things like `[...], but how
about {...}?

> I presume this means that the *output* of glob should be specified
> in this way for reparsing, not that argument boundaries found in
> any old []-invoked thing are to be frozen by some heuristic... that
> is, that glob should supply this smarts, not the shell.  This seems
> very good.

Yes, but on second thought I think I'll make line-feed the terminator
for splitting apart lists into arguments. Then glob just has to generate
a line-feed separated list of args. This would break for files with
linefeeds in the name, but I think that's a minor problem.
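The newline-terminator rule is easy to state precisely; a sketch
(Python, purely illustrative), including the conceded failure mode:

```python
# Split a [...]-substitution result into arguments on line-feeds.
# A trailing newline yields no empty trailing argument.  A filename
# that itself contains a newline is mis-split -- the "minor problem"
# conceded above.
def split_substitution(output):
    return [arg for arg in output.split('\n') if arg != '']
```

With glob emitting one name per line, `baz`, `da foo bar`, and `fido`
come back as three arguments, blanks intact.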

> It is, I think, possible to interweave function evaluation with
> statement and token boundary processing in such a way as to
> avoid distinct passes and the problems I see related to them.

This is possibly true, but I see the rules becoming quite complex then.
Perhaps you could donate a parser for this (it would take a string on
input and generate two outputs: a parsed command in argv form, if
there's enough to specify one, and a pointer to the rest of the string.
If the command is incomplete it'd return null and a pointer to the end
of the string).
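The requested interface is roughly this (an illustrative Python stub;
the real thing would presumably be C, and the tokenizing here is
deliberately naive, with no quoting or [] handling):

```python
# Sketch of the parser interface asked for above: take a string, return
# (argv, rest).  If no complete statement is present yet, argv is None
# and rest is the (empty) remainder at the end of the input.
def parse_command(text):
    if '\n' not in text:
        return None, ''                 # incomplete: caller needs more input
    stmt, _, rest = text.partition('\n')
    return stmt.split(), rest           # naive split; no quoting yet
```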

I've run into time constraints in the real world, so it's no problem
for me to hold off until you have this.

> I hope those "hmmmm"s were indicative that what I had to say was
> interesting, and not that it gave you reason to question my sanity... 

Interesting, but not sure what to say.

I mean it about that parser.
-- 
Peter da Silva.  `-_-'  peter@ferranti.com
+1 713 274 5180.  'U`  "Have you hugged your wolf today?"

throopw@sheol.UUCP (Wayne Throop) (04/15/91)

> peter@ficc.ferranti.com (Peter da Silva)
>> But now you run into the question of just how you would utter the
>> string "[glob *.c]".
>         \[glob *.c\]

Note that the escape is "merely" an odd-looking quotation convention. 
So this usage complicates (one could say "pollutes") the simple quotation
convention we started out with.  The escape notation could be rolled up
into the eval notation if there were some way to have the results of
eval substitution be "born quoted".  This drastically improves the
lexical simplicity of the whole thing.

For example, [asc 012] for newline.  Mnemonic instead of numeric
versions of the same thing.  And so on.  Then, the backslash could be
used in a more "natural" way inside regular expressions, and in other
commands that give it a meaning, rather than having to double up on
them, or quote them (with the Politically Correct quote), or whatnot. 

> You need to be able to specify expansion into multiple args (with glob)
> and expansion into a single arg (as in the example above).

Yes, I agree.  But these notations need not be *introduced*
independently, but might more economically (in terms of "how many
characters are 'special' in a vanilla context") be variants of a single
special case notation. 

> As for creating a bunch of special characters that mean something after
> "[", that's getting back to the original problem I'm attacking here.

Well then, let the extra notation be regular, eg

           [enquote [somecommand]]

Thus, the "usual" case would be to rescan, but special cases can be had
relatively cleanly, and even user-extensibly.  (Though there are some
problems with making the enquotation come later...  perhaps that ought
to be [enquote somecommand] to resolve these little glitches...)

> But in regular expressions [] is always paired, and since [] nests and
> is passed on, you could easily do:
>         [glob [a-z]*.c]

Hmmmmm.  I'd have thought the inner [] would be recursively expanded,
so that cases like [wc -l [glob *.c]] could be used easily and naturally
to get linecounts as arguments to something.  Thus, the inner [a-z]
above would conflict.

When you had said the notation "nests" in the past I'd assumed you meant
recursive expansion.  If you didn't want recursive expansion, how would
the wc case above be uttered?  Something like

        somecommand [agsh -c 'wc -l [glob *.c]']

to make the recursion explicit?  Doesn't seem natural somehow.

> `[...], but how about {...}?

Seems OK.  Does {foo;bar;bletch} do the "obvious" thing?

> Perhaps you could donate a parser for this

I'm interested, but my time may be more limited than you'd like.
I'm also still trying to get straight what you intend the exact
semantics of things to be... you might not like the semantics of
any contribution I could give :-).

I'll think about it some more and let you know via email.
--
Wayne Throop  ...!mcnc!dg-rtp!sheol!throopw

new@ee.udel.edu (Darren New) (04/16/91)

>> peter@ficc.ferranti.com (Peter da Silva)
>> You need to be able to specify expansion into multiple args (with glob)
>> and expansion into a single arg (as in the example above).

The real problem is that you are doing this under UNIX, where stdin and
stdout are unstructured streams of bytes instead of something more
sophisticated.  If you had a system where the glob program could 
return an <argc, argv> tuple, you wouldn't have this problem. It would
be an argument to glob rather than to the shell which determined
whether it gets quoted.   UNIX...  sigh.          -- Darren

-- 
--- Darren New --- Grad Student --- CIS --- Univ. of Delaware ---
----- Network Protocols, Graphics, Programming Languages, FDTs -----
     +=+=+ My time is very valuable, but unfortunately only to me +=+=+
+=+ Nails work better than screws, when both are driven with screwdrivers +=+

schwartz@groucho.cs.psu.edu (Scott Schwartz) (04/16/91)

In article <50837@nigel.ee.udel.edu> new@ee.udel.edu (Darren New) writes:
   The real problem is that you are doing this under UNIX, where stdin and
   stdout are unstructured streams of bytes instead of something more 
   sophisticated. 

In this case, where you are dealing with pathnames, you can use '\0'
characters as record separators.

peter@ficc.ferranti.com (Peter da Silva) (04/17/91)

In article <50837@nigel.ee.udel.edu> new@ee.udel.edu (Darren New) writes:
> >> peter@ficc.ferranti.com (Peter da Silva)
> >> You need to be able to specify expansion into multiple args (with glob)
> >> and expansion into a single arg (as in the example above).

> The real problem is that you are doing this under UNIX, where stdin and
> stdout are unstructured streams of bytes instead of something more 
> sophisticated.

This has nothing to do with UNIX. This is purely a matter of syntax: whether
the syntax:

	'[...]'

and the syntax:

	[...]

should both expand into a single argument or whether the latter expands into
multiple arguments. Actually passing an argv back from glob is not the problem:
I can print a null-separated string if I want. You still have to ask the
question: what do you do with it when you get it?
-- 
Peter da Silva.  `-_-'  peter@ferranti.com
+1 713 274 5180.  'U`  "Have you hugged your wolf today?"

peter@ficc.ferranti.com (Peter da Silva) (04/17/91)

In article <1685@sheol.UUCP> throopw@sheol.UUCP (Wayne Throop) writes:
> > peter@ficc.ferranti.com (Peter da Silva)
> >> But now you run into the question of just how you would utter the
> >> string "[glob *.c]".
> >         \[glob *.c\]

Yes, that's the general idea.

> Note that the escape is "merely" an odd-looking quotation convention. 

It's more than a bit messy...

> So this usage complicates (one could say "pollutes") the simple quotation
> convention we started out with.

I would say "pollutes", myself.

> Hmmmmm.  I'd have thought the inner [] would be recursively expanded,
> so that cases like [wc -l [glob *.c]] could be used easily and naturally
> to get linecounts as arguments to something.  Thus, the inner [a-z]
> above would conflict.

Yes, of course you're right. Never mind.

> > `[...], but how about {...}?

> Seems OK.  Does {foo;bar;bletch} do the "obvious" thing?

Run foo, bar, and bletch and incorporate the result in the command? Yes.

More and more for this I like using newline separators only for splitting
these up. I still like having quotes only affect argument parsing, and
using escapes to escape quotes.

> > Perhaps you could donate a parser for this

> I'm interested, but my time may be more limited than you'd like.

Hey, *my* time is more limited than I like.

> I'm also still trying to get straight what you intend the exact
> semantics of things to be... you might not like the semantics of
> any contribution I could give :-).

Sure, but it'd give us something to work on. Maybe when I see it in action
I'll like it more than my own ideas.
-- 
Peter da Silva.  `-_-'  peter@ferranti.com
+1 713 274 5180.  'U`  "Have you hugged your wolf today?"

peter@ficc.ferranti.com (Peter da Silva) (04/17/91)

What do we call this thing? "dash"?
-- 
Peter da Silva.  `-_-'  peter@ferranti.com
+1 713 274 5180.  'U`  "Have you hugged your wolf today?"

new@ee.udel.edu (Darren New) (04/17/91)

In article <UYRAN=4@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
>Actually passing an argv back from glob is not the problem:
>I can print a null-separated string if I want. You still have to ask the
>question: what do you do with it when you get it?

So why not expand glob slightly as follows:

   [glob -1 *.c]   -- returns all *.c files as one argument
   [glob *.c]      -- returns all *.c files, broken up

Then 'glob' handles either breaking up the file names or not.
You could do things like [blah | tr '\012' ' '] to get similar
effects to the -1 parameter on glob for other programs.
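The two glob behaviors differ only in a final join; a sketch of the
distinction (Python; the -1 flag and its semantics are Darren's
hypothetical, not an existing tool, and joining on a blank is itself a
lossy assumption for names containing blanks):

```python
import fnmatch

# Hypothetical glob builtin: with one_arg set (the proposed -1 flag),
# matches come back as a single space-joined argument; otherwise as a
# list of separate arguments.
def glob(names, pattern, one_arg=False):
    matches = [n for n in names if fnmatch.fnmatch(n, pattern)]
    return [' '.join(matches)] if one_arg else matches
```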

-- 
--- Darren New --- Grad Student --- CIS --- Univ. of Delaware ---
----- Network Protocols, Graphics, Programming Languages, FDTs -----
     +=+=+ My time is very valuable, but unfortunately only to me +=+=+
+=+ Nails work better than screws, when both are driven with screwdrivers +=+

peter@ficc.ferranti.com (Peter da Silva) (04/18/91)

In article <50982@nigel.ee.udel.edu> new@ee.udel.edu (Darren New) writes:
>    [glob -1 *.c]   -- returns all *.c files as one argument
>    [glob *.c]      -- returns all *.c files, broken up

OK, so what does this mean:

	'[glob -1 *.c]'

????

It seems to me to be going to a lot of work to avoid using the intuitive
distinction between:

	[...]
and
	'[...]'
-- 
Peter da Silva.  `-_-'  peter@ferranti.com
+1 713 274 5180.  'U`  "Have you hugged your wolf today?"

new@ee.udel.edu (Darren New) (04/24/91)

In article <CUSA91C@xds13.ferranti.com> peter@ficc.ferranti.com (Peter da Silva) writes:
>OK, so what does this mean:
>	'[glob -1 *.c]'

I don't know.  It's your shell. :-)
I'm just trying to give some possibilities.  Personally, I would make
'[glob -1 *.c]' unevaluated.  I haven't really been following the
discussion too closely.               -- Darren
-- 
--- Darren New --- Grad Student --- CIS --- Univ. of Delaware ---
----- Network Protocols, Graphics, Programming Languages, FDTs -----
     +=+=+ My time is very valuable, but unfortunately only to me +=+=+
+=+ Nails work better than screws, when both are driven with screwdrivers +=+