[comp.unix.shell] Bourne Shell bug? Have a look..

venta@otello.sublink.org (Paolo Ventafridda) (01/16/91)

Please have a look at this small program written in standard
Bourne shell:

:
set "one two three 4"
if [ "`echo $@ | grep '4'" != "" ]; then 
	echo "Four"
fi


Now, notice that a      `    is missing in the "if" above!
It works without any problems all the same: on Xenix, SCO Unix,
3B2, HP-UX and I guess on any Bourne shell whose sources
come from AT&T.
It seems that only BASH reports the error.

Ciao, Paolo
-- 
Paolo Ventafridda     -*-     INTERNET: venta@otello.sublink.org
TELEMATIX MILANO - Via C.Gomes 10, 20124 Milano -  +39-2-6706012

chet@odin.INS.CWRU.Edu (Chet Ramey) (01/16/91)

Paolo Ventafridda writes:

$ Please have a look at this small program written in standard
$ Bourne shell:
$ 
$ :
$ set "one two three 4"
$ if [ "`echo $@ | grep '4'" != "" ]; then 
$ 	echo "Four"
$ fi
$ 
$ 
$ Now, notice that a      `    is missing in the "if" above!
$ It works without any problems all the same: on Xenix, SCO Unix,
$ 3B2, HP-UX and i guess on any bourne shell whose sources are
$ coming from at&t.
$ It seems that only BASH gets the error.

The `standard' AT&T Bourne shell will silently add a missing
closing delimiter when it hits EOF.  I don't think ksh does,
except maybe for sh compatibility; this was listed by Korn in
his book as one of the differences between ksh and sh.  Bash
doesn't either. 
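
For comparison, here is a sketch of the same test with the backquote
closed. The `result' variable is only there for illustration and is not
part of the original script. Written this way, the script should not
depend on the shell repairing anything at EOF:

```shell
# Same test as the original script, but with the closing backquote present.
set "one two three 4"
if [ "`echo $@ | grep '4'`" != "" ]; then
	result="Four"
else
	result=""
fi
echo "$result"
```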

Chet

-- 
Chet Ramey				``There's just no surf in
Network Services Group			  Cleveland, U.S.A. ...''
Case Western Reserve University
chet@ins.CWRU.Edu		My opinions are just those, and mine alone.

morgan@ms.uky.edu (Wes Morgan) (01/16/91)

chet@po.CWRU.Edu writes:
>Paolo Ventafridda writes:
>
>$ :
>$ set "one two three 4"
>$ if [ "`echo $@ | grep '4'" != "" ]; then 
>$ 	echo "Four"
>$ fi
>$ 
>
>The `standard' AT&T Bourne shell will silently add a missing
>closing delimiter when it hits EOF.  I don't think ksh does,
>except maybe for sh compatibility; this was listed by Korn in
>his book as one of the differences between ksh and sh.  Bash
>doesn't either. 

Hmmmmm.....if it added the delimiter when it hit EOF, wouldn't
it put it at the end of the script, giving an "unexpected end
of file" error?   I had thought that this might be a precedence
problem.  If sh(1) assigned a higher precedence to " than to `
or ', wouldn't it insert it at the appropriate place?  I tested
this, and the script above works with missing ' as well as with
missing " or `..........

The FM on this system doesn't give a precedence listing for
metacharacters in sh(1); does such a beast exist?

Curious,
Wes


-- 
    | Wes Morgan, not speaking for | {any major site}!ukma!ukecc!morgan | 
    | the University of Kentucky's |        morgan@engr.uky.edu         |
    | Engineering Computing Center |   morgan%engr.uky.edu@UKCC.BITNET  | 
     Lint is the compiler's only means of dampening the programmer's ego.

allbery@NCoast.ORG (Brandon S. Allbery KB8JRR) (01/23/91)

As quoted from <1991Jan16.153557.15548@ms.uky.edu> by morgan@ms.uky.edu (Wes Morgan):
+---------------
| chet@po.CWRU.Edu writes:
| >Paolo Ventafridda writes:
| >
| >$ :
| >$ set "one two three 4"
| >$ if [ "`echo $@ | grep '4'" != "" ]; then 
| >$ 	echo "Four"
| >$ fi
| >$ 
| >
| >The `standard' AT&T Bourne shell will silently add a missing
| >closing delimiter when it hits EOF.  I don't think ksh does,
| 
| Hmmmmm.....if it added the delimiter when it hit EOF, wouldn't
| it put it at the end of the script, giving an "unexpected end
| of file" error?   I had thought that this might be a precedence
+---------------

Precedence doesn't play a part in it; the " is seen first, so the "word
splitter" scans to the next un-escaped ".  Thus, it gets the string:

	`echo $@ | grep '4'

When it later scans that string, it discovers the unbalanced ` and inserts one
at the end.

++Brandon
-- 
Me: Brandon S. Allbery			    VHF/UHF: KB8JRR on 220, 2m, 440
Internet: allbery@NCoast.ORG		    Packet: KB8JRR @ WA8BXN
America OnLine: KB8JRR			    AMPR: KB8JRR.AmPR.ORG [44.70.4.88]
uunet!usenet.ins.cwru.edu!ncoast!allbery    Delphi: ALLBERY

martin@mwtech.UUCP (Martin Weitzel) (01/27/91)

In article <1991Jan16.153557.15548@ms.uky.edu> morgan@ms.uky.edu (Wes Morgan) writes:
[as followup to an article describing some strange behaviour
 with certain sequences of quoting in shell commands]

>...   I had thought that this might be a precedence
>problem.  If sh(1) assigned a higher precedence to " than to `
>or ', wouldn't it insert it at the appropriate place?  I tested
>this, and the script above works with missing ' as well as with
>missing " or `..........
>
>The FM on this system doesn't give a precedence listing for
>metacharacters in sh(1); does such a beast exist?

No, because it's not exactly a matter of precedence, but more of a `state
machine'. Think of being in the normal state in the beginning and of
several different `quote states'. State transitions are more or less as
follows ("more or less" because I don't want to go into the details of
\-quoting and variable expansion in the form of ${name} here).

	NORMAL  \ ->	QUOTE   any ->	NORMAL
	NORMAL	' ->	SQUOTE	' ->	NORMAL
	NORMAL  ` ->	IQUOTE	` ->	NORMAL
			IQUOTE	\ ->	Q_IQUOTE any ->	IQUOTE
	NORMAL  " ->	DQUOTE	" ->	NORMAL
			DQUOTE	\ ->	Q_DQUOTE any ->	DQUOTE
			DQUOTE	` ->	I_DQUOTE ` ->	DQUOTE
					I_DQUOTE " ->	NORMAL

The last one is one of the observed "strangenesses" - S.R. Bourne could
also have chosen to consider this a syntax error, but he didn't and
simply transferred back to the normal state.
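
The transition list can be played through with a small scanner. This is
only a sketch of the table as given here, not the real sh source; the
function name `final_quote_state' and its variables are made up for the
illustration, and ${name} handling is left out just as in the table:

```shell
# Hypothetical helper: walk the string in $1 one character at a time and
# print the quote state the scanner is in when the string ends.
final_quote_state() {
	rest=$1
	state=NORMAL
	while [ -n "$rest" ]; do
		c=${rest%"${rest#?}"}	# first character of what is left
		rest=${rest#?}		# everything after it
		case "$state:$c" in
			NORMAL:\\)	state=QUOTE ;;
			QUOTE:*)	state=NORMAL ;;
			NORMAL:\')	state=SQUOTE ;;
			SQUOTE:\')	state=NORMAL ;;
			NORMAL:\`)	state=IQUOTE ;;
			IQUOTE:\\)	state=Q_IQUOTE ;;
			Q_IQUOTE:*)	state=IQUOTE ;;
			IQUOTE:\`)	state=NORMAL ;;
			NORMAL:\")	state=DQUOTE ;;
			DQUOTE:\\)	state=Q_DQUOTE ;;
			Q_DQUOTE:*)	state=DQUOTE ;;
			DQUOTE:\`)	state=I_DQUOTE ;;
			I_DQUOTE:\`)	state=DQUOTE ;;
			I_DQUOTE:\")	state=NORMAL ;;	# the "strange" transition
			DQUOTE:\")	state=NORMAL ;;
		esac
	done
	echo "$state"
}

final_quote_state '"`echo hi"'	# backquote never closed, still ends in NORMAL
final_quote_state '"abc'	# double quote never closed, ends in DQUOTE
```

With this table a double quote met in state I_DQUOTE ends the whole
quoted word, which is why the script with the missing ` still parses.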

Some may argue that the above list isn't complete and that there
are also transfers for " in state IQUOTE (say, to D_IQUOTE) but
that isn't necessary: The whole contents of an IQUOTE-section of
the command line is handed more or less unchanged% to a separate
shell process and re-interpreted in this new context. All other quotes
are honored there and are no business of the shell that parses a line
which contains `..."..."...` .

Finally note that the shell conserves the DQUOTE-edness (oh, what a cute
word have I just coined here :-)) of any part of the command line until
command substitution is done%%. Effectively this means that the chars
which are written to standard out from any command within `....` 
may go into a quoted context, (if the `...` appeared *within* "......")
or not (if the `...` appeared *not* within ".......").

% : The parent shell *has* some business with \-quotes within `....`, which
    it must take care of before it hands the line to the other shell.
%%: The same is true for variable substitution, i.e. all the $-constructs.
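
A small sketch of what the conserved DQUOTE-edness means in practice.
The variable names are invented for the example; the same command
substitution is counted once outside and once inside double quotes:

```shell
# Outside double quotes the output of `...` is split into words.
set -- `echo 'a   b'`
unquoted_count=$#

# Inside double quotes the output stays one word, blanks included.
set -- "`echo 'a   b'`"
quoted_count=$#

echo "unquoted: $unquoted_count words, quoted: $quoted_count word"
```

The first form should give two words and the second one word, because
the output of the inner command lands in a quoted context only in the
second case.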

People: The Bourne-Shell IS understandable. In fact I found it to be more
regular than a number of other languages! But you mustn't look at it with
fixed ideas of what "must" be going on (e.g. which precedence have different
quotes?) but try to answer the question "What may be the rules behind the
behaviour I observe?". The number of rules you'll find is amazingly small -
though some people have a strange tendency to clutter things up with a lot
of exceptions, just to make things fit to their way of thinking. (To Wes,
who posted the question: It's *not* you I have in mind here.)

Oh, I've a nice small brain-teaser (sp?) for those who made it up to
here. You know (or maybe you are just about to learn here :-)) that
I/O-redirection can be given *anywhere* in the command line, i.e.
that the following three command lines are equivalent:

	echo hello >file
	echo >file hello
	>file echo hello

You further know (or again you just learn it), that there can be more
than one redirection per line:

	echo hello >file foo >baz

Now: Which redirection is the one that is finally in effect when the
program (the `echo' in the above example) runs, if you combine both
of the above "strange" ways to do I/O redirection? (What I want is not
the answer to a specific example, but the general rule!) I'll post the
answer in a week or so if no one solves this.
-- 
Martin Weitzel, email: martin@mwtech.UUCP, voice: 49-(0)6151-6 56 83

mike@bria (01/28/91)

In article <1067@mwtech.UUCP> martin@mwtech.UUCP (Martin Weitzel) writes:
>You further know (or again you just learn it), that there can be more
>than one redirection per line:
>
>	echo hello >file foo >baz
>
>Now: Which redirection is the one that is finally in effect when the
>programm (the `echo' in the above example) runs, if you combine both
>of the above "strange" ways to do I/O-direction? (What I want is not
>the answer to a specific example, but the general rule!) I'll post the
>answer in a week or so if noone solves this).

I must admit that I was a bit surprised.  I would have guessed that 'file'
would be empty, and 'baz' contain "hello foo", when it's really the
other way around.  I thought the logic would have been:

	open 'file' for O_WRONLY|O_TRUNC|O_CREAT
	dup stdout
	dup2 file descriptor to stdout
	since stdout is a file ...
		dup2 duplicated stdout descriptor to stdout
	open 'baz' for O_WRONLY|O_TRUNC|O_CREAT
	dup stdout
	dup2 file descriptor to stdout (which would close file)

Does the shell just close the subsequent files it opens when it discovers
that stdout is not a tty?

IMHO, the shell should issue an error if redirection is specified for
the same descriptor more than once, but I suppose that there are those
who adore this particular feature ...
-- 
Michael Stefanik, Systems Engineer (JOAT), Briareus Corporation
UUCP: ...!uunet!bria!mike
--
technoignorami (tek'no-ig'no-ram`i) a group of individuals that are constantly
found to be saying things like "Well, it works on my DOS machine ..."

chet@odin.INS.CWRU.Edu (Chet Ramey) (02/01/91)

In article <1067@mwtech.UUCP> martin@mwtech.UUCP (Martin Weitzel) writes:

>You further know (or again you just learn it), that there can be more
>than one redirection per line:
>
>	echo hello >file foo >baz
>
>Now: Which redirection is the one that is finally in effect when the
>programm (the `echo' in the above example) runs, if you combine both
>of the above "strange" ways to do I/O-direction? (What I want is not
>the answer to a specific example, but the general rule!) I'll post the
>answer in a week or so if noone solves this).

That command will leave `baz' empty and "hello foo" in `file', but this is
wrong. 

I claim that the general rule that Martin is asking for is in fact a bug
in sh, and a long-standing one at that.  Consider that

	echo hello foo > file > baz

leaves `file' empty and "hello foo" in `baz'.  According to the sh
`grammar', and the claim that redirections may appear anywhere in a
command, this command and Martin's should produce identical results. 

I'll bet that the presence of the word `foo' between the redirections in
Martin's example somehow causes sh not to reverse the chain of redirections
that it builds, so that `file' is active when the command is run. 

Here is an example of what I consider correct behavior.  When bash
processes the command line that Martin gave, to use an example I'm familiar
with, it separates the redirections from the command words right away, in
the yacc grammar actions.  Bash incrementally builds a chain of
redirections by processing left to right.  (This is what Posix specifies,
though the language in the 1003.2 draft is `beginning to end'.)  Bash builds
the chain in a rather inefficient manner, appending each redirection to
the chain already built.  If bash instead put each new redirection at the
head of the chain (appending the existing chain to it), the chain would
need to be reversed to have the redirections processed in the proper
order, and bash would have to do this before processing any of them when
it is time to exec the command.

This is the piece of the yacc grammar that accomplishes that (lots and lots
of essential support code and other rules omitted):

redirections:   redirection
                        {
                          $$ = $1;
                        }
        |       redirections redirection
                        {
                          register REDIRECT *t = $1;

                          while (t->next)
                            t = t->next;
                          t->next = $2;
                          $$ = $1;
                        }
        ;


Sh obviously wants to do something like this, as evidenced by the supposed
equivalent command I wrote above, but does not when presented with the
command Martin used.  I say it's a bug.  Bash, ash, ksh-86, and ksh-88
agree with me.  All AT&T versions of sh up to and including V.3.2 do not. 
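
A way to check which behavior a given shell has, sketched under the
assumption that mktemp(1) is available; the variable names are only for
illustration. In a shell that processes redirections left to right (bash,
for instance) `file' should come out empty and `baz' should hold the text.
An sh with the behavior described in this thread would report the
opposite:

```shell
# Scratch directory so the test does not clobber real files.
dir=`mktemp -d`

# Martin's command line, run in a subshell inside the scratch directory.
( cd "$dir" && echo hello >file foo >baz )

file_size=`wc -c <"$dir/file" | tr -d ' '`
baz_text=`cat "$dir/baz"`
echo "file: $file_size bytes; baz: $baz_text"

rm -rf "$dir"
```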

Another thing: sh allows redirections anywhere in the command only for a 
simple command.  For example:

slc2$ sh
slc2$  > foo for i in 1 2 3
for: not found
slc2$ >foo echo hello
slc2$ cat foo
hello
slc2$

Chet

-- 
Chet Ramey				``There's just no surf in
Network Services Group			  Cleveland, U.S.A. ...''
Case Western Reserve University
chet@ins.CWRU.Edu		My opinions are just those, and mine alone.

martin@mwtech.UUCP (Martin Weitzel) (02/04/91)

In article <1067@mwtech.UUCP> martin@mwtech.UUCP I wrote:
>
>You further know (or again you just learn it), that there can be more
>than one redirection per line:
>
>	echo hello >file foo >baz
>
>Now: Which redirection is the one that is finally in effect when the
>programm (the `echo' in the above example) runs, if you combine both
>of the above "strange" ways to do I/O-direction? (What I want is not
>the answer to a specific example, but the general rule!) I'll post the
>answer in a week or so if noone solves this).

Thanks to "uunet!bria!mike (Michael Stefanik)" and "chet@odin.INS.CWRU.Edu
(Chet Ramey)" who both cared to write a detailed followup to my question.
What I asked for was the "general rule" implemented in the Bourne-Shell
and what I found was (at least in the Shell versions I know):

	1) Scan the command line for the last "true" arg (i.e. the
	   last arg which is not a redirection).
	2) Make one trip round the command line processing the
	   I/O redirection, starting from the last arg (see above)
	   and jumping to the beginning of the line if the end is
	   reached.
	3) For each ">" encountered, open the named file (creating it
	   or truncating its size to zero). The last redirection
	   processed is the one that stays in effect.

Examples:
		echo hello >file foo >baz
		                 ^^^------+
		+----<-----<-----<--------+
		+--------------->

		echo hello foo >baz >file
		           ^^^------------+
		+----<-----<-----<--------+
		+--------->

		>file echo hello foo >baz
		                 ^^^ -----+
		+----<-----<-----<--------+
		+--------------->

In any of the three cases the result is an empty file "baz" and the
string "hello foo" in "file".
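
The three steps can be modelled in a few lines of shell. This is a sketch
of the rule as stated above, not of the actual sh code. The function
`effective_target' is hypothetical. It expects each word of the command
line as one argument, treats any word starting with ">" as a redirection,
and assumes file names without blanks or wildcard characters:

```shell
# Hypothetical model of the rule: print the file that stays in effect.
effective_target() {
	# Step 1: find the position of the last word that is not a redirection.
	last_arg=0
	i=0
	for w in "$@"; do
		i=$((i + 1))
		case $w in
			'>'*)	;;
			*)	last_arg=$i ;;
		esac
	done

	# Step 2: one trip round the line, starting after that word; collect
	# the targets found after it, then those found before it.
	after=
	before=
	i=0
	for w in "$@"; do
		i=$((i + 1))
		case $w in
			'>'*)
				if [ "$i" -gt "$last_arg" ]; then
					after="$after ${w#>}"
				else
					before="$before ${w#>}"
				fi ;;
		esac
	done

	# Step 3: every target gets opened; the last one processed wins.
	winner=
	for t in $after $before; do
		winner=$t
	done
	echo "$winner"
}

effective_target echo hello '>file' foo '>baz'
effective_target echo hello foo '>baz' '>file'
effective_target '>file' echo hello foo '>baz'
```

For the three examples the model names `file' each time. For Chet's
variant, echo hello foo >file >baz, it names `baz', which matches what he
reported.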

Note that I claim neither that what the shell does here is useful or
sensible, nor that it is a bug or a feature, nor that any Shell script
should exploit or otherwise depend on this behaviour. The point I
wanted to make was simply that the rule demonstrated above is a very
simple one, though at first glance it seemed that several exceptions
to a general rule were necessary to describe the behaviour.
-- 
Martin Weitzel, email: martin@mwtech.UUCP, voice: 49-(0)6151-6 56 83