martin@mwtech.UUCP (Martin Weitzel) (01/08/91)
In article <443@minya.UUCP> jc@minya.UUCP (John Chambers) writes:
>> What ALLWAYS works in the Bourne-Shell is this:
>>
>>      for last do :; done
>
>Wow!  A one-liner that works for more than 9 args!  Of course, there's
>the question as to whether this loop is actually faster than starting
>a subprocess that just does puts(argv[artc-1]), but at least there's
>a way to do it that is portable.

I have compared the alternatives here on my 386 box and, as you might
guess, the difference in speed depends on the length of the argument
list. For ~25 arguments the for-loop is the fastest; above that, up to
~100 arguments, there is little difference, but the for-loop uses more
usr-time and the sub-process more sys-time. There seem to be minor
differences depending on what is called as the sub-process, i.e. a
specialized C program (as the poster suggested) or another shell-script
(as Maarten Litmaath posted earlier in this thread). For the rather
atypical size of 250 arguments there still isn't much difference, but
sometimes the sub-process is faster (the results vary over some range
and I didn't go to the effort of calculating the average).

My general experience with the 386 is that it starts sub-processes
really fast, so I think the for-do method will win even for more than
250 arguments on a lot of systems.

(BTW: I've learned from my experiments that the shell internally limits
the number of arguments that can be passed to a sub-process to 254. I
always thought the only limit was the space supplied by the OS to pass
the stuff to the sub-process, which is typically several KByte for the
*contents* of arguments + environment. I never noticed the limit on the
*number* of arguments before.)

>That comment isn't worth wasting the bandwidth, of course; my motive
>for this followup is a bit of bizarreness that I discovered while
>testing this command.
>The usual format of a for loop is 3 lines:
>       for last
>       do :
>       done
>Usually when I want to collapse such vertical code into a horizontal
>format, I follow the rule "Replace the newlines with semicolons", and
>it works.  For instance,
>       if [ <test> ]
>       then <stuff>
>       else <stuff>
>       fi
>reduces to
>       if [ <test> ];then <stuff>;else <stuff>;fi
>which I can do in vi via a series of "Jr;" commands.  With the above
>for-loop, this gives
>       for last;do :;done
>which doesn't work.  The shell gives a syntax error, complaining about
>an unexpected ';' in the line.  Myself, I found this to be a somewhat
>unexpected error message.  It appears my simple-minded algorithm for
>condensing code doesn't work in this case.
>
>So what's going on here?  What the @#$^&#( is the shell's syntax that
>makes the semicolon not only unneeded, but illegal in this case?

Funny, I stumbled over the same thing when I "invented" my for-do
method for accessing the last argument some years ago. The explanation
is a bit longer, so all who aren't interested in the details should
leave at this point.

The syntax for the "for" statement is more or less the following (I
stick to the "yacc" style here, but put keywords in single quotes even
if they are longer than one character, which is not allowed in "yacc"):

        for_stmt : 'for' NAME 'in' word_list SEP 'do' cmd_list 'done'
                 | 'for' NAME 'do' cmd_list 'done'
                 ;
        word_list: WORD
                 | word_list WORD
                 ;
        cmd_list : cmd arg_list SEP
                 | cmd_list cmd arg_list SEP
                 ;
        arg_list : /*empty*/
                 | arg_list WORD
                 ;
        SEP      : ';'
                 | '\n'
                 ;

(The meaning of NAME and WORD should be obvious - I don't want to go
too far into the syntactic details. I have further left out an
undocumented shell feature that allows you to replace "do" and "done"
with "{" and "}"; note that the latter is only true for for-do-done,
not for while-do-done and until-do-done!)

Note that white space is allowed everywhere in between the tokens and
nonterminals.
But SEP is a mandatory separator (which can be a newline or a
semicolon). The reason for requiring a separator in some cases is
simple: there is the possibility that some keywords of the shell might
also be used as regular arguments to commands or within a word_list -
we'll come back to this in a moment.

The shell distinguishes the two forms of the "for" statement simply by
looking at what follows the loop variable. If it is an "in", then a
word_list must follow, which in turn must be terminated by a mandatory
separator, as explained above. If a "do" follows, there is no
word_list. If a semicolon follows the loop variable, this is against
the syntax (this is what puzzled the poster).

Of course, Mr. Bourne could have made the syntax allow for it by
changing the RHS of the rule for the "for" statement without "in" into

        'for' NAME SEP 'do' cmd_list 'done'

but IMHO the difficulties of the poster (and many more, me included)
have some other reason, which has to do with the difference between

        - mandatory command separators (or terminators),
        - optional white space before commands and keywords, and
        - spaces as separators between a command and its argument list,

with

        - the semicolon being allowed only in the first case,
        - the newline being allowed in the first and second case, and
        - space characters being allowed in the second and third.

In a simple command, i.e. a program name that is followed by some
arguments, there's not much of a problem, as it seems "natural" for
most users to type spaces to separate the arguments and newlines to
terminate commands. It also seems obvious that the two cannot be used
interchangeably, as this would either terminate the argument list
prematurely (if you try to separate arguments with a newline) or fail
to end your command properly (if you don't type a newline).

Now let's consider the more complex shell statements.
Some very stupid users might in fact expect that the shell can read
their mind, but all the others will understand that the shell must
either treat ALL keywords (and maybe even all the commands) specially,
not allowing them as regular arguments, or needs some separator other
than the one used between arguments if a keyword is to follow a command
(or if there are to be two commands) on the same line. The logic can be
applied to most keywords, regardless of whether they introduce some
complex command or mark the beginning of the next part of the command
(like "then" or "else" in an "if" statement).

More puzzling is that the shell also ALLOWS newlines in place of spaces
where it's clear that a complex command isn't complete%. One place
where this occurs is when you start a "for" statement and have not yet
supplied the matching "done". For example

        for var in foo bar
        <some newlines here (1)>
        do
        <some newlines here (2)>
                cmd
        <some newlines here (3)>
        done

is all allowed, though seldom used, except for exactly one newline in
the place marked (2).

Note that the newlines before and after "cmd" here cannot simply be
seen as "empty commands", because if they could, the following would be
legal:

        for var in foo bar
        do
        done

which IT IS NOT, since at least ONE command is necessary between "do"
and "done" (please refer to the syntax given above). Note further that
a semicolon by itself is NOT an empty command, as

        for var in foo bar
        do ;
        done

does not work - you need at least the colon here:

        for var in foo bar
        do :
        done

------
%: More puzzling is that the shell allows it only in some places. E.g.
"for <newline>" is a syntax error while "for i <newline>" patiently
waits for the "in" or "do".
------

>One of the real hassles I keep finding with /bin/sh (and /bin/csh is
>even worse ;-) is that the actual syntax regarding things like white
>space, newlines, and semicolons seems to be a secret.
>It often takes
>a lot of experimenting to find a way to get these syntax characters
>right.  Is there any actual documentation on sh's syntax?  Is it truly
>as ad-hoc as the above example implies?

For all I know the C-shell is more or less "ad-hoc", but for the Bourne
shell (which, until now and for the rest of this article, is what I
always mean when I speak of "the shell") you can find a formal syntax
already in a very ancient document, the "Bell System Technical Journal"
(BSTJ for short) from July/August 1978, ISSN 0005-8580. The grammar
starts on page 1987 as Appendix A of an article written by S.R. Bourne
himself. Though it fails to mention some of the finer points (like the
space/newline problems just discussed), it may serve as a start for
you, and I found that it could even be fed to yacc without much trouble
(I never tried to fill in the actions to make it work as a "real"
shell ...)

>Is there perhaps some logical
>structure underlying it all that would explain why
>       for last do :; done
>and
>       for last
>       do :
>       done
>both work but
>       for last;do :;done
>doesn't?

Well, "logic" is not so much an absolute value as many of us think, as
it often depends on what you expect. This is so because we may think we
have recognized something as a "rule" and tend to see all contradicting
observations as "illogical", where in fact the examples we studied were
just too limited for us to recognize that we had seen only a special
case (in this generality that may also be true for the things we
consider to be the "universal laws" or "laws of nature" - but this
takes us away from the topic.)

Now, what you observed was that newline and semicolon are
interchangeable in all the examples you looked at and tried before you
came to that "for" statement. (Remember I told you in the beginning
that I had the same problem with this - so it cannot be said that your
expectations were without reason.)

A bit more experimentation could also have shown that in general the
two are not really interchangeable. E.g.
if you type a single newline nothing happens (except that the shell
prompts again), if you type two newlines still nothing happens, but if
you type a semicolon + a newline this is a syntax error. Hence
semicolon and newline are not as interchangeable as it seemed at first
glance.

Now, having a little more experience, we can come up with some other
explanation:

        - commands cannot be empty (they consist at least of an
          external or builtin command; the ":" is the builtin command
          which does nothing but evaluate its arguments)
        - a semicolon or a newline% terminates a command
        - a command list is a non-empty sequence of commands, all of
          which must be properly terminated
        - a semicolon or a newline terminates the word list of the
          "in" part of the "for" statement
        - space characters and newlines are allowed before commands
        - nearly all the keywords of the shell are only recognized if
          they are found in the position of a command, i.e. if there
          is a previous command or a word list of a "for" statement
          there MUST be a separator and there CAN be some space
          characters or newlines
        - the most important exceptions to the above are "in" (both
          for the "for" statement and for the "case" statement) and
          "do". But as the word list in the "in" part of a "for"
          statement (or the command list after the "while" or "until"
          in such a statement) must be properly terminated, a "do"
          NOT in command position can only occur in an "in"-less
          "for" statement.

-----
%: There are other valid command separators/terminators that are
recognized together with the semicolon, but this doesn't matter here.
-----

In some sense, these are the "laws of nature" as derived from observing
the shell's behaviour. As the shell is not really nature but the
outcome of the thoughts of some human being, we could of course
complain now that this is "illogical" (compared to our sense of logic!)
or that there are "too many exceptions" and that it could be simplified
with fewer, but more general rules.
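
Several of the rules listed above can be tried out in a current
POSIX-style sh. What follows is only a minimal sketch: the helper names
last_arg and parses_ok are made up for illustration, and it assumes
that "sh -n -c" parses a fragment without running it and fails on a
syntax error. Many current shells accept the "for last;do" form that
the shell discussed here rejects, so that case is deliberately not
checked.

```shell
#!/bin/sh
# last_arg (hypothetical helper): print the last positional parameter.
# Without an "in" part, the loop variable walks over "$@".
last_arg() {
    for last do :; done
    printf '%s\n' "$last"
}

# parses_ok (hypothetical helper): succeed if sh accepts the fragment.
# Assumption: -n makes sh parse the text without executing it.
parses_ok() {
    sh -n -c "$1" 2>/dev/null
}

last_arg one two three four five six seven eight nine ten eleven

parses_ok 'for v in a b; do :; done' && echo "colon body: accepted"
parses_ok 'for v in a b; do done'    || echo "empty body: rejected"
parses_ok ';'                        || echo "lone semicolon: rejected"
```

The first call works for more than nine arguments because the loop
never names the positional parameters individually.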
But when thinking about how to smooth things out by using fewer rules,
we often do not recognize all the consequences that this would have.
Assume for a moment that we treated both newline and semicolon as
statement terminators. Have you really considered what this would mean?
Typing a newline (at your terminal or as an empty line in a shell
script) would be a syntax error (sic!), just as a single semicolon is.

Quite simple, I hear you say: then we allow an empty statement to be
really empty, which would allow for single newlines as well as single
semicolons. But be careful! We then must think about the exit status of
such a statement. Should it always be true, like the colon command? But
then you must be very careful when inserting empty lines into a script,
because the following two would have different semantics

        if              |       if cmd
            cmd         |
        then            |       then

and you must never separate command execution and accessing $? by a
newline, since the empty command "newline" destroys the value of any
previous command's exit status.

Again I hear you say: we make the empty statement special - it shall
leave the status of the "real" command that was executed last. But now
the following becomes dangerous

        while
        do
                <do something until exit or break>
        done

as it depends on the last command BEFORE the loop when the loop is
entered the first time, and after that on the last command executed
WITHIN the loop. So, step by step, we may introduce more special casing
for something that looked like a trivial change in the first place!

I hope you have gained a little more understanding of the syntax of the
shell now. It isn't really as strange as it might seem at first glance,
though I admit a few things are not so obvious and it's easy to come to
some wrong conclusions if you have insufficient experience. (If this
article hadn't become so long I could write a little more on it - maybe
some other time.)
-- 
Martin Weitzel, email: martin@mwtech.UUCP, voice: 49-(0)6151-6 56 83
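
The exit-status argument in the post above can be checked against the
way the shell actually behaves: an empty line is not a command and
leaves $? alone, whereas an explicit ":" is a command and resets it.
This is a small sketch for a POSIX-style sh; the two function names are
invented for illustration.

```shell
#!/bin/sh
# status_after_blank_line (hypothetical name): run a failing command,
# then read $? after an empty line. The empty line is not a command,
# so the status of "false" is still there.
status_after_blank_line() {
    false

    echo "$?"
}

# status_after_colon (hypothetical name): the same, but with an
# explicit ":" in between. ":" is a builtin that always succeeds,
# so it overwrites the status left by "false".
status_after_colon() {
    false
    :
    echo "$?"
}

status_after_blank_line   # prints 1
status_after_colon        # prints 0
```

If empty lines behaved like ":", the first function would print 0 as
well, which is the hazard the post describes.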
eric@mks.com (Eric Gisin) (01/08/91)
The shell's interpretation of newline is context sensitive.
It is usually equivalent to ";", but in a few cases
it is equivalent to white-space (space or tab).
The latter cases include after "|", "&&", "||", "for NAME", and "case WORD".

So all the following are valid:
$ ls |
> wc
$ true &&
> false ||
> maybe
$ for x
> in a b c
> do :
> done
$ case x
> in x) echo x! # ;; optional here
> esac
allbery@NCoast.ORG (Brandon S. Allbery KB8JRR) (01/11/91)
As quoted from <1033@mwtech.UUCP> by martin@mwtech.UUCP (Martin Weitzel):
+---------------
| In some sense, this are the "laws of nature" as derived from observing
| the shell's behaviour. As the shell is not really nature but the outcome
| of the thoughts of some human beeing, we could of course complain now
| that this is "illogical" (compared to our sense logic!) or that there
| are "too many exceptions" and that it could be simplified with fewer,
| but more general rules.
|
| But when thinking how to smoothen things out by using fewer rules, we
| often do not recognize all the consequences that this would have.
+---------------

There is one other problem.  I daresay it would be possible to make Bourne
shell syntax a bit more "regular" by using a yacc grammar.  THIS WON'T WORK!
At least, not without making the shell much less useful --- yacc (or other
parser generators) grammars are not designed for interaction.  In order to
do interaction *well*, the shell needs to be able to have at least some idea
of what is going on *without* having read an entire complex command (read
"if/while/for/case/etc.").  I've tried writing a yacc grammar that does this
kind of thing in a graceful manner; I ended up using context-sensitive hacks,
which I dislike in otherwise simple parsers.  This is also why csh is not
actually like C --- C can depend on the parser collecting statements for it,
but csh is primarily designed for interactive use and therefore must be able
to keep track of what's going on incrementally.

++Brandon
-- 
Me: Brandon S. Allbery                      VHF/UHF: KB8JRR on 220, 2m, 440
Internet: allbery@NCoast.ORG                Packet: KB8JRR @ WA8BXN
America OnLine: KB8JRR                      AMPR: KB8JRR.AmPR.ORG [44.70.4.88]
uunet!usenet.ins.cwru.edu!ncoast!allbery    Delphi: ALLBERY
ronald@robobar.co.uk (Ronald S H Khoo) (01/12/91)
allbery@ncoast.ORG (Brandon S. Allbery KB8JRR) writes:

> There is one other problem.  I daresay it would be possible to make Bourne
> shell syntax a bit more "regular" by using a yacc grammar.  THIS WON'T WORK!
> At least, not without making the shell much less useful

Well, some of the chaps at research seem to be quite happy with "rc",
and that's got a yacc grammar...  Apparently it was too painful to
port /bin/sh to Plan 9, so Duff wrote "rc".  (He presented a paper on it
to the UKUUG Summer Conference last year.)

rc has exactly what you describe -- a regularised /bin/sh syntax.
And of course, since they use Gnots running Pike's windowing stuff,
there is no command line history/editing or anything like that in rc;
it's just a shell, and looks quite nice too.  Pity it's not available.
-- 
ronald@robobar.co.uk +44 81 991 1142 (O) +44 71 229 7741 (H)
allbery@NCoast.ORG (Brandon S. Allbery KB8JRR) (01/13/91)
As quoted from <1991Jan12.012225.6727@robobar.co.uk> by ronald@robobar.co.uk (Ronald S H Khoo):
+---------------
| allbery@ncoast.ORG (Brandon S. Allbery KB8JRR) writes:
| > There is one other problem.  I daresay it would be possible to make Bourne
| > shell syntax a bit more "regular" by using a yacc grammar.  THIS WON'T WORK!
| > At least, not without making the shell much less useful
|
| Well, the some of the chaps at research seem to be quite happy with "rc"
| and that's got a yacc grammar...  Apparently it was too painful to
| port /bin/sh to Plan 9 so Duff wrote "rc".  (He presented a paper on it
| to the UKUUG Summer Conference last year)
+---------------

I wondered if anyone would comment about that after I read the "rc" stuff.

However, "rc" follows the general Plan 9 form (which, many ages ago, was the
general Unix form) of moving stuff into separate programs.  "rc" is, in many
ways, nowhere near as complex as even the V7 shell, much less the System V
shell; it can get away with simple means of handling interactiveness in
complex control structures.  I was able to handle interactive use simply in a
certain yacc grammar up to a certain point; then I had to start using context
flags all over the place to make interactive use behave in an intuitive way.
I don't recall what point it was, except that the program I was working on
was gradually turning into a shell, which is why I eventually scrapped it in
favor of using the existing shell.

++Brandon
-- 
Me: Brandon S. Allbery                      VHF/UHF: KB8JRR on 220, 2m, 440
Internet: allbery@NCoast.ORG                Packet: KB8JRR @ WA8BXN
America OnLine: KB8JRR                      AMPR: KB8JRR.AmPR.ORG [44.70.4.88]
uunet!usenet.ins.cwru.edu!ncoast!allbery    Delphi: ALLBERY
martin@mwtech.UUCP (Martin Weitzel) (01/14/91)
In article <1991Jan11.035416.18772@NCoast.ORG> allbery@ncoast.ORG (Brandon S. Allbery KB8JRR) writes:
>As quoted from <1033@mwtech.UUCP> by martin@mwtech.UUCP (Martin Weitzel):
>+---------------
>| But when thinking how to smoothen [the shell syntax by using] fewer rules,
>| we often do not recognize all the consequences that this would have.
>+---------------
>
>There is one other problem.  I daresay it would be possible to make Bourne
>shell syntax a bit more "regular" by using a yacc grammar.  THIS WON'T WORK!
>At least, not without making the shell much less useful --- yacc (or other
>parser generators) grammars are not designed for interaction.

My observations differ a little here. It is true that using a parser
generator like yacc sometimes makes one less conscious of the actual
parsing algorithm, which may have to look at the next token to decide
which rule should be reduced (and hence which action should be
executed). But you can also write yacc-able grammars that can be parsed
without look-ahead! (Actions are generally a bit more complex then - in
most cases you have to build the parse tree explicitly as a data
structure rather than simply depend on yyparse's value stack.)

But the conclusion that parser generator grammars are not designed for
interaction is similar to the `goto-considered-harmful' discussion: You
cannot say that C programs are generally less structured just because
the language contains a `goto' statement. It depends very much on the
typical usage of the `goto' throughout a program whether the program
looks structured or more like spaghetti code. Of course, if C had no
`goto' at all, even those old-time BASIC hackers would be forced to
look for other ways to do control flow. Insofar I see some truth in
Brandon's statement: Parser generators make it easy to write grammars
which do not fit well into an interactive environment.
>In order to
>do interaction *well*, the shell needs to be able to have at least some idea
>of what is going on *without* having read an entire complex command (read
>"if/while/for/case/etc.").  I've tried writing a yacc grammar that does this
>kind of thing in a graceful manner; I ended up using context-sensitive hacks,
>which I dislike in otherwise simple parsers.

Again, `context-sensitive hacks' are not a bad thing a priori (maybe
they are if they are real `hacks', but I think Brandon meant that he
fed back some information from the syntax analysis to the lexer).

There are two different situations. Either you plan a completely new
syntax for a new language: in this case I would not recommend coupling
parser and scanner, because such a syntax becomes more difficult to
learn for a user of this new language (things have different meanings
in different contexts). Or you need to parse a given language that the
user already knows (e.g. some natural language or a sub-language
thereof): then feedback from syntax analysis to lexical analysis will
help a lot, as long as it duplicates what the user already expects.

Finding a yacc-able syntax for the Bourne-Shell is a mixed case: a
long-time shell user would expect all the things in it that a newcomer
might consider to be irregularities. (I don't dare to decide which are
really irregularities, as I belong rather to the former group, but at
least I know that most of the irregularities - e.g. implied double
quotes around the word after an `=' in an assignment and between
`case-in' - help to save some key-strokes, though they really are very
non-intuitive for newcomers.)

>This is also why csh is not
>actually like C --- C can depend on the parser collecting statements for it,
>but csh is primarily designed for interactive use and therefore must be able
>to keep track of what's going on incrementally.
Here I can second Brandon's statement and will even work it out a bit
more: One of the major problems comes up if the syntax allows an
if-statement with an optional else-part, as is the case in C (but not
in the Bourne Shell, as it has the closing `fi'). The user expects (of
course) that the if-part should be executed as soon as it is completely
written down. But the parsing algorithm may want to look ahead to see
whether an `else' follows. This is because the user "knows" what he or
she will do next, but the Shell cannot read the user's mind.

That sort of thing must be taken care of during the design of an
interactive language. Simply adopting the syntax of a non-interactive
language for an interactive language is bound to fail here.

To summarize: IMHO it is not the parser generators which complicate
things, but inappropriate design of an interactive language.

(Esp. to Brandon: Do your experiences stem from trying to derive a
yacc-able grammar for the Bourne-Shell or rather for the C-Shell?)

BTW: I've redirected followups to comp.lang.misc, since the topic tends
to turn away from the focus of comp.unix.shell.
-- 
Martin Weitzel, email: martin@mwtech.UUCP, voice: 49-(0)6151-6 56 83
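
The role of the closing `fi' mentioned above can be made visible with a
small sketch for a POSIX-style sh. The helper name is_complete is made
up for illustration, and the sketch assumes that "sh -n -c" reports an
unterminated construct as a syntax error. It only shows that the end of
an if-statement is marked by a token of its own, so that no look-ahead
for a possible `else' is needed; it says nothing about how any
particular shell's parser is built.

```shell
#!/bin/sh
# is_complete (hypothetical helper): succeed if the fragment parses as
# a complete command. Assumption: "sh -n" only parses, and treats a
# construct that is still open at end of input as a syntax error.
is_complete() {
    sh -n -c "$1" 2>/dev/null
}

is_complete 'if true; then echo yes; fi' \
    && echo "with fi: complete"
is_complete 'if true; then echo yes' \
    || echo "without fi: still open"
is_complete 'if true; then echo yes; else echo no; fi' \
    && echo "else before fi: complete"
```

In a C-like syntax the second fragment would already be a complete
statement, and only the following token could tell whether an else-part
was still to come.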