oz@yunexus.UUCP (Ozan Yigit) (09/18/88)
[Apologies to those getting tired of this topic.]

In article <8209@alice.UUCP> andrew@alice.UUCP (Andrew Hume) writes:
>
>it sounds appealing to allow a missing RE to mean the empty string
> but i am unconvinced as to its utility.
>

With all due respect, the argument of "utility" except in the "specific"
case of '|foo' (as used by Rick@seismo) is suspect (bogus?). Unless I am
mistaken in the equivalence of (foo)? and (foo|E), the issue reduces to
one of expression syntax vs semantics. Is there a good syntactic reason
not to allow (foo|) as a valid expression, such as grammar ambiguity ??
If NOT, I would claim that the parsers rejecting the expression are
"incomplete" (some would say broken :-), regardless of whether it is in
"sam" (Gwyn special, Argumentum Ad Sam) or wherever.

I agree that "blah(foo||bar)gasp" may not look quite as interesting
(arguably) as "blah(foo|bar)+ptui", but if they are equivalent (yeah,
I know, gasp is not equivalent to ptui. :-) and if there is no solid
syntactic reason to allow one and disallow other, then, why bother
to come up with excuses for it ??

Any thoughts, and/or some real reason against (foo|) ??

oz
-- 
Crud that is not paged      | Usenet: ...!utzoo!yunexus!oz
is still crud.              |         ...uunet!mnetor!yunexus!oz
        andrew@alice        | Bitnet: oz@[yulibra|yuyetti]
                            | Phonet: +1 416 736-5257x3976
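
The equivalence of (foo)? and (foo|) that this post relies on can be checked mechanically in any engine that accepts an empty alternative. A minimal sketch, assuming Python's `re` module as a stand-in for the regexp packages under discussion (the helper name `agree` and the sample strings are illustrative, not from the thread):

```python
import re

# Assumption: Python's re accepts an empty alternative, so "(foo|)" parses.
optional = re.compile(r"blah(foo)?gasp")
empty_alt = re.compile(r"blah(foo|)gasp")

samples = ["blahgasp", "blahfoogasp", "blahfoofoogasp", "blahbargasp", ""]

def agree(a, b, strings):
    """Return True if both patterns accept exactly the same sample strings."""
    return all(bool(a.fullmatch(s)) == bool(b.fullmatch(s)) for s in strings)

print(agree(optional, empty_alt, samples))
```

Agreement on a handful of samples is only a spot check, not a proof, but it shows that the two spellings differ in syntax rather than in the language they describe.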
henry@utzoo.uucp (Henry Spencer) (09/20/88)
In article <857@yunexus.UUCP> oz@yunexus.UUCP (Ozan Yigit) writes:
>Any thoughts, and/or some real reason against (foo|) ??

Well, personally, I'd dearly love to be able to use (| and |) as
metasymbols, since (a) one highly desirable extension to my regexp
package would be the beginning/end-of-identifier metasymbols found in
many implementations, (b) I am deeply opposed to declaring more
unbackslashed characters to be metasymbols, and (c) I am even more
deeply opposed to declaring *any* backslashed characters to be
metasymbols. There are other possibilities, exploiting sequences that
are syntax errors at the moment, but none of them is nearly as pretty.
(Not a trivial issue, given that users have to remember whatever
sequence gets chosen.)

Alas, I am also sympathetic to the argument that (1) it would be an
unfortunate inconsistency, and (2) programs that generate regexps might
have to go out of their way to avoid generating these magic sequences.
Argh. Any thoughts?
-- 
NASA is into artificial        |     Henry Spencer at U of Toronto Zoology
stupidity. - Jerry Pournelle   | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
ok@quintus.uucp (Richard A. O'Keefe) (09/21/88)
In article <1988Sep20.043728.20198@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>Well, personally, I'd dearly love to be able to use (| and |) as metasymbols,

Why not use (* ... ) as the meta-construct?

>(2) programs that generate regexps might have to go out of their way to
>avoid generating these magic sequences. Argh. Any thoughts?

I suggest that there ought to be a way for programs to generate R.E.s
*without* using magic sequences. How about having a program do e.g.

        begin_re();                     /* "/"   */
        literal("foo");                 /* "foo" */
        begin_alternatives();           /* "("   */
        literal("baz");                 /* "baz" */
        next_alternative();             /* "|"   */
        end_alternatives();             /* ")"   */
        literal(".c");                  /* "\.c" */
        pattern = end_re();             /* "/"   */

to obtain a pattern equivalent to Csh's foo{baz,}.c

It is *already* the case that programs which generate patterns have to
go out of their way to avoid far too many magic sequences; a library like
this would eliminate the problem at the source.
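
The builder interface proposed in this post can be sketched in a few lines. This is only an illustration under stated assumptions: the method names are borrowed from the post, but the class `ReBuilder`, its chaining style, and the use of Python's `re` as the target syntax are hypothetical choices, not the author's library.

```python
import re

class ReBuilder:
    """Accumulates a pattern from calls, so callers never write metacharacters."""

    def __init__(self):
        self.parts = []

    def literal(self, text):
        # re.escape quotes any character the engine would treat as special.
        self.parts.append(re.escape(text))
        return self

    def begin_alternatives(self):
        self.parts.append("(?:")
        return self

    def next_alternative(self):
        self.parts.append("|")
        return self

    def end_alternatives(self):
        self.parts.append(")")
        return self

    def end_re(self):
        return re.compile("".join(self.parts))

# Same call sequence as in the post: foo, then (baz or nothing), then ".c".
pattern = (ReBuilder()
           .literal("foo")
           .begin_alternatives()
           .literal("baz")
           .next_alternative()      # followed by nothing: the empty alternative
           .end_alternatives()
           .literal(".c")
           .end_re())

print(pattern.pattern)
```

Because the empty alternative arises from a call sequence rather than from typed syntax, a generator using such an interface never has to know whether `(baz|)` is legal in the concrete notation.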
weemba@garnet.berkeley.edu (Obnoxious Math Grad Student) (09/21/88)
In article <1988Sep20.043728.20198@utzoo.uucp>, henry@utzoo (Henry Spencer) writes:
> Alas, I am also sympathetic
>to the argument that (1) it would be an unfortunate inconsistency, and
>(2) programs that generate regexps might have to go out of their way to
>avoid generating these magic sequences. Argh. Any thoughts?

From a theoretician's point of view, these are the only arguments.

I ran into null regexps in Gnews, when I generalized from KILLing based
on newsgroup names to KILLing based on newsgroup regexps. I was so
pleased when I realized that the null regexp would match all newsgroup
names, and thus provide for global KILLs.

It never occurred to me that there might be regexp handlers that would
not take this: it's plain unnatural.

ucbvax!garnet!weemba    Matthew P Wiener/Brahms Gang/Berkeley CA 94720
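
The global-KILL trick described in this post follows directly from the fact that the empty pattern matches at the start of every string. A small sketch, assuming Python's `re`; the rule table and the function `rules_for` are hypothetical and are not taken from Gnews:

```python
import re

# Hypothetical kill table: newsgroup pattern -> subject to drop.
# The empty pattern is the global entry, since it matches every group name.
kill_rules = {
    "": "subject killed everywhere",
    r"^talk\.": "subject killed in talk.* only",
}

def rules_for(group):
    """Return the subjects killed in `group`, in table order."""
    return [subject for pat, subject in kill_rules.items()
            if re.search(pat, group)]

print(rules_for("comp.lang.c"))
print(rules_for("talk.bizarre"))
```

No special case is needed for the global rule; it is simply the pattern that constrains nothing.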
henry@utzoo.uucp (Henry Spencer) (09/23/88)
In article <454@quintus.UUCP> ok@quintus.UUCP (Richard A. O'Keefe) writes:
>Why not use (* ... ) as the meta-construct?

The trouble is that the word brackets aren't always used together, so the
trailing bracket needs to be distinguishable by itself. (* is attractive,
but it has no obvious counterpart to be the closing bracket.

>It is *already* the case that programs which generate patterns have to
>go out of their way to avoid far too many magic sequences; a library like
>this would eliminate the problem at the source.

Actually, with my regexp package it suffices to backslash all the
ordinary characters. A bit crude, but it works. This is one of the
reasons why I am very reluctant to assign special meaning to any
backslashed characters.
-- 
NASA is into artificial        |     Henry Spencer at U of Toronto Zoology
stupidity. - Jerry Pournelle   | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
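
The "backslash everything" approach only works while no backslashed character has a special meaning, which is the point of the last paragraph above. A sketch of the contrast, assuming Python's `re` as the engine; the function `backslash_all` is a hypothetical name for the crude quoting described in the post:

```python
import re

def backslash_all(text):
    """Crude quoting: put a backslash in front of every character."""
    return "".join("\\" + ch for ch in text)

# In an engine where backslash-plus-character always means that character,
# this is a complete quoting rule. Python assigns meanings to sequences
# such as \d, so the crude rule fails here:
crude = backslash_all("d")            # the two-character pattern \d
print(re.fullmatch(crude, "d"))       # no match: \d means "a digit"

# Python therefore needs a smarter helper that knows which characters to quote.
safe = re.escape("a|b(c)*")
print(re.fullmatch(safe, "a|b(c)*") is not None)
```

Each backslashed metasymbol an engine adds is one more case a pattern-generating program has to know about.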
rroot@edm.UUCP (Stephen Samuel) (09/23/88)
From article <857@yunexus.UUCP>, by oz@yunexus.UUCP (Ozan Yigit):
> [Apologies to those getting tired of this topic.]
> In article <8209@alice.UUCP> andrew@alice.UUCP (Andrew Hume) writes:
>> >it sounds appealing to allow a missing RE to mean the empty string
>> but i am unconvinced as to its utility.
> I agree that "blah(foo||bar)gasp" may not look quite as interesting
> (arguably) as "blah(foo|bar)+ptui", but if they are equivalent (yeah,
> I know, gasp is not equivalent to ptui. :-) and if there is no solid
> syntactic reason to allow one and disallow other, then, why bother
> to come up with excuses for it ??

I am inclined to say that it might be worthwhile to allow it for the
purpose of completeness.

If you have something that does string replacements, then there IS a
real difference between:
        // , /foo|/ and /foo/
especially if they are prefixed by something else: for example, you
might want to do something like:
        change: /go\(ing|one|\) / = /went/
and if you were using grep to search for things like that, it would be
nice to be able to use pieces of your other expressions in a 'grep'
search, even if it does look like a null event sometimes.
-- 
-------------
Stephen Samuel                  Disclaimer: You betcha!
{ihnp4,ubc-vision,seismo!mnetor,vax135}!alberta!edm!steve
BITNET: USERZXCV@UQV-MTS
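
The difference an empty alternative makes in a substitution can be seen in a short sketch. It assumes Python's `re`, and it adapts the post's example rather than copying it: the alternative is spelled `ne` so that it forms "gone", a `\b` anchor is added, and the replacement keeps the trailing space. All of these are illustrative choices.

```python
import re

# With the trailing empty alternative, bare "go " is also rewritten.
with_empty = re.compile(r"\bgo(ing|ne|) ")
without_empty = re.compile(r"\bgo(ing|ne) ")

text = "they go home, going slowly"

print(with_empty.sub("went ", text))
print(without_empty.sub("went ", text))
```

The first substitution rewrites both "go " and "going "; the second leaves "go " alone, so the two patterns are not interchangeable once a prefix is attached.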
ka@june.cs.washington.edu (Kenneth Almquist) (09/27/88)
henry@utzoo.uucp (Henry Spencer) writes:
> Well, personally, I'd dearly love to be able to use (| and |) as metasymbols,
> since (a) one highly desirable extension to my regexp package would be the
> beginning/end-of-identifier metasymbols found in many implementations,
> (b) I am deeply opposed to declaring more unbackslashed characters to be
> metasymbols, and (c) I am even more deeply opposed to declaring *any*
> backslashed characters to be metasymbols. There are other possibilities,
> exploiting sequences that are syntax errors at the moment, but none of
> them is nearly as pretty. (Not a trivial issue, given that users have to
> remember whatever sequence gets chosen.) Alas, I am also sympathetic
> to the argument that (1) it would be an unfortunate inconsistency, and
> (2) programs that generate regexps might have to go out of their way to
> avoid generating these magic sequences. Argh. Any thoughts?

My solution (when I faced this problem a long time ago) was to make an
asterisk at the start of a regular expression require that the string
matched not be preceded or followed by a character which can appear in
a word. The arguments pro and con seem to be:

1) Word beginning and ending patterns are more flexible. Can anyone
   come up with a use for this flexibility? I can't.

2) The asterisk convention is easier to type.

3) The asterisk convention is easy to explain to a beginner on an
   intuitive level ("Place an asterisk in front of the expression to
   search for a word"), although a complete explanation of the
   semantics is about as complicated for either convention.

4) Even after the user learns the word begin and end commands, the
   user still has to type two commands to get a word search, which
   increases the cognitive complexity compared to typing one command
   to get a word search.

5) Neither syntax is intuitively obvious, but (| and |) do have
   intuitively obvious interpretations (both consist of a parenthesis
   and a '|' operator) which differ from the interpretation that
   Henry suggests for them.

The basic problem with the word beginning and ending patterns is that
they are at the wrong level. If they are *only* used as building blocks
to build word searches, then a higher level feature like the asterisk
convention which allows users to request word searches directly is a
better choice. And they are too high level to be used for much else
besides constructing word searches. The rare cases where they are used
for something else (if such cases exist) can be handled by lower level
features from which word beginning and ending patterns can be
constructed. I expect that Henry's regexp package (like egrep) already
has the required features.

In conclusion, I believe that including the (| and |) operators in a
regular expression package is a poor idea on two grounds. The semantics
are wrong; if word searches are desired there are better ways to provide
them, such as the asterisk convention. And (| and |) are a lousy choice
of operators, for reasons which Henry notes in his article, while the
asterisk convention has no such problems.

                                Kenneth Almquist
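
The asterisk convention described in this post can be layered on top of an existing engine as a thin front end. A minimal sketch, assuming Python's `re`; the function name `compile_with_word_flag` is hypothetical, and the lookaround assertions used to express "not preceded or followed by a word character" are a Python convenience rather than a feature of egrep or of the packages discussed in the thread.

```python
import re

def compile_with_word_flag(pattern):
    """Hypothetical front end: a leading '*' requests a whole-word search."""
    if pattern.startswith("*"):
        body = pattern[1:]
        # The match may not touch a word character on either side.
        return re.compile(r"(?<!\w)(?:" + body + r")(?!\w)")
    return re.compile(pattern)

plain = compile_with_word_flag("cat")
word = compile_with_word_flag("*cat")

print(plain.search("concatenate") is not None)
print(word.search("concatenate") is not None)
print(word.search("the cat sat") is not None)
```

A leading asterisk is free for this use because it has nothing to repeat; Python's own `re.compile("*cat")` rejects it as an error, so the convention reuses a sequence that was previously a syntax error, as suggested earlier in the thread.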