harald.alvestrand@elab-runit.sintef.no (11/28/90)
I have thought a long time about this, but I am sure there must be some magic way to do it: How can I write a pattern that matches (), ((())), ((())()), but NOT ((())? That is, how do I get a balanced matching of parentheses? (or quotes, for that matter...) Harald Tveit Alvestrand Harald.Alvestrand@elab-runit.sintef.no C=no;PRMD=uninett;O=sintef;OU=elab-runit;S=alvestrand;G=harald +47 7 59 70 94
merlyn@iwarp.intel.com (Randal Schwartz) (11/29/90)
In article <1990Nov28.150131.28981@ugle.unit.no>, harald.alvestrand@elab-runit writes: | I have thought a long time about this, but I am sure there must be some | magic way to do it: | How can I write a pattern that matches (), ((())), ((())()), but NOT ((())? | That is, how do I get a balanced matching of parentheses? | (or quotes, for that matter...) Regular expressions aren't strong enough. You need to use some other method (like recursive descent). Just another Perl hacker (also in the midst of hacking the final Book draft), -- /=Randal L. Schwartz, Stonehenge Consulting Services (503)777-0095 ==========\ | on contract to Intel's iWarp project, Beaverton, Oregon, USA, Sol III | | merlyn@iwarp.intel.com ...!any-MX-mailer-like-uunet!iwarp.intel.com!merlyn | \=Cute Quote: "Intel: putting the 'backward' in 'backward compatible'..."====/
poage@sunny.ucdavis.edu (Tom Poage) (11/29/90)
In article <1990Nov28.150131.28981@ugle.unit.no> harald.alvestrand@elab-runit.sintef.no writes: >I have thought a long time about this, but I am sure there must be some >magic way to do it: >How can I write a pattern that matches (), ((())), ((())()), but NOT ((())? >That is, how do I get a balanced matching of parentheses? >(or quotes, for that matter...) > > > Harald Tveit Alvestrand >Harald.Alvestrand@elab-runit.sintef.no >C=no;PRMD=uninett;O=sintef;OU=elab-runit;S=alvestrand;G=harald >+47 7 59 70 94 How about #!/bin/perl for ('()','((()))','((())())','((())') { if (tr/(/(/ != tr/)/)/) { warn "unmatched parentheses '$_'\n"; } } Tom. -- Tom Poage, Clinical Engineering Universiy of California, Davis, Medical Center, Sacramento, CA poage@sunny.ucdavis.edu {...,ucbvax,uunet}!ucdavis!sunny!poage
mwm@raven.relay.pa.dec.com (Mike (My Watch Has Windows) Meyer) (11/29/90)
In article <582@sunny.ucdavis.edu> poage@sunny.ucdavis.edu (Tom Poage) writes: In article <1990Nov28.150131.28981@ugle.unit.no> harald.alvestrand@elab-runit.sintef.no writes: >I have thought a long time about this, but I am sure there must be some >magic way to do it: >How can I write a pattern that matches (), ((())), ((())()), but NOT ((())? >That is, how do I get a balanced matching of parentheses? Mr. Poage's solution doesn't work - it thinks that )( is a pair of matching parens. I don't think raw count was what was intended. >(or quotes, for that matter...) This also depends on what you mean by "matching quotes". If all you want is for every quote character to have a match later, this does it: /^[^"]*("[^"]*"[^"]*)*$/ You can incorporate escaped quotes into that pattern (left as an exercise). If you want to correctly deal with two kinds of quotes, I think this does it, but haven't done any testing on it: /^[^"']*((("[^"]*")|('[^']*'))[^"']*)*$/ <mike --
tchrist@convex.COM (Tom Christiansen) (11/29/90)
In article <1990Nov28.150131.28981@ugle.unit.no> harald.alvestrand@elab-runit.sintef.no writes: >I have thought a long time about this, but I am sure there must be some >magic way to do it: >How can I write a pattern that matches (), ((())), ((())()), but NOT ((())? >That is, how do I get a balanced matching of parentheses? >(or quotes, for that matter...) No, there's no "magic" way of doing it. The problem is that regular expressions are really nice, but they just can't handle recursive structures -- at least, not by themselves. To deal with these in Perl, you could interatively process /\(([^\)]*)\)/ to deal with things from the inside-out. Simple approaches like that one won't work well with escaped characters; you'll need to change them into something else (like \201) before the search, or else you'll match where you don't want to. --tom
cwitty@csli.Stanford.EDU (Carl Witty) (11/29/90)
In article <1990Nov28.150131.28981@ugle.unit.no> harald.alvestrand@elab-runit.sintef.no writes: I have thought a long time about this, but I am sure there must be some magic way to do it: How can I write a pattern that matches (), ((())), ((())()), but NOT ((())? That is, how do I get a balanced matching of parentheses? There is no pattern that will do it, but here's a program that will. #!/afs/ir/@sys/local/bin/perl while (<STDIN>) { chop; while (s/\(\)//) {} if ($_ eq "") { print "Balanced\n"; } else { print "Unbalanced\n"; } } Carl Witty cwitty@cs.stanford.edu
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/29/90)
In article <16600@csli.Stanford.EDU> cwitty@csli.Stanford.EDU (Carl Witty) writes: > In article <1990Nov28.150131.28981@ugle.unit.no> harald.alvestrand@elab-runit.sintef.no writes: [ how to detect balanced parentheses? ] [ 10-line Perl program ] #!/bin/sh sed ':1 s-()--g t1 s-..*-un- s-$-balanced-' ---Dan
flee@guardian.cs.psu.edu (Felix Lee) (11/29/90)
>That is, how do I get a balanced matching of parentheses?
This is a job for first-class pattern objects.
\balanced = *( '(' . \balanced . ')' | *[^()] );
print $string =~ \balanced;
Too bad they don't exist, yet.
--
Felix Lee flee@cs.psu.edu
des0mpw@colman.newcastle.ac.uk (M.P. Ward) (11/29/90)
The trick is to repetedly remove parantheses (and their contents) which don't contain parentheses. If the result has any parantheses left then the match fails. Another approach is to count +1 for (, -1 for ) and 0 for anything else. The count must never go negative and must finish at zero. Here's my perl code for checking {} balancing in LaTeX files: (I have a similar program to check \begin...\end matching) #! /usr/bin/perl # check begin/end pairs in latex file ($myname = $0) =~ s|(.*/)*||; # strip path component from name $Usage = "Usage: $myname [file]\n"; if ($#ARGV == 0) { $file = "< $ARGV[0]"; if (! -f $ARGV[0]) { die "File not found: $ARGV[0]\n"; } } elsif ($#ARGV == -1) { $file = ""; } else { die "$Usage"; } open (GOODS,"qtoa $file | egrep -n '{|}' |") || die "qtoa/egrep failed: $!"; undef $/; $* = 1; $_ = <GOODS>; # slurp in begin/end pairs close GOODS; # remove \{ and \} (which don't have to match): s/\\{//g; s/\\}//g; # repeatedly collapse {...} pairs while (s/{[^{}]*}//g) { 1; } # eliminate junk lines (with no { or }): while (s/^[^{}]*\n//g) { 1; } # print the result: print; Martin. JANET: Martin.Ward@uk.ac.durham Internet (eg US): Martin.Ward@DURHAM.AC.UK or if that fails: Martin.Ward%uk.ac.durham@nfsnet-relay.ac.uk or even: Martin.Ward%DURHAM.AC.UK@CUNYVM.CUNY.EDU BITNET: IN%"Martin.Ward@DURHAM.AC.UK" UUCP:...!mcvax!ukc!durham!Martin.Ward
tchrist@convex.COM (Tom Christiansen) (11/30/90)
In article <5209:Nov2909:58:4190@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes: > [ 7-line sed program deleted ] While it's always good to choose the right task, and we shouldn't forget our simpler tools, I don't think the original poster has yet had his question answered. What he wanted was not merely WHETHER an expression was balanced, but rather how to pull out a balanced expression from a variable in a Perl program for subsequent use in that program. Enclosed below you will find some code that seems to do this. There's a &pull_quotes() function that you want to feed two things: the string with quoted strings in it, and the two characters that you want to call quotes. My enclosed examples demo using ` and ', as well as ( and ). Multi-character quotes are left as an exercise for the reader. :-) The function returns a list of all the quoted strings, as you encounter them proceeding left to right. For example, &pull_quotes('value = (double)(U_L(value) & U_L(str_gnum(st[2])));', '()'); returns ( "double", "U_L(value) & U_L(str_gnum(st[2]))", "value", "str_gnum(st[2])", "st[2]" ); Several examples of varying complexity are included. I'll spare the net the 8k of expected output these produce. You should be able to massage this into whatever you want to do with these matched, quoted strings. I eagerly await Dan's translation of this into sed. :-) --tom #!/usr/local/bin/perl # # pull_quotes -- tom christiansen, tchrist@convex.com # ################################################################################ sub show_quotes { local($string, $qchars) = @_; local($i) = $[; local($_); local(@list) = &pull_quotes($string,$qchars); $qchars =~ s/(.)(.)/$1...$2/; print "Extracted ", 0+@list, " ", $qchars, " quote", (@list != 1)?'s':'', " from: <<", $_[0]. ">>\n"; for (@list) { print "Quote #$i is <<", $_, ">>\n"; $i++; } print "\n"; } ################################################################################ sub pull_quotes { # pull_quotes($string, $quotchars) => @quotestrings local($_, $qchars) = @_; local($qL, $qR); # left and right quote chars, like `' or () local($quote_level); # current quote level local($max_quote); # deepest we've gotten local($qstring); # tmp space for quote local(@quotes); # list of quotes to return local($d) = '\$'; # not sure why this must be here local($b) = '\\'; # nor this local(@done); # which quotes we've finished so far die "need two just quote chars" if length($qchars) != 2; $qL = substr($qchars, 0, 1); $qR = substr($qchars, 1, 1); s/\\(.)/"\201".ord($1)."\202"/eg; # protect from backslashes $max_quote = $quote_level = $[-1; while ( /[$qchars]/ ) { if ($& eq $qL) { do { ++$quote_level; } while $done[$quote_level]; s/$b$qL/\${QL${quote_level}}/; $max_quote = $quote_level if $max_quote < $quote_level; } elsif ($& eq $qR) { s/$b$qR/\${QR${quote_level}}/; $done[$quote_level]++; do { --$quote_level; } while $done[$quote_level]; } else { die "unexpected quot char: $&"; } } print "pre-re-interpolated string is $_\n" if $debug; for ($quote_level = $[; $quote_level <= $max_quote; $quote_level++) { ($qstring) = /${d}\{QL$quote_level\}([^\000]*)${d}\{QR$quote_level}/; $qstring =~ s/\${QL\d+\}/$qL/g; $qstring =~ s/\${QR\d+\}/$qR/g; $qstring =~ s/\201(\d+)\202/pack('C',$1)/eg; $quotes[$quote_level] = $qstring; } @quotes; } ################################################################################ ################################################################################ ################################################################################ ################################################################################ ################################################################################ # MAIN STARTS HERE &show_quotes("This has `one' quote", "`'"); &show_quotes(q(Like this: `Jim' and `Bill said, ``Boo'' to me' to the end.), "`'"); &show_quotes("Mom said, `Jim asked ``How come?'', so I told her ``because''.'", "`'"); &show_quotes(q(Mom said, `Don\\'t put Jimmy\\'s `widjet' down the drain!'), "`'"); &show_quotes('a = (b + c) / 2', '()'); &show_quotes('a = (b + (c*d)) / (7+(2/3))', '()'); &show_quotes('value = (double)(U_L(value) & U_L(str_gnum(st[2])));', '()'); &show_quotes(<<'EOQ', "()"); case 'I': str_cat(str,"-"); str_cat(str,s); str_cat(str," "); if (*++s) { (void)apush((((STBP*)(incstab->str_ptr))->stbp_array ? ((STBP*)(incstab->str_ptr))->stbp_array : ((STBP*)(aadd(incstab)->str_ptr))->stbp_array),str_make(s,0)); } else if (argv[1]) { (void)apush((((STBP*)(incstab->str_ptr))->stbp_array ? ((STBP*)(incstab->str_ptr))->stbp_array : ((STBP*)(aadd(incstab)->str_ptr))->stbp_array),str_make(argv[1],0)); str_cat(str,argv[1]); argc--,argv++; str_cat(str," "); } EOQ &show_quotes(<<'EOW', '()'); (defun define-abbrevs (&optional arg) "Define abbrevs according to current visible buffer contents. See documentation of edit-abbrevs for info on the format of the text you must have in the buffer. With argument, eliminate all abbrev definitions except the ones defined from the buffer now." (interactive "P") (if arg (kill-all-abbrevs)) (save-excursion (goto-char (point-min)) (while (and (not (eobp)) (re-search-forward "^\(" nil t)) (let* ((buf (current-buffer)) (table (read buf)) abbrevs) (forward-line 1) (while (progn (forward-line 1) (not (eolp))) (setq name (read buf) count (read buf) exp (read buf)) (skip-chars-backward " \t\n\f") (setq hook (if (not (eolp)) (read buf))) (skip-chars-backward " \t\n\f") (setq abbrevs (cons (list name exp hook count) abbrevs))) (define-abbrev-table table abbrevs))))) EOW exit 0;
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (11/30/90)
It occurs to me that the solutions posted so far will take forever on long strings. Here's a linear-time solution: #!/bin/sh ( echo '0'; fold -1 | sed 's,[^()],, s,(,1+p, s,),1-p,' ) | dc | ( grep -s -e - && echo 'unbalanced' || echo 'balanced' ) (Whaddya mean, this isn't comp.unix.shell? :-) ) ---Dan
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (12/01/90)
In article <12435:Nov3011:26:5990@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes: > #!/bin/sh > ( echo '0'; fold -1 | sed 's,[^()],, > s,(,1+p, > s,),1-p,' ) | dc | ( grep -s -e - && echo 'unbalanced' || echo 'balanced' ) Whoops, thanks to William Setzer for pointing out that this doesn't notice missing close parens. #!/bin/sh ( echo '0'; echo '0'; fold -1 | sed 's,[^()],, s,(,1+p, s,),1-p,' ; echo '-p' ) | dc | ( grep -s -e - && echo 'unbalanced' || echo 'balanced' ) ---Dan
ronald@robobar.co.uk (Ronald S H Khoo) (12/01/90)
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes: > #!/bin/sh > ( echo '0'; echo '0'; fold -1 | sed 's,[^()],, What's fold ? Have you got an implementation of it in perl that you're willing to share ? -- ronald@robobar.co.uk +44 81 991 1142 (O) +44 71 229 7741 (H)
brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (12/02/90)
In article <1990Dec1.105725.28029@robobar.co.uk> ronald@robobar.co.uk (Ronald S H Khoo) writes: > brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes: > > #!/bin/sh > > ( echo '0'; echo '0'; fold -1 | sed 's,[^()],, > What's fold ? A program that wraps its input to a specific line length. fold -1 just puts a newline after each character. This is somewhat off the subject, though; I was just illustrating one way to code the parenthesis-counting technique. ---Dan