tdinger@hiredgun.East.Sun.COM (Tom Dinger - Sun BOS SPA) (03/23/91)
I just sent this off to Larry, but since a number of people were
reporting problems with op/regexp.t test 124, I thought I'd post it
as well.

TD
--------------------------------------------------------------

Fix for patterns with more than 9 nested parentheses.
For Perl 4.000
23 March 1991
Tom Dinger (tdinger@East.Sun.COM)

Perl 4.000 adds the ability to nest pattern-matching parentheses
deeper than 9 levels, and to refer to those substrings both with
$1, ... after the match, and with backreferences \1, \2, ... \10,
\11, ... in the pattern itself.

An example of this is test 124 of op/regexp.t, which uses the pattern:

    ((((((((((a))))))))))\10

to match the string "aa" -- after the match, ($& eq "aa") should be TRUE.

This is broken, although some platforms may not see the problem with
this case (for example, it works on a Sun386i). If the pattern is
nested more deeply, most if not all platforms should see the problem.

The bug is in regexec.c: the two arrays regmystartp[] and regmyendp[]
are statically allocated, with 10 elements. Index 0 is for the entire
matched substring, while indices 1..9 are for \1, ..., \9. For 10 or
more nested parentheses, the arrays are too small, and the matcher
writes past the end of them.

The fix is to allocate these arrays on entry to regexec(), big enough
for all the parentheses in the regular expression. A minor speed-up
is to allocate the two arrays initially with 10 elements, which should
handle all old code without additional memory allocations, and, if
patterns are found that need more elements, to reallocate them larger;
they are never freed. That is the method used in this patch.

Apply using the patch program ("patch -p -N < thisfile").

*** orig/regexec.c	Sat Mar 23 00:48:27 1991
--- regexec.c	Sat Mar 23 01:12:17 1991
***************
*** 80,87 ****
  static char *reglastparen;	/* Similarly for lastparen. */
  static char *regtill;
  
! static char *regmystartp[10];	/* For remembering backreferences. */
! static char *regmyendp[10];
  
  /*
   * Forwards.
--- 80,88 ----
  static char *reglastparen;	/* Similarly for lastparen. */
  static char *regtill;
  
! static int regmyp_size = 0;
! static char **regmystartp = Null(char**);
! static char **regmyendp   = Null(char**);
  
  /*
   * Forwards.
***************
*** 188,193 ****
--- 189,212 ----
  
      /* see how far we have to get to not match where we matched before */
      regtill = string+minend;
+ 
+     /* Allocate our backreference arrays */
+     if ( regmyp_size < prog->nparens + 1 ) {
+ 	/* Allocate or enlarge the arrays */
+ 	regmyp_size = prog->nparens + 1;
+ 	if ( regmyp_size < 10 ) regmyp_size = 10;	/* minimum */
+ 	if ( regmystartp ) {
+ 	    /* reallocate larger */
+ 	    Renew(regmystartp,regmyp_size,char*);
+ 	    Renew(regmyendp,  regmyp_size,char*);
+ 	}
+ 	else {
+ 	    /* Initial allocation */
+ 	    New(1102,regmystartp,regmyp_size,char*);
+ 	    New(1102,regmyendp,  regmyp_size,char*);
+ 	}
+ 
+     }
  
      /* Simplest case:  anchored match need be tried only once. */
      /*  [unless multiline is set] */

Tom Dinger     consulting at:
TechnoLogics, Inc.          Sun Microsystems     Internet: tdinger@East.Sun.COM
(508)486-8500               (508)671-0521        UUCP: ...!sun!suneast!tdinger
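
For anyone who wants to try the grow-only allocation outside the Perl
source tree, here is a minimal stand-alone sketch of the same idea. It
is not part of the patch. The names slot_count, start_slots, end_slots,
MIN_SLOTS and ensure_paren_slots() are made up for illustration, and it
uses plain realloc() where the patch uses Perl's New() and Renew()
macros, so the error handling is an assumption too.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for regmyp_size, regmystartp and regmyendp. */
static size_t slot_count = 0;
static char **start_slots = NULL;
static char **end_slots = NULL;

#define MIN_SLOTS 10	/* room for $& plus \1..\9 without regrowing */

/*
 * Make sure both arrays can hold nparens + 1 entries (slot 0 is the
 * whole match).  Returns 0 on success, -1 if an allocation fails.
 * The arrays only ever grow and are never freed.
 */
static int ensure_paren_slots(int nparens)
{
    size_t needed;
    char **new_start;
    char **new_end;

    assert(nparens >= 0);
    needed = (size_t)nparens + 1;
    if (needed <= slot_count)
	return 0;		/* already big enough */
    if (needed < MIN_SLOTS)
	needed = MIN_SLOTS;	/* minimum */

    /* realloc(NULL, n) acts like malloc(n), so one path covers both
     * the initial allocation and the later enlargement. */
    new_start = realloc(start_slots, needed * sizeof *new_start);
    if (new_start == NULL)
	return -1;
    start_slots = new_start;

    new_end = realloc(end_slots, needed * sizeof *new_end);
    if (new_end == NULL)
	return -1;
    end_slots = new_end;

    /* Only record the new size once both arrays have grown. */
    slot_count = needed;
    return 0;
}
```

With the pattern from test 124 there are 10 pairs of parentheses, so the
caller would ask for ensure_paren_slots(10) and get 11 slots; a pattern
with 9 or fewer pairs stays at the initial 10 and never reallocates.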