tdinger@hiredgun.East.Sun.COM (Tom Dinger - Sun BOS SPA) (03/23/91)
I just sent this off to Larry, but since a number of people were reporting
problems with op/regexp.t test 124, I thought I'd post it as well.
TD
--------------------------------------------------------------
Fix for patterns with more than 9 nested parentheses.
For Perl 4.000
23 March 1991
Tom Dinger (tdinger@East.Sun.COM)
Perl 4.000 adds the ability to nest pattern-matching parentheses
deeper than 9 levels, and to refer to those substrings both with
$1, ... after the match, and with backreferences \1, \2, ... \10,
\11, ... in the pattern itself.
An example of this is test 124 of op/regexp.t, which uses the pattern:
((((((((((a))))))))))\10
to match the string "aa" -- after the match, ($& eq "aa") should be
TRUE.
This is broken, although some platforms may not see the problem with
this case (for example, it works on a Sun386i). If the pattern
is nested more deeply, most if not all platforms should see the
problem.
The bug is in regexec.c: the two arrays regmystartp[] and regmyendp[]
are statically allocated, with 10 elements. Index 0 is for the
entire matched substring, while indices 1..9 are for \1, ..., \9.
However, a pattern with 10 or more parentheses needs index 10 and up,
which these arrays do not have.
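To make the arithmetic concrete, here is a minimal sketch in plain C.
The names OLD_SLOTS, slots_needed() and fits_old_arrays() are made up
for illustration and do not appear in the Perl source. The pattern
from test 124 has 10 pairs of parentheses, so it needs 11 slots and
writes one element past the end of a 10-element array.

```c
#include <assert.h>

#define OLD_SLOTS 10    /* size of the old static arrays */

/* Slots needed for a pattern with nparens pairs of parentheses:
 * one per pair, plus slot 0 for the whole match. */
int slots_needed(int nparens)
{
    assert(nparens >= 0);
    return nparens + 1;
}

/* Nonzero if the old fixed-size arrays are large enough. */
int fits_old_arrays(int nparens)
{
    return slots_needed(nparens) <= OLD_SLOTS;
}
```

With these definitions, fits_old_arrays(9) is nonzero and
fits_old_arrays(10) is zero. Whether a one-element overrun does
visible damage presumably depends on what the compiler places after
the arrays, which would fit the platform-dependent behaviour
described above.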
The fix is to allocate these arrays on entry to regexec(), big
enough for all the parentheses in the regular expression.
As a minor speed-up, the two arrays are initially allocated with 10
elements, which should handle all old code without additional memory
allocations. If a pattern is found that needs more elements, the
arrays are reallocated larger; they are never freed. That is the
method used in this patch.
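For readers without the Perl source at hand, the same grow-only
scheme can be sketched in standard C. This is not the patch itself:
it uses realloc() in place of Perl's New() and Renew() macros, and
the names MIN_SLOTS, slot_count, start_slots, end_slots and
ensure_slots() are invented for the example. Since realloc() on a
null pointer behaves like malloc(), the sketch does not need separate
branches for the initial allocation and the enlargement.

```c
#include <assert.h>
#include <stdlib.h>

#define MIN_SLOTS 10    /* enough for the whole match plus \1..\9 */

static size_t slot_count = 0;       /* current capacity of both arrays */
static char **start_slots = NULL;   /* stand-in for regmystartp */
static char **end_slots = NULL;     /* stand-in for regmyendp */

/* Make sure both arrays can hold nparens + 1 entries.
 * The arrays only ever grow and are never freed.
 * Returns 0 on success, -1 if an allocation fails. */
int ensure_slots(int nparens)
{
    size_t needed;
    char **new_start, **new_end;

    assert(nparens >= 0);
    needed = (size_t)nparens + 1;       /* slot 0 is the whole match */
    if (needed < MIN_SLOTS)
        needed = MIN_SLOTS;             /* minimum size */
    if (needed <= slot_count)
        return 0;                       /* already big enough */

    new_start = realloc(start_slots, needed * sizeof *new_start);
    if (new_start == NULL)
        return -1;
    start_slots = new_start;

    new_end = realloc(end_slots, needed * sizeof *new_end);
    if (new_end == NULL)
        return -1;
    end_slots = new_end;

    slot_count = needed;                /* record the new capacity */
    return 0;
}
```

A caller would invoke ensure_slots() once per match attempt, before
any slot is written. Patterns with at most 9 pairs of parentheses
cost one allocation on the first call and none afterwards.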
Apply using the patch program ("patch -p -N < thisfile")
*** orig/regexec.c Sat Mar 23 00:48:27 1991
--- regexec.c Sat Mar 23 01:12:17 1991
***************
*** 80,87 ****
static char *reglastparen; /* Similarly for lastparen. */
static char *regtill;
! static char *regmystartp[10]; /* For remembering backreferences. */
! static char *regmyendp[10];
/*
* Forwards.
--- 80,88 ----
static char *reglastparen; /* Similarly for lastparen. */
static char *regtill;
! static int regmyp_size = 0;
! static char **regmystartp = Null(char**);
! static char **regmyendp = Null(char**);
/*
* Forwards.
***************
*** 188,193 ****
--- 189,212 ----
/* see how far we have to get to not match where we matched before */
regtill = string+minend;
+
+ /* Allocate our backreference arrays */
+ if ( regmyp_size < prog->nparens + 1 ) {
+ /* Allocate or enlarge the arrays */
+ regmyp_size = prog->nparens + 1;
+ if ( regmyp_size < 10 ) regmyp_size = 10; /* minimum */
+ if ( regmystartp ) {
+ /* reallocate larger */
+ Renew(regmystartp,regmyp_size,char*);
+ Renew(regmyendp, regmyp_size,char*);
+ }
+ else {
+ /* Initial allocation */
+ New(1102,regmystartp,regmyp_size,char*);
+ New(1102,regmyendp, regmyp_size,char*);
+ }
+
+ }
/* Simplest case: anchored match need be tried only once. */
/* [unless multiline is set] */
Tom Dinger consulting at:
TechnoLogics, Inc. Sun Microsystems Internet: tdinger@East.Sun.COM
(508)486-8500 (508)671-0521 UUCP: ...!sun!suneast!tdinger