[comp.lang.perl] regcomp.c bug

tdinger@hiredgun.East.Sun.COM (Tom Dinger - Sun BOS SPA) (03/23/91)

I just sent this off to Larry, but since a number of people were reporting
problems with op/regexp.t test 124, I thought I'd post it as well.

TD
--------------------------------------------------------------
	Fix for patterns with more than 9 nested parentheses.
	For Perl 4.000
	23 March 1991
	Tom Dinger (tdinger@East.Sun.COM)

	Perl 4.000 adds the ability to nest pattern-matching parentheses
	deeper than 9 levels, and to refer to those substrings both with
	$1, ... after the match, and with backreferences \1, \2, ... \10,
	\11, ... in the pattern itself.

	An example of this is test 124 of op/regexp.t, which uses the pattern:
		((((((((((a))))))))))\10
	to match the string "aa" -- after the match, ($& eq "aa") should be
	TRUE.

	This is broken, although some platforms may not see the problem with
	this case (for example, it works on a Sun386i).  If the pattern
	is nested more deeply, most if not all platforms should see the
	problem.

	The bug is in regexec.c: the two arrays regmystartp[] and regmyendp[]
	are statically allocated, with 10 elements.  Index 0 is for the
	entire matched substring, while indices 1..9 are for \1, ..., \9.
	For a pattern with 10 or more sets of parentheses the arrays are too
	small: \10 and above index past their end.
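
	To make the off-by-one concrete, here is a minimal sketch in plain C.
	It is not code from Perl; OLD_SLOTS and slot_in_bounds() are
	hypothetical names used only for illustration.  It checks whether a
	group number has a slot in an array of a given size.

```c
#include <assert.h>

#define OLD_SLOTS 10	/* element count of the old static arrays */

/*
 * Return 1 if group number `paren` has a slot in an array of
 * `slots` elements, 0 otherwise.  Slot 0 is the whole match.
 */
static int slot_in_bounds(int paren, int slots)
{
    assert(slots > 0);
    return paren >= 0 && paren < slots;
}
```

	With these definitions slot_in_bounds(9, OLD_SLOTS) is 1 but
	slot_in_bounds(10, OLD_SLOTS) is 0, so the tenth group of test 124
	has nowhere to go.  An array of nparens + 1 elements is the smallest
	that holds every group.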

	The fix is to allocate these arrays on entry to regexec(), big
	enough for all the parentheses in the regular expression.

	As a minor speed-up, the two arrays are first allocated with 10
	elements, which should handle all old code without additional memory
	allocations.  If a pattern is found that needs more elements, they
	are reallocated larger; they are never freed.  That is the method
	used in this patch.
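
	The same grow-only scheme can be sketched outside Perl with the
	standard allocator.  This is an illustration under assumptions, not
	the patch itself: it uses realloc() where the patch uses Perl's New
	and Renew macros, and the names slots_cap, start_slots, end_slots
	and ensure_slots() are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define MIN_SLOTS 10	/* whole match plus \1 .. \9 */

static size_t slots_cap = 0;		/* capacity of both arrays */
static char **start_slots = NULL;	/* stand-in for regmystartp */
static char **end_slots = NULL;		/* stand-in for regmyendp */

/*
 * Make sure both arrays can hold nparens + 1 pointers.
 * The arrays only ever grow and are never freed.
 * Returns 0 on success, -1 if an allocation fails.
 */
static int ensure_slots(size_t nparens)
{
    size_t need = nparens + 1;
    char **s, **e;

    if (need <= slots_cap)
	return 0;		/* already big enough */
    if (need < MIN_SLOTS)
	need = MIN_SLOTS;	/* minimum */

    /* realloc(NULL, n) acts like malloc(n), so this one path
       covers both the initial allocation and later growth. */
    s = realloc(start_slots, need * sizeof *s);
    if (s == NULL)
	return -1;
    start_slots = s;

    e = realloc(end_slots, need * sizeof *e);
    if (e == NULL)
	return -1;
    end_slots = e;

    slots_cap = need;	/* record the new size only once both succeed */
    return 0;
}
```

	Calling ensure_slots(prog_nparens) at the top of the matcher, where
	prog_nparens is the group count of the compiled pattern, would
	correspond to the check the patch adds to regexec().  The allocation
	failure return is an addition of this sketch.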

	Apply using the patch program ("patch -p -N < thisfile")

*** orig/regexec.c	Sat Mar 23 00:48:27 1991
--- regexec.c	Sat Mar 23 01:12:17 1991
***************
*** 80,87 ****
  static char *reglastparen;	/* Similarly for lastparen. */
  static char *regtill;
  
! static char *regmystartp[10];	/* For remembering backreferences. */
! static char *regmyendp[10];
  
  /*
   * Forwards.
--- 80,88 ----
  static char *reglastparen;	/* Similarly for lastparen. */
  static char *regtill;
  
! static int regmyp_size = 0;
! static char **regmystartp = Null(char**);
! static char **regmyendp   = Null(char**);
  
  /*
   * Forwards.
***************
*** 188,193 ****
--- 189,212 ----
  
  	/* see how far we have to get to not match where we matched before */
  	regtill = string+minend;
+ 
+ 	/* Allocate our backreference arrays */
+ 	if ( regmyp_size < prog->nparens + 1 ) {
+ 	    /* Allocate or enlarge the arrays */
+ 	    regmyp_size = prog->nparens + 1;
+ 	    if ( regmyp_size < 10 ) regmyp_size = 10;	/* minimum */
+ 	    if ( regmystartp ) {
+ 		/* reallocate larger */
+ 		Renew(regmystartp,regmyp_size,char*);
+ 		Renew(regmyendp,  regmyp_size,char*);
+ 	    }
+ 	    else {
+ 		/* Initial allocation */
+ 		New(1102,regmystartp,regmyp_size,char*);
+ 		New(1102,regmyendp,  regmyp_size,char*);
+ 	    }
+ 	
+ 	}
  
  	/* Simplest case:  anchored match need be tried only once. */
  	/*  [unless multiline is set] */
Tom Dinger	     consulting at:
TechnoLogics, Inc.        Sun Microsystems    Internet: tdinger@East.Sun.COM
(508)486-8500             (508)671-0521       UUCP: ...!sun!suneast!tdinger