[net.unix-wizards] Bug in regex

rich@cfib.UUCP (02/28/85)

Has anybody had trouble with regcmp(3)/regex(3)?  I seem to have found a bug
in handling expressions with breakout strings - regex corrupts the subject
string pointed to by the second arg.  We are running Hewlett-Packard HP-UX
(System III based) on an HP 9000.  To check this out, compile and run the
following program (load with -lPW); give it these args:

    '(..)$0(..)$1(..)$2' '850215'

Our program output:

    psubj before regex = 30000000134 '850215'
    psubj after regex = 30000000134 '02'
    pregex = 30000000142 ''
    __loc1 = 30000000134 '02'
    re[0] = '85'

There is nothing in the man page for regex that suggests that psubj should be
altered, and it is not returning the second and third breakout strings.  Am I
missing something?  If it is a bug, are there any fixes?

			Rich Baughman
			The Consumer Financial Institute, Newton, MA

			decvax!yale-co!ima!cfib!rich
			ucbvax!cbosgd!ima!cfib!rich
			{allegra|research|amd70}!ima!cfib!rich

/******************************************************************************/

main (ac,av)
    int  ac;
    char *av[];
{
    char  *preg, *pregcmp, *psubj, *pregex, *regcmp(), *regex();
    char  re[10][100];
    int  i;
    extern char  *__loc1;

    if (ac != 3)
    {
	printf ("?need 2 args - regular expression and subject\n");
	exit (1);
    }
    preg = *++av;
    if ((pregcmp=regcmp(preg,0)) == 0)
    {
	printf ("?regcmp: bad regular expression\n");
	exit (1);
    }

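    psubj = *++av;
    /* clear the breakout buffers so that unused ones print as empty below */
    for (i=0; i<10; ++i)
	re[i][0] = '\0';
    printf ("psubj before regex = %o '%s'\n", psubj, psubj);
    pregex = regex (pregcmp, psubj, re[0], re[1], re[2], re[3], re[4], re[5],
	re[6], re[7], re[8], re[9], 0);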
    if (pregex == 0)
    {
	printf ("?regex failed\n");
	exit (1);
    }

    printf ("psubj after regex = %o '%s'\n", psubj, psubj);
    printf ("pregex = %o '%s'\n", pregex, pregex);
    printf ("__loc1 = %o '%s'\n", __loc1, __loc1);
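    /* re[i] is an array and can never be a null pointer, so test only */
    /* whether regex stored a non-empty breakout string in it */
    for (i=0; i<10; ++i)
	if (re[i][0] != '\0')
	    printf ("re[%d] = '%s'\n",i,re[i]);

    exit (0);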
}
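
For comparison on systems without -lPW, here is a minimal sketch of the same
three-field breakout using POSIX regcomp(3)/regexec(3) rather than
regcmp/regex.  This is a swapped-in interface, not the library under
discussion, and the names break_out and FIELD_LEN are hypothetical.  The
pattern drops the $n suffixes because POSIX numbers parenthesized groups by
position.  The regcmp(3) documentation describes (...)$n as storing the
matched text in the (n+1)th argument after the subject, so in both cases the
intended result is '85', '02', '15' with the subject left untouched.

```c
#include <regex.h>
#include <string.h>

#define FIELD_LEN 100   /* hypothetical buffer size, same as re[][100] above */

/*
 * Hypothetical helper, not part of the original program: copy the first
 * nfields parenthesized submatches of pattern in subject into out[0..].
 * Returns the number of groups that matched, or -1 on a bad pattern,
 * a failed match, or an nfields outside 0..9.
 */
int break_out(const char *pattern, const char *subject,
              char out[][FIELD_LEN], int nfields)
{
    regex_t re;
    regmatch_t m[10];   /* m[0] is the whole match, m[1..9] the groups */
    int i, found = 0;

    if (nfields < 0 || nfields > 9)
        return -1;
    if (regcomp(&re, pattern, REG_EXTENDED) != 0)
        return -1;
    if (regexec(&re, subject, (size_t)nfields + 1, m, 0) != 0) {
        regfree(&re);
        return -1;
    }

    for (i = 1; i <= nfields; i++) {
        size_t len;

        if (m[i].rm_so == -1) {         /* group did not participate */
            out[i - 1][0] = '\0';
            continue;
        }
        len = (size_t)(m[i].rm_eo - m[i].rm_so);
        if (len >= FIELD_LEN)           /* truncate rather than overflow */
            len = FIELD_LEN - 1;
        memcpy(out[i - 1], subject + m[i].rm_so, len);
        out[i - 1][len] = '\0';
        found++;
    }

    regfree(&re);
    return found;
}
```

Calling break_out("(..)(..)(..)", "850215", fields, 3) with fields declared
as char fields[3][FIELD_LEN] fills the three buffers and only reads the
subject, since it is passed as a const pointer.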

guy@rlgvax.UUCP (Guy Harris) (02/28/85)

> Has anybody had trouble with regcmp(3)/regex(3)?  I seem to have found a bug
> in handling expressions with breakout strings - regex corrupts the subject
> string pointed to by the second arg.

I tried it on our VAX with the System V version of "regex", and it worked
OK.  However, I did discover an amusing shell bug by accident; I mistyped
"sh -x stuff.sh >>stuff" (a test script) as "sh .x stuff.sh >>stuff", and
got

	: : cannot open

and got ".x.x" at the end of "stuff".  Looks like it's confused about what
should be written to standard output and standard error...

	Guy Harris
	{seismo,ihnp4,allegra}!rlgvax!guy

guy@rlgvax.UUCP (Guy Harris) (03/01/85)

> However, I did discover an amusing shell bug by accident; I mistyped
> "sh -x stuff.sh >>stuff" (a test script) as "sh .x stuff.sh >>stuff", and
> got
> 
> 	: : cannot open
> 
> and got ".x.x" at the end of "stuff".

False alarm.  Sorry.  That's not in the standard S5R2 shell, but it is in the
BRL-modified shell (with Berkeley job control) that I run.  Never mind.

	Guy Harris
	{seismo,ihnp4,allegra}!rlgvax!guy