[alt.sources.d] wildmat BUG and Fix

bernie@DIALix.oz.au (Bernd Felsche) (01/04/91)

In <3154@litchi.bbn.com> rsalz@bbn.com (Rich Salz) writes:

>This is a revised version of my pattern-matching routine.  It has been
>posted to the net several times, and shows up in several places including
>Gilmore's public domain TAR.  You might want to test this, but I did
>and it seems solid.

It's neat, and arrived at our site on the very day it was needed.  :-)

It's not quite solid enough, though.  It won't match the string "ab"
against the pattern "ab*": a trailing "*" never matches a trailing
(null) string.

The BUG is:

[... start of shar deleted ...]
>Xstatic int
>XDoMatch(s, p)
>X    register char	*s;
>X    register char	*p;
>X{
>X    register int 	 last;
>X    register int 	 matched;
>X    register int 	 reverse;
>X
>X    for ( ; *p; s++, p++) {
>X	if (*s == '\0')
>X	    return ABORT;
[ ... rest of shar deleted ... ]

The FIX:
	return ABORT;
becomes:
	return *p == '*' && *++p == '\0' ? TRUE : ABORT;

This is not the way I usually write code, but it blends in with
the style of the rest of the code. :-) You probably won't be able to
tell that it's a patch.
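Here is a cut-down, hypothetical matcher in the same TRUE/FALSE/ABORT style that shows the bug and the patch in action. It is NOT Rich's wildmat.c: it only knows literal characters, '?' and '*' (no [...] classes, no '\\' escapes), and the name match_bernd is made up. The end-of-text test carries the patched return statement.

```c
/* Result codes, following the TRUE/FALSE/ABORT convention of wildmat. */
#define TRUE   1
#define FALSE  0
#define ABORT  (-1)

/*
 * Hypothetical, cut-down matcher for illustration only (not the posted
 * wildmat.c): literal characters, '?' and '*', but no "[...]" classes
 * and no '\\' escapes.  The end-of-text test uses the patched return.
 */
static int match_bernd(const char *s, const char *p)
{
    int matched;

    for ( ; *p; s++, p++) {
        if (*s == '\0')
            /* The patch: only a single, final '*' may match "". */
            return *p == '*' && *++p == '\0' ? TRUE : ABORT;
        switch (*p) {
        case '?':
            continue;               /* any one character */
        case '*':
            /* Try the rest of the pattern against every suffix of s,
             * including the empty one. */
            for (;;) {
                if ((matched = match_bernd(s, p + 1)) != FALSE)
                    return matched;
                if (*s++ == '\0')
                    return ABORT;
            }
        default:
            if (*s != *p)
                return FALSE;       /* literal mismatch */
            continue;
        }
    }
    return *s == '\0';              /* pattern used up: text must be too */
}
```

With this version match_bernd("ab", "ab*") returns TRUE, but match_bernd("ab", "ab**") still returns ABORT, because the patched line looks only one character past the '*'.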

What the package really needs is a syntax parser (to check for a
closing ], and for ][ inside character ranges) and a compressor
(for things like "**").  This would stop it rampaging through memory
until it hits an MMU boundary.
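A compressor could be a simple in-place pre-pass over the pattern. The sketch below is hypothetical (compress_stars is a made-up name): it collapses each run of unescaped '*'s outside a [...] class into a single '*'. It assumes old-style classes, where a ']' right after '[' or "[^" is a literal member, and copies escapes and classes untouched so that "\**" and "[**]" keep their meaning.

```c
/*
 * Hypothetical pre-pass: collapse each run of unescaped '*' outside a
 * "[...]" class into one '*', in place.  ASSUMES old-style classes in
 * which a ']' right after '[' (or "[^") is a literal class member.
 */
static void compress_stars(char *pat)
{
    char *out = pat;                /* write position (never ahead of in) */
    const char *in = pat;           /* read position */

    while (*in) {
        if (*in == '\\') {
            /* Copy an escape pair verbatim (a lone trailing '\\' too). */
            *out++ = *in++;
            if (*in)
                *out++ = *in++;
        } else if (*in == '[') {
            /* Copy a whole character class verbatim. */
            *out++ = *in++;
            if (*in == '^')
                *out++ = *in++;
            if (*in == ']')
                *out++ = *in++;     /* leading ']' is a literal member */
            while (*in && *in != ']')
                *out++ = *in++;
            if (*in == ']')
                *out++ = *in++;
        } else if (*in == '*') {
            *out++ = '*';
            while (*in == '*')
                in++;               /* drop the rest of the run */
        } else {
            *out++ = *in++;
        }
    }
    *out = '\0';
}
```

For example, "a**b***" becomes "a*b*", while "[**]**" becomes "[**]*" because the stars inside the class are left alone.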

How does this code compare with "bash"?  Anybody familiar enough
with bash to excise the filename expansion stuff from that?
-- 
 ________Bernd_Felsche__________bernie@DIALix.oz.au_____________
[ Phone: +61 9 419 2297		19 Coleman Road			]
[ TZ:	 UTC-8			Calista, Western Australia 6167	]

thorinn@diku.dk (Lars Henrik Mathiesen) (01/05/91)

First off: Does anybody have a set of test data for shell globbing? It
would be nice to have a positive check on the updated wildmat.c.

bernie@DIALix.oz.au (Bernd Felsche) writes:
>In <3154@litchi.bbn.com> rsalz@bbn.com (Rich Salz) writes:
>>This is a revised version of my pattern-matching routine.

>It won't match string "ab" 
>to pattern "ab*" A trailing "*" never matches a trailing (null) string.

>The BUG is:

>>X	if (*s == '\0')
>>X	    return ABORT;

>The FIX:
>	return ABORT;
>becomes:
>	return *p == '*' && *++p == '\0' ? TRUE : ABORT;

I'd prefer this FIX:
    if (*s == '\0')
becomes:
    if (*s == '\0' && *p != '*')

It is more general; the first fix will fail to match a trailing null
string against two trailing "*"s (e.g. "ab" against "ab**").
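A sketch of why this works, again on a hypothetical, cut-down matcher (literals, '?' and '*' only; not the real wildmat.c, and match_lars is a made-up name). Because the end-of-text test now lets a '*' through, the '*' case itself gets to try the empty tail, so any number of trailing stars can match "".

```c
#define TRUE   1
#define FALSE  0
#define ABORT  (-1)

/*
 * Hypothetical, cut-down matcher (literals, '?', '*' only) using the
 * preferred fix: running out of text aborts only when the pattern is
 * not at a '*', so the '*' case below can match the empty tail.
 */
static int match_lars(const char *s, const char *p)
{
    int matched;

    for ( ; *p; s++, p++) {
        if (*s == '\0' && *p != '*')
            return ABORT;           /* text exhausted, pattern needs more */
        switch (*p) {
        case '?':
            continue;               /* any one character */
        case '*':
            /* Try the rest of the pattern against every suffix of s,
             * including the empty one. */
            for (;;) {
                if ((matched = match_lars(s, p + 1)) != FALSE)
                    return matched;
                if (*s++ == '\0')
                    return ABORT;
            }
        default:
            if (*s != *p)
                return FALSE;       /* literal mismatch */
            continue;
        }
    }
    return *s == '\0';
}
```

Here match_lars("ab", "ab**") returns TRUE, while match_lars("ab", "ab*c") still fails (ABORT), since the 'c' needs a character that isn't there.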

>What the package really needs, is a syntax parser (to check for
>closing ], and for ][ inside character ranges)

That would be nice. To be more precise, it should check for '\0' and
']' after '-' in ranges, check for '\0' after '\\' outside them, and
either allow ']' as the first character in a range (old way) or recognize
'\\' inside ranges (more code, but easier to use).
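A rough sketch of such a check, assuming the "old way" for classes (a ']' right after '[' or "[^" is a literal member, and '\\' is not special inside a class) and assuming '^' negates a class. The name pattern_ok is hypothetical; a matcher would call it once and refuse malformed patterns before matching.

```c
/*
 * Hypothetical syntax check.  ASSUMES old-style classes: a ']' right
 * after '[' (or "[^") is a literal member and '\\' is not special
 * inside a class.  Returns 1 if the pattern is well formed, else 0.
 */
static int pattern_ok(const char *p)
{
    for ( ; *p; p++) {
        if (*p == '\\') {
            if (*++p == '\0')
                return 0;           /* '\\' at end of pattern */
        } else if (*p == '[') {
            p++;
            if (*p == '^')
                p++;                /* negated class (assumed '^') */
            if (*p == ']')
                p++;                /* leading ']' is a literal member */
            for ( ; *p != ']'; p++) {
                if (*p == '\0')
                    return 0;       /* no closing ']' */
                if (*p == '-') {
                    if (p[1] == '\0' || p[1] == ']')
                        return 0;   /* range with no upper bound */
                    p++;            /* skip the range's upper bound */
                }
            }
            /* p now points at the closing ']' */
        }
    }
    return 1;
}
```

So "[a-z]x", "[]]" and "\*" pass, while "[a-z", "[a-]" and a pattern ending in a lone '\\' are rejected.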

>						and a compressor
>(for things like "**").

That is one effect of the ABORT stuff: effectively, only the last of
two consecutive "*"s causes a loop. (However, it didn't quite make it into
rsalz's posting, only the bug part :-( See my post in alt.sources.)
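The effect is easy to measure with an instrumented toy matcher. In this sketch (hypothetical names match, count_calls and calls; literals, '?' and '*' only, not wildmat.c itself), use_abort selects between returning ABORT and plain FALSE when the text runs out. With ABORT, the '*' case never returns FALSE, only TRUE or ABORT, so an outer '*' never retries a later position once an inner '*' has given up.

```c
#define TRUE   1
#define FALSE  0
#define ABORT  (-1)

static long calls;                  /* number of match() invocations */

/* Hypothetical toy matcher; use_abort = 0 turns every ABORT into FALSE. */
static int match(const char *s, const char *p, int use_abort)
{
    int matched;

    calls++;
    for ( ; *p; s++, p++) {
        if (*s == '\0' && *p != '*')
            return use_abort ? ABORT : FALSE;
        switch (*p) {
        case '?':
            continue;
        case '*':
            for (;;) {
                if ((matched = match(s, p + 1, use_abort)) != FALSE)
                    return matched; /* TRUE, or ABORT: stop retrying */
                if (*s++ == '\0')
                    return use_abort ? ABORT : FALSE;
            }
        default:
            if (*s != *p)
                return FALSE;
            continue;
        }
    }
    return *s == '\0';
}

/* Run one match from scratch; store 1/0 in *result, return call count. */
static long count_calls(const char *s, const char *p, int use_abort,
                        int *result)
{
    calls = 0;
    *result = match(s, p, use_abort) == TRUE;
    return calls;
}
```

On a pathological case such as the text "aaaaaaaaaaaaaaaaaaaa" against "*a*a*a*a*b", both modes report no match, but the ABORT mode needs far fewer calls. The same property is why "**" costs only one loop: the first '*' tries exactly one position and returns whatever the second one decides.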

>How does this code compare with "bash"?  Anybody familiar enough
>with bash to excise the filename expansion stuff from that?

I just looked at it. It has two routines, much like DoMatch and Star;
it does most of the checks I listed above, and it compresses
consecutive '*'s and '?'s. But it doesn't have anything like ABORT, so
the pathological example given in wildmat.c will still be very slow with it.
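Compressing runs of '*' and '?' can also be done as a pre-pass. This is a sketch of the idea only, not bash's code (normalize_runs is a made-up name): a run made only of unescaped '*' and '?' outside a class is rewritten as its '?'s followed by at most one '*'. "*?*?" and "??*" both mean "at least two characters", so the pattern matches the same strings. Classes and escapes are copied as-is, assuming old-style classes.

```c
/*
 * Hypothetical normalizer (NOT bash's code): each run of unescaped
 * '*'/'?' outside a class becomes its '?'s plus one '*' if the run had
 * any star.  ASSUMES old-style classes (leading ']' is a literal member).
 * The output is never longer than the input, so it works in place.
 */
static void normalize_runs(char *pat)
{
    char *out = pat;
    const char *in = pat;

    while (*in) {
        if (*in == '*' || *in == '?') {
            int qmarks = 0, star = 0;

            for ( ; *in == '*' || *in == '?'; in++) {
                if (*in == '?')
                    qmarks++;       /* each '?' still needs one character */
                else
                    star = 1;       /* any '*' in the run: one is enough */
            }
            while (qmarks-- > 0)
                *out++ = '?';
            if (star)
                *out++ = '*';
        } else if (*in == '\\') {
            *out++ = *in++;         /* escape pair copied verbatim */
            if (*in)
                *out++ = *in++;
        } else if (*in == '[') {
            *out++ = *in++;         /* whole class copied verbatim */
            if (*in == '^')
                *out++ = *in++;
            if (*in == ']')
                *out++ = *in++;
            while (*in && *in != ']')
                *out++ = *in++;
            if (*in == ']')
                *out++ = *in++;
        } else {
            *out++ = *in++;
        }
    }
    *out = '\0';
}
```

For example, "a*?*?b" becomes "a??*b", so a matcher never has to loop over more than one '*' per run.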

--
Lars Mathiesen, DIKU, U of Copenhagen, Denmark      [uunet!]mcsun!diku!thorinn
Institute of Datalogy -- we're scientists, not engineers.      thorinn@diku.dk