[comp.lang.c] string compare with wildcards

claus@iesd.auc.dk (Claus S. Jensen) (12/13/90)

   Does anybody out there have a routine that compares
two strings, one of which contains wildcards (*,?).
If so, I would appreciate to obtain a copy.

     Thanks in advance,
         Claus S.Jensen
--
########################################################################
#  Claus S.Jensen , claus@iesd.auc.dk # Superman on Deep Trouble MUD   #
#  University of Aalborg, Denmark     # krikand.iesd.auc.dk 2000       #
########################################################################

staff@cadlab.sublink.ORG (Alex Martelli) (12/14/90)

claus@iesd.auc.dk (Claus S. Jensen) writes:
	...
>   Does anybody out there have a routine that compares
>two strings, one of which contains wildcards (*,?).

It's real easy, thanks to recursion (no "stylistic" flames, please!-):

int wildmatch(pat, str) char *pat, *str; {
    int pat_ch, str_ch;
    while(pat_ch = *pat++) {
        if(pat_ch == '*') {
            if(*pat == 0) return 1;
            for(;;) {
                if(wildmatch(pat, str)) return 1;
                if((str_ch = *str++)==0) return 0;
            }
        } else {
            if((str_ch = *str++)==0) return 0;
            if(pat_ch != '?' && str_ch != pat_ch) return 0;
        }
    }
    return *str == 0;
}

Optimizing this, and in particular eliminating the recursive call,
is, on the other hand, a rather interesting exercise, but one I'd
undertake only if wildcard-matching was proven-by-profiling to be
a bottleneck in my application... which is rather unlikely, I think.
-- 
Alex Martelli - CAD.LAB s.p.a., v. Stalingrado 45, Bologna, Italia
Email: (work:) staff@cadlab.sublink.org, (home:) alex@am.sublink.org
Phone: (work:) ++39 (51) 371099, (home:) ++39 (51) 250434; 
Fax: ++39 (51) 366964 (work only), Fidonet: 332/401.3 (home only).

thorinn@rimfaxe.diku.dk (Lars Henrik Mathiesen) (12/19/90)

staff@cadlab.sublink.ORG (Alex Martelli) writes:
>claus@iesd.auc.dk (Claus S. Jensen) writes:
>>   Does anybody out there have a routine that compares
>>two strings, one of which contains wildcards (*,?).

>It's real easy, thanks to recursion (no "stylistic" flames, please!-):

>int wildmatch(pat, str) char *pat, *str; {
>	. . .

>Optimizing this, and in particular eliminating the recursive call,
>is, on the other hand, a rather interesting exercise, but one I'd
>undertake only if wildcard-matching was proven-by-profiling to be
>a bottleneck in my application... which is rather unlikely, I think.

There's nothing wrong with recursive calls. The problem with most
implementations of this is nested _loops_, and it doesn't matter if
they occur in recursive instances of a function, or are simulated with
a stack. Try "echo ********.c" in Bourne shell for a demonstration. X
users may have noticed the problem using font specifications with too
many asterisks.

Eliminating these nested loops changes the worst-case time complexity
from O(n^(m-1)) to O(n*m), where n and m are the lengths of the string
and the pattern, respectively.

The point is that once the part of a pattern between two asterisks
matches at some position in the string, there is no need to retry it
at any later position: The rest of the pattern (after the second
asterisk) will already have been tried (and have failed) at every
position after the first match, so it will only fail again.

Implementation hint: Let wildmatch return one of SUCCESS, RETRY or
FAIL. FAIL is returned when the string is exhausted before the
pattern, RETRY when it otherwise doesn't match the pattern. Both
SUCCESS and FAIL from the recursive call are returned to the caller
immediately, and the loop runs only on RETRY. (Once the loop is
entered, the function will only return SUCCESS or FAIL.) Once a
recursive version is finished, you can easily construct an iterative
one (Hint: you only need to save one (pattern,string) position pair).

--
Lars Mathiesen, DIKU, U of Copenhagen, Denmark      [uunet!]mcsun!diku!thorinn
Institute of Datalogy -- we're scientists, not engineers.      thorinn@diku.dk