[comp.lang.c] pattern/wild card matching

benyukhi@motcid.UUCP (Ed Benyukhis) (02/12/91)

Can anyone on the net e-mail or post the sources or pointers to
where I can find such for the pattern/wild card matching routines
in "C".  I understand that at one point or another there were several 
postings regarding pattern/wild card matching.  If anyone can e-mail
or post those with the corresponding source fragments, I would
very much appreciate it.

Thank you,

Ed Benyukhis, Motorola CIG
(708)632-6624

dave@cs.arizona.edu (Dave P. Schaumann) (02/12/91)

In article <6467@saffron1.UUCP> benyukhi@motcid.UUCP (Ed Benyukhis) writes:
>Can anyone on the net e-mail or post the sources or pointers to
>where I can find such for the pattern/wild card matching routines
>in "C".  [...]

You might look at the Gnu version of grep.  Also, if you're interested in
a good theoretical discussion, check out chapter 3 of _Compilers: Principles,
Techniques and Tools_ by Aho, Sethi & Ullman (aka the Red Dragon Book).

They give both deterministic and non-deterministic methods of wildcard
matching.

-- 
Dave Schaumann      | DANGER: Access holes may tear easily.  Use of the access
		    | holes for lifting or carrying may result in damage to the
dave@cs.arizona.edu | carton and subsequent injury to the user.

bliss@sp64.csrd.uiuc.edu (Brian Bliss) (02/13/91)

 re_comp() and re_exec() are the regular-expression pattern
 matching routines built into libc.  Say "man regex" for more info.

bb

karl@ima.isc.com (Karl Heuer) (02/13/91)

In answer to the question:
>where I can find such for the pattern/wild card matching routines

Someone suggested looking at GNU grep; someone else mentioned the routine
re_comp() (without observing that it's BSD-specific).  But I think the
original question is about shell-style wildcard matching (aka globbing), not
RE matching.  On some systems, there's a gmatch() function in libgen.a; if you
don't have that, the enclosed source ought to be useful.

Karl W. Z. Heuer (karl@ima.isc.com or uunet!ima!karl), The Walking Lint
--------cut here--------
#include "bool.h"	/* not shown; presumably defines bool, YES and NO */
/*
 * Wildcard matching routine by Karl Heuer.  Public Domain.
 *
 * Test whether string s is matched by pattern p.
 * Supports "?", "*", "[", each of which may be escaped with "\";
 * Character classes may use "!" for negation and "-" for range.
 * Not yet supported: internationalization; "\" inside brackets.
 */
bool wildmatch(char const *s, char const *p) {
    register char c;
    while ((c = *p++) != '\0') {
	if (c == '?') {
	    if (*s++ == '\0') return (NO);
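	} else if (c == '[') {
	    register bool wantit = YES;
	    register bool seenit = NO;
	    /* a class consumes one character; without this check a */
	    /* negated class would match the terminator and overrun s */
	    if (*s == '\0') return (NO);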
	    if (*p == '!') {
		wantit = NO;
		++p;
	    }
	    c = *p++;
	    do {
		if (c == '\0') return (NO);
		/* a "-" just before "]" is a literal, not a range */
		if (*p == '-' && p[1] != '\0' && p[1] != ']') {
		    if (*s >= c && *s <= p[1]) seenit = YES;
		    p += 2;
		} else {
		    if (c == *s) seenit = YES;
		}
	    } while ((c = *p++) != ']');
	    if (wantit != seenit) return (NO);
	    ++s;
	} else if (c == '*') {
	    if (*p == '\0') return (YES); /* optimize common case */
	    do {
		if (wildmatch(s, p)) return (YES);
	    } while (*s++ != '\0');
	    return (NO);
	} else if (c == '\\') {
	    if (*p == '\0' || *p++ != *s++) return (NO);
	} else {
	    if (c != *s++) return (NO);
	}
    }
    return (*s == '\0');
}

gordon@osiris.cso.uiuc.edu (John Gordon) (02/14/91)

	I recently purchased the source code diskette for an issue of
_The C Gazette_ which contains, in essence, a "grep" function.  If
anyone is interested, I will post it.

	P.S.: Yes, it is legal for me to do this, as long as I leave
the author's name in the code.


---
John Gordon
Internet: gordon@osiris.cso.uiuc.edu        #include <disclaimer.h>
          gordon@cerl.cecer.army.mil       #include <clever_saying.h>

dhesi%cirrusl@oliveb.ATC.olivetti.com (Rahul Dhesi) (02/15/91)

All the code referred to or posted seems to pay no special attention to
the "/" character, which in a UNIX environment has special meaning.  It
would be really nice if code were available that would do the following
things:

1.   Either require "/" to be explicitly matched or allow it to be
matched by wildcards.  For example, * would match any character
sequence except slash, but ** would match any character sequence.

2.   Allow the C-shell brace notation for grouping, i.e., {a,b,c}d in a
pattern would match any of ad, bd, and cd in the filename.

I found Karl Heuer's posted code very useful, but it would be even
nicer if somebody had a canned routine that includes the above
features.
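
A rough sketch of both features follows.  It is not code from this
thread.  It assumes a system that provides the POSIX fnmatch() routine,
whose FNM_PATHNAME flag stops "*", "?" and "[...]" from matching "/".
The helper name brace_match() is hypothetical; it expands one level of
{a,b,c} (no nesting, no "\" escapes) and hands each expansion to
fnmatch().  The "**" form is not handled.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdlib.h>
#include <string.h>

/*
 * brace_match() is a hypothetical helper, not part of any library.
 * Returns 1 if s matches pat, else 0.  One brace group is expanded per
 * call and the result is matched recursively, so several groups in a row
 * work; nested groups and "\"-escaped braces are NOT handled.
 * flags is passed to fnmatch(): use FNM_PATHNAME to keep wildcards
 * from matching "/", or 0 to let them match it.
 */
int brace_match(const char *pat, const char *s, int flags)
{
    const char *lbrace, *rbrace, *alt;
    char *buf;
    size_t pre, len;
    int matched = 0;

    assert(pat != NULL && s != NULL);
    lbrace = strchr(pat, '{');
    rbrace = (lbrace != NULL) ? strchr(lbrace, '}') : NULL;
    if (rbrace == NULL)                   /* no complete group: plain glob */
        return fnmatch(pat, s, flags) == 0;

    /* Each expansion drops the braces, so it always fits in strlen(pat)+1. */
    buf = malloc(strlen(pat) + 1);
    if (buf == NULL)
        return 0;
    pre = (size_t)(lbrace - pat);         /* length of text before "{" */

    for (alt = lbrace + 1; ; alt += len + 1) {
        len = strcspn(alt, ",}");         /* length of this alternative */
        memcpy(buf, pat, pre);
        memcpy(buf + pre, alt, len);
        strcpy(buf + pre + len, rbrace + 1);
        if (brace_match(buf, s, flags)) {
            matched = 1;
            break;
        }
        if (alt[len] == '}')              /* that was the last alternative */
            break;
    }
    free(buf);
    return matched;
}
```

Under those assumptions, brace_match("{a,b,c}d", "bd", 0) should return 1,
and brace_match("*.c", "src/main.c", FNM_PATHNAME) should return 0 because
"*" may not cross the "/".  Passing 0 instead of FNM_PATHNAME lets the
wildcards match "/" again.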
--
Rahul Dhesi <dhesi%cirrusl@oliveb.ATC.olivetti.com>
UUCP:  oliveb!cirrusl!dhesi

levine@szebra.com (Ron Levine) (02/15/91)

In article <844@caslon.cs.arizona.edu> dave@cs.arizona.edu (Dave P. Schaumann) writes:
>In article <6467@saffron1.UUCP> benyukhi@motcid.UUCP (Ed Benyukhis) writes:
>>Can anyone on the net e-mail or post the sources or pointers to
>>where I can find such for the pattern/wild card matching routines
>>in "C".  [...]
>
>You might look at the Gnu version of grep.  Also, if you're interested in
>a good theoretical discussion, check out chapter 3 of _Compilers: Principles,
>Techniques and Tools_ by Aho, Sethi & Ullman (aka the Red Dragon Book).
>
>They give both deterministic and non-deterministic methods of wildcard
>matching.

Source code routines for regular expressions and a mini-grep were published
in C-Gazette within the last 6 months.  You can contact the magazine's
staff or use their source code BBS to download them (408-2410164).

ron levine

ttobler@unislc.uucp (Trent Tobler) (02/18/91)

From article <2953@cirrusl.UUCP>, by dhesi%cirrusl@oliveb.ATC.olivetti.com (Rahul Dhesi):
> All the code referred to or posted seems to pay no special attention to
> the "/" character, which in a UNIX environment has special meaning.  It
> would be really nice if code were available that would do the following
> things:
> 
> 1.   Either require "/" to be explicitly matched or allow it to be
> matched by wildcards.  For example, * would match any character
> sequence except slash, but ** would match any character sequence.
> 
> 2.   Allow the C-shell brace notation for grouping, i.e., {a,b,c}d in a
> pattern would match any of ad, bd, and cd in the filename.
> 
> I found Karl Heuer's posted code very useful, but it would be even
> nicer if somebody had a canned routine that includes the above
> features.


If the code he posted follows grep-style matching, all of the above is
possible using the '[ ... ]' construct.  For example, to match any character
except a "/", use "[^/]", i.e., "m[^/]*/abc" will match "me/abc", "mirth/abc",
etc.

For number 2, instead of "{a,b,c}d", use "[abc]d".  Of course, one drawback to
this is that grep doesn't allow alternate strings of characters (at least as
far as I know).  For example, the syntax I have seen used is
"I will( | not )sleep", which should match either "I will sleep" or
"I will not sleep".
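
The "[^/]" and "( | )" forms above are regular-expression syntax; the
wildcard routine posted earlier in this thread is shell-style and documents
"!" rather than "^" for negation.  One way to try the regular-expression
forms is sketched below.  It assumes a system with the POSIX regcomp() and
regexec() interface (not the re_comp()/re_exec() pair mentioned earlier),
and re_matches() is a hypothetical helper name.  REG_EXTENDED selects the
egrep-style syntax, which is what allows the "( | not )" alternation.

```c
#include <assert.h>
#include <regex.h>
#include <stddef.h>

/*
 * re_matches() is a hypothetical helper, not a library routine.
 * Returns 1 if the POSIX extended regular expression re matches somewhere
 * in s, 0 if it does not, and -1 if re fails to compile.  Anchor the
 * expression with ^ and $ to require a whole-string match.
 */
int re_matches(const char *re, const char *s)
{
    regex_t compiled;
    int rc;

    assert(re != NULL && s != NULL);
    if (regcomp(&compiled, re, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&compiled, s, 0, NULL, 0);   /* 0 means a match was found */
    regfree(&compiled);
    return rc == 0;
}
```

With that helper, re_matches("^m[^/]*/abc$", "mirth/abc") should return 1,
and re_matches("^I will( | not )sleep$", "I will not sleep") should also
return 1.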


--

    Trent Tobler  - ttobler@csulx.weber.edu