benyukhi@motcid.UUCP (Ed Benyukhis) (02/12/91)
Can anyone on the net e-mail or post the sources or pointers to where I can find such for the pattern/wild card matching routines in "C"? I understand that at one point or another there were several postings regarding pattern/wild card matching. If anyone can e-mail or post those with the corresponding source fragments, I would very much appreciate it.

Thank you,
Ed Benyukhis, Motorola CIG
(708)632-6624
dave@cs.arizona.edu (Dave P. Schaumann) (02/12/91)
In article <6467@saffron1.UUCP> benyukhi@motcid.UUCP (Ed Benyukhis) writes:
>Can anyone on the net e-mail or post the sources or pointers to
>where I can find such for the pattern/wild card matching routines
>in "C". [...]

You might look at the Gnu version of grep. Also, if you're interested in a good theoretical discussion, check out chapter 3 of _Compilers: Principles, Techniques and Tools_ by Aho, Sethi & Ullman (aka the Red Dragon Book).

They give both deterministic and non-deterministic methods of wildcard matching.
--
Dave Schaumann      | DANGER: Access holes may tear easily.  Use of the access
                    | holes for lifting or carrying may result in damage to the
dave@cs.arizona.edu | carton and subsequent injury to the user.
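To make the non-deterministic approach concrete, here is a minimal sketch (not code from the book) in the spirit of the NFA simulation that chapter describes. It assumes a pattern containing only literal characters, "?", and "*" (no brackets or escapes), treats each pattern position as an NFA state, and tracks the set of live states instead of backtracking. The names nfa_wildmatch and add_state are hypothetical.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: mark state i live, plus every state reachable
 * from it by epsilon moves.  A '*' may match zero characters, so the
 * state just past a star is reachable for free. */
static void add_state(bool *set, const char *p, size_t n, size_t i)
{
    while (i <= n && !set[i]) {
        set[i] = true;
        if (i < n && p[i] == '*')
            i++;                    /* epsilon move past the star */
        else
            break;
    }
}

/* Hypothetical sketch: true if all of s matches pattern p.
 * States 0..n are positions in p; state n is the accepting state. */
bool nfa_wildmatch(const char *s, const char *p)
{
    size_t n = strlen(p);
    bool *cur = calloc(n + 1, sizeof *cur);
    bool *next = calloc(n + 1, sizeof *next);
    bool result = false;

    if (cur != NULL && next != NULL) {
        add_state(cur, p, n, 0);
        for (; *s != '\0'; s++) {
            memset(next, 0, (n + 1) * sizeof *next);
            for (size_t i = 0; i < n; i++) {
                if (!cur[i])
                    continue;
                if (p[i] == '*')
                    add_state(next, p, n, i);      /* star eats *s, stays */
                else if (p[i] == '?' || p[i] == *s)
                    add_state(next, p, n, i + 1);  /* consume one char */
            }
            bool *tmp = cur;                       /* swap state sets */
            cur = next;
            next = tmp;
        }
        result = cur[n];
    }
    free(cur);
    free(next);
    return result;
}
```

Because each input character is processed once against at most one set of pattern positions, the work is bounded by roughly the product of the string and pattern lengths, whereas a backtracking matcher can slow down badly on patterns with many stars. A deterministic version would precompute these state sets as DFA states.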
bliss@sp64.csrd.uiuc.edu (Brian Bliss) (02/13/91)
re_comp() and re_exec() are the regular-expression pattern matching routines built into libc.  Say "man regex" for more info.

bb
karl@ima.isc.com (Karl Heuer) (02/13/91)
In answer to the question:
>where I can find such for the pattern/wild card matching routines

Someone suggested looking at GNU grep; someone else mentioned the routine re_comp() (without observing that it's BSD-specific). But I think the original question is about shell-style wildcard matching (aka globbing), not RE matching. On some systems, there's a gmatch() function in libgen.a; if you don't have that, the enclosed source ought to be useful.

Karl W. Z. Heuer (karl@ima.isc.com or uunet!ima!karl), The Walking Lint
--------cut here--------
/* bool.h was not included in the posting; these stand-ins assume it
 * defined bool, YES and NO along these lines. */
#include <stdbool.h>
#define YES true
#define NO  false

/*
 * Wildcard matching routine by Karl Heuer.  Public Domain.
 *
 * Test whether string s is matched by pattern p.
 * Supports "?", "*", "[", each of which may be escaped with "\";
 * Character classes may use "!" for negation and "-" for range.
 * Not yet supported: internationalization; "\" inside brackets.
 */
bool wildmatch(char const *s, char const *p) {
    register char c;

    while ((c = *p++) != '\0') {
        if (c == '?') {
            /* "?" needs exactly one character */
            if (*s++ == '\0') return (NO);
        } else if (c == '[') {
            register bool wantit = YES;
            register bool seenit = NO;
            /* a class must consume a character; without this check
             * "[!a]" would "match" the end of s and step past it */
            if (*s == '\0') return (NO);
            if (*p == '!') {
                wantit = NO;
                ++p;
            }
            c = *p++;   /* first member may be "]" itself */
            do {
                if (c == '\0') return (NO);   /* unterminated class */
                /* "-" before "]" is a literal, not a range */
                if (*p == '-' && p[1] != '\0' && p[1] != ']') {
                    if (*s >= c && *s <= p[1]) seenit = YES;
                    p += 2;
                } else {
                    if (c == *s) seenit = YES;
                }
            } while ((c = *p++) != ']');
            if (wantit != seenit) return (NO);
            ++s;
        } else if (c == '*') {
            if (*p == '\0') return (YES);   /* optimize common case */
            /* try the rest of the pattern at every suffix of s */
            do {
                if (wildmatch(s, p)) return (YES);
            } while (*s++ != '\0');
            return (NO);
        } else if (c == '\\') {
            /* escaped character must match literally */
            if (*p == '\0' || *p++ != *s++) return (NO);
        } else {
            if (c != *s++) return (NO);
        }
    }
    return (*s == '\0');
}
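Where it is available, the POSIX fnmatch() function in <fnmatch.h> performs the same kind of shell-style matching, and its FNM_PATHNAME flag stops wildcards from matching "/". A minimal sketch; the wrapper name glob_matches is hypothetical:

```c
#include <fnmatch.h>
#include <stdbool.h>

/* Hypothetical wrapper around POSIX fnmatch().  Note the argument
 * order: fnmatch() takes the pattern first, then the string.  With
 * pathname set, "*", "?" and brackets never match a '/'. */
bool glob_matches(const char *string, const char *pattern, bool pathname)
{
    int flags = pathname ? FNM_PATHNAME : 0;
    return fnmatch(pattern, string, flags) == 0;  /* 0 means match */
}
```

For example, "*.c" matches "src/foo.c" without FNM_PATHNAME but not with it; "*/*.c" is needed in the second case.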
gordon@osiris.cso.uiuc.edu (John Gordon) (02/14/91)
I recently purchased the source code diskette for an issue of _The C Gazette_, which contains, in essence, a "grep" function. If anyone is interested, I will post it.

P.S.: Yes, it is legal for me to do this, as long as I leave the author's name in the code.
---
John Gordon                  Internet: gordon@osiris.cso.uiuc.edu
#include <disclaimer.h>                gordon@cerl.cecer.army.mil
#include <clever_saying.h>
dhesi%cirrusl@oliveb.ATC.olivetti.com (Rahul Dhesi) (02/15/91)
All the code referred to or posted seems to pay no special attention to the "/" character, which in a UNIX environment has special meaning. It would be really nice if code were available that would do the following things:

1. Either require "/" to be explicitly matched or allow it to be matched by wildcards. For example, * would match any character sequence except slash, but ** would match any character sequence.

2. Allow the C-shell brace notation for grouping, i.e., {a,b,c}d in a pattern would match any of ad, bd, and cd in the filename.

I found Karl Heuer's posted code very useful, but it would be even nicer if somebody had a canned routine that includes the above features.
--
Rahul Dhesi <dhesi%cirrusl@oliveb.ATC.olivetti.com>
UUCP: oliveb!cirrusl!dhesi
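Feature 1 can be sketched as a small recursive matcher in the same spirit as Karl Heuer's routine. This is a minimal sketch under assumed semantics: "*" and "?" never match "/", "**" matches any run including "/", and there are no brackets or escapes. The name pathmatch is hypothetical.

```c
#include <stdbool.h>

/* Hypothetical sketch: match string s against pattern p, where
 *   ?   matches any single character except '/'
 *   *   matches any run of characters containing no '/'
 *   **  matches any run of characters, '/' included
 * Every other pattern character must match literally. */
bool pathmatch(const char *s, const char *p)
{
    for (; *p != '\0'; p++) {
        if (*p == '*') {
            bool any = (p[1] == '*');   /* "**" may cross slashes */
            if (any)
                p++;
            /* Try every possible length for the starred run. */
            for (;;) {
                if (pathmatch(s, p + 1))
                    return true;
                if (*s == '\0' || (!any && *s == '/'))
                    return false;
                s++;
            }
        } else if (*p == '?') {
            if (*s == '\0' || *s == '/')
                return false;
            s++;
        } else {
            if (*s != *p)               /* also fails at end of s */
                return false;
            s++;
        }
    }
    return *s == '\0';
}
```

With this naive definition, "src/**/*.c" matches "src/lib/foo.c" but not "src/foo.c", because the literal "/" after "**" still has to appear; treating "**/" as "zero or more directories" would need a special case.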
levine@szebra.com (Ron Levine) (02/15/91)
In article <844@caslon.cs.arizona.edu> dave@cs.arizona.edu (Dave P. Schaumann) writes:
>In article <6467@saffron1.UUCP> benyukhi@motcid.UUCP (Ed Benyukhis) writes:
>>Can anyone on the net e-mail or post the sources or pointers to
>>where I can find such for the pattern/wild card matching routines
>>in "C". [...]
>
>You might look at the Gnu version of grep. Also, if you're interested in
>a good theoretical discussion, check out chapter 3 of _Compilers: Principles,
>Techniques and Tools_ by Aho, Sethi & Ullman (aka the Red Dragon Book).
>
>They give both deterministic and non-deterministic methods of wildcard
>matching.

Source code routines for regular expressions and a mini-grep were published in C-Gazette within the last 6 months. You can contact the magazine's staff or use their source code BBS to download them (408-2410164).

ron levine
ttobler@unislc.uucp (Trent Tobler) (02/18/91)
From article <2953@cirrusl.UUCP>, by dhesi%cirrusl@oliveb.ATC.olivetti.com (Rahul Dhesi):
> All the code referred to or posted seems to pay no special attention to
> the "/" character, which in a UNIX environment has special meaning. It
> would be really nice if code were available that would do the following
> things:
>
> 1. Either require "/" to be explicitly matched or allow it to be
>    matched by wildcards. For example, * would match any character
>    sequence except slash, but ** would match any character sequence.
>
> 2. Allow the C-shell brace notation for grouping, i.e., {a,b,c}d in a
>    pattern would match any of ad, bd, and cd in the filename.
>
> I found Karl Heuer's posted code very useful, but it would be even
> nicer if somebody had a canned routine that includes the above
> features.

If the code he posted follows grep-style matching, all of the above is possible using the '[ ... ]' construct. For example, to match any character except a "/", use "[^/]": "m[^/]*/abc" will match "me/abc", "mirth/abc", etc. In number 2, instead of "{a,b,c}d", use "[abc]d".

Of course, one drawback to this is that grep doesn't allow alternate strings of characters (at least as far as I know). For example, the syntax I have seen used is "I will( | not )sleep", which should match either "I will sleep" or "I will not sleep".
--
Trent Tobler - ttobler@csulx.weber.edu
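The "[abc]d" substitution only works when every alternative is a single character; a pattern such as "{foo,bar}d" needs real alternation. One way to get it, sketched here under stated assumptions, is to expand brace groups into separate patterns and hand each brace-free pattern to POSIX fnmatch(). The name bracematch is hypothetical; nested groups are handled, but escapes and braces inside brackets are not, and an unbalanced "{" leaves the pattern literal.

```c
#include <fnmatch.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch: match s against p, expanding C-shell style
 * {a,b,c} groups (nesting allowed, no escapes) and passing each
 * brace-free alternative to POSIX fnmatch(). */
bool bracematch(const char *s, const char *p)
{
    const char *lbrace = strchr(p, '{');
    const char *rbrace = NULL;
    int depth = 0;

    if (lbrace == NULL)
        return fnmatch(p, s, 0) == 0;

    /* Find the '}' that closes the first '{'. */
    for (const char *q = lbrace; *q != '\0'; q++) {
        if (*q == '{') {
            depth++;
        } else if (*q == '}' && --depth == 0) {
            rbrace = q;
            break;
        }
    }
    if (rbrace == NULL)                 /* unbalanced: treat literally */
        return fnmatch(p, s, 0) == 0;

    size_t prefix = (size_t)(lbrace - p);
    const char *rest = rbrace + 1;
    const char *alt = lbrace + 1;
    bool matched = false;
    char *buf = malloc(strlen(p) + 1);  /* an expansion is never longer */

    if (buf == NULL)
        return false;

    /* Split the group on top-level commas; try prefix + alt + rest. */
    depth = 0;
    for (const char *q = alt; q <= rbrace && !matched; q++) {
        if (q == rbrace || (*q == ',' && depth == 0)) {
            size_t len = (size_t)(q - alt);
            memcpy(buf, p, prefix);
            memcpy(buf + prefix, alt, len);
            strcpy(buf + prefix + len, rest);
            matched = bracematch(s, buf);   /* handles later groups */
            alt = q + 1;
        } else if (*q == '{') {
            depth++;
        } else if (*q == '}') {
            depth--;
        }
    }
    free(buf);
    return matched;
}
```

For instance, bracematch("I will not sleep", "I will{, not} sleep") tries "I will sleep" and then "I will not sleep", which covers that alternation example without full regular expressions. Each group multiplies the number of candidate patterns, so this approach suits short patterns with few groups.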