johnk@wrq.com (John Kercheval) (03/26/91)
Submitted-by: John Kercheval <johnk@wrq.com>
Posting-number: Volume 17, Issue 69
Archive-name: regex/part01

Here is the shar archive of V1.10 of REGEX Globber.

This is a *IX wildcard globber I butchered, hacked and cajoled together after seeing and hearing about and becoming disgusted with several similar routines which had one or more of the following attributes: slow, buggy, required large levels of recursion on matches, required grotesque levels of recursion on failing matches using '*', full of caveats about usability or copyrights.

These routines are fairly well tested and reasonably fast. I have made an effort to fail on all bad patterns and to quickly determine failing '*' patterns. This parser will also do quite a bit of the '*' matching via quick linear loops versus the standard blind recursive descent.

This parser has been submitted to profilers at various stages of development and has come through quite well. If the last millisecond is important to you then some time can be shaved by using stack allocated variables in place of many of the pointer follows (which may be done fairly often) found in matche() and matche_after_star() (ie *p, *t).

No attempt is made to provide general [pat,pat] comparisons. The specific subcase supplied by these routines is [pat,text], which is sufficient for the large majority of cases (should you care).

Since matche() may return one of several different values depending upon the pattern and text, I have made a simple shell for convenience (match()). Also included is an is_pattern routine to quickly check a potential pattern for regex special characters. I even placed this all in a header file for you lazy folks!

Having said all that, here is my own reinvention of the wheel. Please enjoy its use and I hope it is of some help to those with need ....

jbk
----
#! /bin/sh
# This is a shell archive.  Remove anything before this line, then feed it
# into a shell via "sh file" or similar.
To overwrite existing files, # type "sh file -c". # The tool that generated this appeared in the comp.sources.unix newsgroup; # send mail to comp-sources-unix@uunet.uu.net if you want that tool. # Contents: match.! match.c match.doc match.h matchmak matchtst.bat # readme.doc # Wrapped by kent@sparky on Mon Mar 25 14:33:28 1991 PATH=/bin:/usr/bin:/usr/ucb ; export PATH echo If this archive is complete, you will see the following message: echo ' "shar: End of archive 1 (of 1)."' if test -f 'match.!' -a "${1}" != "-c" ; then echo shar: Will not clobber existing file \"'match.!'\" else echo shar: Extracting \"'match.!'\" \(950 characters\) sed "s/^X//" >'match.!' <<'END_OF_FILE' X X.............................................................................. X.. .. X. REGEX Globber (Wild Card Matching) . X. . X. A *IX SH style pattern matcher written in C . X. V1.10 Dedicated to the Public Domain . X. . X. March 12, 1991 . X. J. Kercheval . X. [72450,3702] -- johnk@wrq.com . X.. .. X.............................................................................. X END_OF_FILE if test 950 -ne `wc -c <'match.!'`; then echo shar: \"'match.!'\" unpacked with wrong size! fi # end of 'match.!' fi if test -f 'match.c' -a "${1}" != "-c" ; then echo shar: Will not clobber existing file \"'match.c'\" else echo shar: Extracting \"'match.c'\" \(16844 characters\) sed "s/^X//" >'match.c' <<'END_OF_FILE' X/* X EPSHeader X X File: match.c X Author: J. Kercheval X Created: Sat, 01/05/1991 22:21:49 X*/ X/* X EPSRevision History X X J. Kercheval Wed, 02/20/1991 22:29:01 Released to Public Domain X J. Kercheval Fri, 02/22/1991 15:29:01 fix '\' bugs (two :( of them) X J. Kercheval Sun, 03/10/1991 19:31:29 add error return to matche() X J. Kercheval Sun, 03/10/1991 20:11:11 add is_valid_pattern code X J. Kercheval Sun, 03/10/1991 20:37:11 beef up main() X J. 
Kercheval Tue, 03/12/1991 22:25:10 Released as V1.1 to Public Domain X*/ X X/* X Wildcard Pattern Matching X*/ X X X#include "match.h" X Xint matche_after_star (register char *pattern, register char *text); Xint fast_match_after_star (register char *pattern, register char *text); X X/*---------------------------------------------------------------------------- X* X* Return TRUE if PATTERN has any special wildcard characters X* X----------------------------------------------------------------------------*/ X XBOOLEAN is_pattern (char *p) X{ X while ( *p ) { X switch ( *p++ ) { X case '?': X case '*': X case '[': X case '\\': X return TRUE; X } X } X return FALSE; X} X X X/*---------------------------------------------------------------------------- X* X* Return TRUE if PATTERN has is a well formed regular expression according X* to the above syntax X* X* error_type is a return code based on the type of pattern error. Zero is X* returned in error_type if the pattern is a valid one. error_type return X* values are as follows: X* X* PATTERN_VALID - pattern is well formed X* PATTERN_ESC - pattern has invalid escape ('\' at end of pattern) X* PATTERN_RANGE - [..] construct has a no end range in a '-' pair (ie [a-]) X* PATTERN_CLOSE - [..] construct has no end bracket (ie [abc-g ) X* PATTERN_EMPTY - [..] construct is empty (ie []) X* X----------------------------------------------------------------------------*/ X XBOOLEAN is_valid_pattern (char *p, int *error_type) X{ X X /* init error_type */ X *error_type = PATTERN_VALID; X X /* loop through pattern to EOS */ X while( *p ) { X X /* determine pattern type */ X switch( *p ) { X X /* check literal escape, it cannot be at end of pattern */ X case '\\': X if( !*++p ) { X *error_type = PATTERN_ESC; X return FALSE; X } X p++; X break; X X /* the [..] 
construct must be well formed */ X case '[': X p++; X X /* if the next character is ']' then bad pattern */ X if ( *p == ']' ) { X *error_type = PATTERN_EMPTY; X return FALSE; X } X X /* if end of pattern here then bad pattern */ X if ( !*p ) { X *error_type = PATTERN_CLOSE; X return FALSE; X } X X /* loop to end of [..] construct */ X while( *p != ']' ) { X X /* check for literal escape */ X if( *p == '\\' ) { X p++; X X /* if end of pattern here then bad pattern */ X if ( !*p++ ) { X *error_type = PATTERN_ESC; X return FALSE; X } X } X else X p++; X X /* if end of pattern here then bad pattern */ X if ( !*p ) { X *error_type = PATTERN_CLOSE; X return FALSE; X } X X /* if this a range */ X if( *p == '-' ) { X X /* we must have an end of range */ X if ( !*++p || *p == ']' ) { X *error_type = PATTERN_RANGE; X return FALSE; X } X else { X X /* check for literal escape */ X if( *p == '\\' ) X p++; X X /* if end of pattern here then bad pattern */ X if ( !*p++ ) { X *error_type = PATTERN_ESC; X return FALSE; X } X } X } X } X break; X X /* all other characters are valid pattern elements */ X case '*': X case '?': X default: X p++; /* "normal" character */ X break; X } X } X X return TRUE; X} X X X/*---------------------------------------------------------------------------- X* X* Match the pattern PATTERN against the string TEXT; X* X* returns MATCH_VALID if pattern matches, or an errorcode as follows X* otherwise: X* X* MATCH_PATTERN - bad pattern X* MATCH_LITERAL - match failure on literal mismatch X* MATCH_RANGE - match failure on [..] construct X* MATCH_ABORT - premature end of text string X* MATCH_END - premature end of pattern string X* MATCH_VALID - valid match X* X* X* A match means the entire string TEXT is used up in matching. X* X* In the pattern string: X* `*' matches any sequence of characters (zero or more) X* `?' matches any character X* [SET] matches any character in the specified set, X* [!SET] or [^SET] matches any character not in the specified set. 
X* X* A set is composed of characters or ranges; a range looks like X* character hyphen character (as in 0-9 or A-Z). [0-9a-zA-Z_] is the X* minimal set of characters allowed in the [..] pattern construct. X* Other characters are allowed (ie. 8 bit characters) if your system X* will support them. X* X* To suppress the special syntactic significance of any of `[]*?!^-\', X* and match the character exactly, precede it with a `\'. X* X----------------------------------------------------------------------------*/ X Xint matche ( register char *p, register char *t ) X{ X register char range_start, range_end; /* start and end in range */ X X BOOLEAN invert; /* is this [..] or [!..] */ X BOOLEAN member_match; /* have I matched the [..] construct? */ X BOOLEAN loop; /* should I terminate? */ X X for ( ; *p; p++, t++ ) { X X /* if this is the end of the text then this is the end of the match */ X if (!*t) { X return ( *p == '*' && *++p == '\0' ) ? MATCH_VALID : MATCH_ABORT; X } X X /* determine and react to pattern type */ X switch ( *p ) { X X /* single any character match */ X case '?': X break; X X /* multiple any character match */ X case '*': X return matche_after_star (p, t); X X /* [..] construct, single member/exclusion character match */ X case '[': { X X /* move to beginning of range */ X p++; X X /* check if this is a member match or exclusion match */ X invert = FALSE; X if ( *p == '!' 
|| *p == '^') { X invert = TRUE; X p++; X } X X /* if closing bracket here or at range start then we have a X malformed pattern */ X if ( *p == ']' ) { X return MATCH_PATTERN; X } X X member_match = FALSE; X loop = TRUE; X X while ( loop ) { X X /* if end of construct then loop is done */ X if (*p == ']') { X loop = FALSE; X continue; X } X X /* matching a '!', '^', '-', '\' or a ']' */ X if ( *p == '\\' ) { X range_start = range_end = *++p; X } X else { X range_start = range_end = *p; X } X X /* if end of pattern then bad pattern (Missing ']') */ X if (!*p) X return MATCH_PATTERN; X X /* check for range bar */ X if (*++p == '-') { X X /* get the range end */ X range_end = *++p; X X /* if end of pattern or construct then bad pattern */ X if (range_end == '\0' || range_end == ']') X return MATCH_PATTERN; X X /* special character range end */ X if (range_end == '\\') { X range_end = *++p; X X /* if end of text then we have a bad pattern */ X if (!range_end) X return MATCH_PATTERN; X } X X /* move just beyond this range */ X p++; X } X X /* if the text character is in range then match found. X make sure the range letters have the proper X relationship to one another before comparison */ X if ( range_start < range_end ) { X if (*t >= range_start && *t <= range_end) { X member_match = TRUE; X loop = FALSE; X } X } X else { X if (*t >= range_end && *t <= range_start) { X member_match = TRUE; X loop = FALSE; X } X } X } X X /* if there was a match in an exclusion set then no match */ X /* if there was no match in a member set then no match */ X if ((invert && member_match) || X !(invert || member_match)) X return MATCH_RANGE; X X /* if this is not an exclusion then skip the rest of the [...] X construct that already matched. 
*/ X if (member_match) { X while (*p != ']') { X X /* bad pattern (Missing ']') */ X if (!*p) X return MATCH_PATTERN; X X /* skip exact match */ X if (*p == '\\') { X p++; X X /* if end of text then we have a bad pattern */ X if (!*p) X return MATCH_PATTERN; X } X X /* move to next pattern char */ X p++; X } X } X X break; X } X X /* next character is quoted and must match exactly */ X case '\\': X X /* move pattern pointer to quoted char and fall through */ X p++; X X /* if end of text then we have a bad pattern */ X if (!*p) X return MATCH_PATTERN; X X /* must match this character exactly */ X default: X if (*p != *t) X return MATCH_LITERAL; X } X } X X /* if end of text not reached then the pattern fails */ X if ( *t ) X return MATCH_END; X else X return MATCH_VALID; X} X X X/*---------------------------------------------------------------------------- X* X* recursively call matche() with final segment of PATTERN and of TEXT. X* X----------------------------------------------------------------------------*/ X Xint matche_after_star (register char *p, register char *t) X{ X register int match = 0; X register nextp; X X /* pass over existing ? and * in pattern */ X while ( *p == '?' || *p == '*' ) { X X /* take one char for each ? and + */ X if ( *p == '?' 
) { X X /* if end of text then no match */ X if ( !*t++ ) { X return MATCH_ABORT; X } X } X X /* move to next char in pattern */ X p++; X } X X /* if end of pattern we have matched regardless of text left */ X if ( !*p ) { X return MATCH_VALID; X } X X /* get the next character to match which must be a literal or '[' */ X nextp = *p; X if ( nextp == '\\' ) { X nextp = p[1]; X X /* if end of text then we have a bad pattern */ X if (!nextp) X return MATCH_PATTERN; X } X X /* Continue until we run out of text or definite result seen */ X do { X X /* a precondition for matching is that the next character X in the pattern match the next character in the text or that X the next pattern char is the beginning of a range. Increment X text pointer as we go here */ X if ( nextp == *t || nextp == '[' ) { X match = matche(p, t); X } X X /* if the end of text is reached then no match */ X if ( !*t++ ) match = MATCH_ABORT; X X } while ( match != MATCH_VALID && X match != MATCH_ABORT && X match != MATCH_PATTERN); X X /* return result */ X return match; X} X X X/*---------------------------------------------------------------------------- X* X* match() is a shell to matche() to return only BOOLEAN values. X* X----------------------------------------------------------------------------*/ X XBOOLEAN match( char *p, char *t ) X{ X int error_type; X error_type = matche(p,t); X return (error_type == MATCH_VALID ) ? TRUE : FALSE; X} X X X#ifdef TEST X X /* X * This test main expects as first arg the pattern and as second arg X * the match string. Output is yaeh or nay on match. If nay on X * match then the error code is parsed and written. 
X */ X X #include <stdio.h> X X int main(int argc, char *argv[]) X { X int error; X int is_valid_error; X X if (argc != 3) { X printf("Usage: MATCH Pattern Text\n"); X } X else { X printf("Pattern: %s\n", argv[1]); X printf("Text : %s\n", argv[2]); X X if (!is_pattern(argv[1])) { X printf(" First Argument Is Not A Pattern\n"); X } X else { X error = matche(argv[1],argv[2]); X is_valid_pattern(argv[1],&is_valid_error); X X switch ( error ) { X case MATCH_VALID: X printf(" Match Successful"); X if (is_valid_error != PATTERN_VALID) X printf(" -- is_valid_pattern() is complaining\n"); X else X printf("\n"); X break; X case MATCH_LITERAL: X printf(" Match Failed on Literal\n"); X break; X case MATCH_RANGE: X printf(" Match Failed on [..]\n"); X break; X case MATCH_ABORT: X printf(" Match Failed on Early Text Termination\n"); X break; X case MATCH_END: X printf(" Match Failed on Early Pattern Termination\n"); X break; X case MATCH_PATTERN: X switch ( is_valid_error ) { X case PATTERN_VALID: X printf(" Internal Disagreement On Pattern\n"); X break; X case PATTERN_ESC: X printf(" Literal Escape at End of Pattern\n"); X break; X case PATTERN_RANGE: X printf(" No End of Range in [..] Construct\n"); X break; X case PATTERN_CLOSE: X printf(" [..] Construct is Open\n"); X break; X case PATTERN_EMPTY: X printf(" [..] Construct is Empty\n"); X break; X default: X printf(" Internal Error in is_valid_pattern()\n"); X } X break; X default: X printf(" Internal Error in matche()\n"); X break; X } X } X X } X return(0); X } X X#endif END_OF_FILE if test 16844 -ne `wc -c <'match.c'`; then echo shar: \"'match.c'\" unpacked with wrong size! 
fi # end of 'match.c' fi if test -f 'match.doc' -a "${1}" != "-c" ; then echo shar: Will not clobber existing file \"'match.doc'\" else echo shar: Extracting \"'match.doc'\" \(5288 characters\) sed "s/^X//" >'match.doc' <<'END_OF_FILE' X X REGEX Globber (Wild Card Matching) X X A *IX SH style pattern matcher written in C X V1.10 Dedicated to the Public Domain X X March 12, 1991 X J. Kercheval X [72450,3702] -- johnk@wrq.com X X X X X*IX SH style Regular Expressions X================================ X XThe *IX command SH is a working shell similar in feel to the MSDOS Xshell COMMAND.COM. In point of fact much of what we see in our Xfamiliar DOS PROMPT was gleaned from the early UNIX shells available Xfor many of machines the people involved in the computing arena had Xat the time of the development of DOS and it's much maligned Xprecursor CP/M (although the UNIX shells were and are much more Xflexible and powerful then those on the current flock of micro Xmachines). The designers of DOS and CP/M did some fairly strange Xthings with their command processor and OS. One of those things was Xto only selectively adopt the regular expressions allowed within the X*IX shells. Only '?' and '*' were allowed in filenames and even with Xthese the '*' was allowed only at the end of a pattern and in fact Xwhen used to specify the filename the '*' did not apply to extension. XThis gave rise to the all too common expression "*.*". X XREGEX Globber is a SH pattern matcher. This allows such Xspecifications as *75.zip or * (equivelant to *.* in DOS lingo). XExpressions such as [a-e]*t would fit the name "apple.crt" or X"catspaw.bat" or "elegant". This allows considerably wider Xflexibility in file specification, general parsing or any other Xcircumstance in which this type of pattern matching is wanted. X XA match would mean that the entire string TEXT is used up in matching Xthe PATTERN and conversely the matched TEXT uses up the entire XPATTERN. 
X XIn the specified pattern string: X `*' matches any sequence of characters (zero or more) X `?' matches any character X `\' suppresses syntactic significance of a special character X [SET] matches any character in the specified set, X [!SET] or [^SET] matches any character not in the specified set. X XA set is composed of characters or ranges; a range looks like X'character hyphen character' (as in 0-9 or A-Z). [0-9a-zA-Z_] is the Xminimal set of characters allowed in the [..] pattern construct. XOther characters are allowed (ie. 8 bit characters) if your system Xwill support them (it almost certainly will). X XTo suppress the special syntactic significance of any of `[]*?!^-\', Xand match the character exactly, precede it with a `\'. X XTo view several examples of good and bad patterns and text see the Xoutput of MATCHTST.BAT X X X XMATCH() and MATCHE() X==================== X XThe match module as written has two parsing routines, one is matche() Xand the other is match(). Since match() is a call to matche() which Xsimply has its output mapped to a BOOLEAN value (ie TRUE if pattern Xmatches or FALSE otherwise), I will concentrate my explanations here Xon matche(). X XThe purpose of matche() is to match a pattern against a string of Xtext (usually a file name or specification). The match routine has Xextensive pattern validity checking built into it as part of the Xparser and allows for a robust pattern match. X XThe parser gives an error code on return of type int. The error code Xwill be one of the the following defined values (defined in match.h): X X MATCH_PATTERN - bad pattern or misformed pattern X MATCH_LITERAL - match failed on character match (standard X character) X MATCH_RANGE - match failure on character range ([..] 
construct) X MATCH_ABORT - premature end of text string (pattern longer X than text string) X MATCH_END - premature end of pattern string (text longer X than pattern called for) X MATCH_VALID - valid match using pattern X XThe functions are declared as follows: X X BOOLEAN match (char *pattern, char *text); X X int matche(register char *pattern, register char *text); X X X XIS_VALID_PATTERN() and IS_PATTERN() X=================================== X XThere are two routines for determining properties of a pattern Xstring. The first, is_pattern(), is designed simply to determine if Xsome character exists within the text which is consistent with a SH Xregular expression (this function returns TRUE if so and FALSE if Xnot). The second, is_valid_pattern() is designed to check the Xvalidity of a given pattern string (TRUE return if valid, FALSE if Xnot). By 'validity', I mean well formed or syntactically correct. X XIn addition, is_valid_pattern() has as one of it's parameters a Xreturn code for determining the type of error found in the pattern if Xone exists. The error codes are as follows and defined in match.h: X X PATTERN_VALID - pattern is well formed X PATTERN_ESC - pattern has invalid literal escape ('\' at end of X pattern) X PATTERN_RANGE - [..] construct has a no end range in a '-' pair X (ie [a-]) X PATTERN_CLOSE - [..] construct has no end bracket (ie [abc-g ) X PATTERN_EMPTY - [..] construct is empty (ie []) X XThe functions are declared as follows: X X BOOLEAN is_valid_pattern (char *pattern, int *error_type); X X BOOLEAN is_pattern (char *pattern); END_OF_FILE if test 5288 -ne `wc -c <'match.doc'`; then echo shar: \"'match.doc'\" unpacked with wrong size! fi # end of 'match.doc' fi if test -f 'match.h' -a "${1}" != "-c" ; then echo shar: Will not clobber existing file \"'match.h'\" else echo shar: Extracting \"'match.h'\" \(3963 characters\) sed "s/^X//" >'match.h' <<'END_OF_FILE' X/* X EPSHeader X X File: match.h X Author: J. 
Kercheval X Created: Sat, 01/05/1991 22:27:18 X*/ X/* X EPSRevision History X X J. Kercheval Wed, 02/20/1991 22:28:37 Released to Public Domain X J. Kercheval Sun, 03/10/1991 18:02:56 add is_valid_pattern X J. Kercheval Sun, 03/10/1991 18:25:48 add error_type in is_valid_pattern X J. Kercheval Sun, 03/10/1991 18:47:47 error return from matche() X J. Kercheval Tue, 03/12/1991 22:24:49 Released as V1.1 to Public Domain X*/ X X/* X Wildcard Pattern Matching X*/ X X#ifndef BOOLEAN X# define BOOLEAN int X# define TRUE 1 X# define FALSE 0 X#endif X X/* match defines */ X#define MATCH_PATTERN 6 /* bad pattern */ X#define MATCH_LITERAL 5 /* match failure on literal match */ X#define MATCH_RANGE 4 /* match failure on [..] construct */ X#define MATCH_ABORT 3 /* premature end of text string */ X#define MATCH_END 2 /* premature end of pattern string */ X#define MATCH_VALID 1 /* valid match */ X X/* pattern defines */ X#define PATTERN_VALID 0 /* valid pattern */ X#define PATTERN_ESC -1 /* literal escape at end of pattern */ X#define PATTERN_RANGE -2 /* malformed range in [..] construct */ X#define PATTERN_CLOSE -3 /* no end bracket in [..] construct */ X#define PATTERN_EMPTY -4 /* [..] contstruct is empty */ X X/*---------------------------------------------------------------------------- X* X* Match the pattern PATTERN against the string TEXT; X* X* match() returns TRUE if pattern matches, FALSE otherwise. X* matche() returns MATCH_VALID if pattern matches, or an errorcode X* as follows otherwise: X* X* MATCH_PATTERN - bad pattern X* MATCH_LITERAL - match failure on literal mismatch X* MATCH_RANGE - match failure on [..] construct X* MATCH_ABORT - premature end of text string X* MATCH_END - premature end of pattern string X* MATCH_VALID - valid match X* X* X* A match means the entire string TEXT is used up in matching. X* X* In the pattern string: X* `*' matches any sequence of characters (zero or more) X* `?' 
matches any character X* [SET] matches any character in the specified set, X* [!SET] or [^SET] matches any character not in the specified set. X* X* A set is composed of characters or ranges; a range looks like X* character hyphen character (as in 0-9 or A-Z). [0-9a-zA-Z_] is the X* minimal set of characters allowed in the [..] pattern construct. X* Other characters are allowed (ie. 8 bit characters) if your system X* will support them. X* X* To suppress the special syntactic significance of any of `[]*?!^-\', X* and match the character exactly, precede it with a `\'. X* X----------------------------------------------------------------------------*/ X XBOOLEAN match (char *pattern, char *text); X Xint matche(register char *pattern, register char *text); X X/*---------------------------------------------------------------------------- X* X* Return TRUE if PATTERN has any special wildcard characters X* X----------------------------------------------------------------------------*/ X XBOOLEAN is_pattern (char *pattern); X X/*---------------------------------------------------------------------------- X* X* Return TRUE if PATTERN has is a well formed regular expression according X* to the above syntax X* X* error_type is a return code based on the type of pattern error. Zero is X* returned in error_type if the pattern is a valid one. error_type return X* values are as follows: X* X* PATTERN_VALID - pattern is well formed X* PATTERN_ESC - pattern has invalid escape ('\' at end of pattern) X* PATTERN_RANGE - [..] construct has a no end range in a '-' pair (ie [a-]) X* PATTERN_CLOSE - [..] construct has no end bracket (ie [abc-g ) X* PATTERN_EMPTY - [..] construct is empty (ie []) X* X----------------------------------------------------------------------------*/ X XBOOLEAN is_valid_pattern (char *pattern, int *error_type); END_OF_FILE if test 3963 -ne `wc -c <'match.h'`; then echo shar: \"'match.h'\" unpacked with wrong size! 
fi # end of 'match.h' fi if test -f 'matchmak' -a "${1}" != "-c" ; then echo shar: Will not clobber existing file \"'matchmak'\" else echo shar: Extracting \"'matchmak'\" \(396 characters\) sed "s/^X//" >'matchmak' <<'END_OF_FILE' X# X# X# Makefile for match.c X# X# Created 01-20-91 JBK X# Last Modified 02-13-91 JBK X# X# X XCC = cl X X# X# This is FLAGS for optimized version X#FLAGS = /c /AL /G2 /Ox /W4 X X# X# This is FLAGS for optimized version with main XFLAGS = /D TEST /AL /G2 /Ox /W4 X X# X# This is FLAGS for debugging versions with main X#FLAGS = /D TEST /AL /G2 /Od /W4 /Zi /qc X X Xmatch.exe: match.c match.h X $(CC) $(FLAGS) match.c END_OF_FILE if test 396 -ne `wc -c <'matchmak'`; then echo shar: \"'matchmak'\" unpacked with wrong size! fi # end of 'matchmak' fi if test -f 'matchtst.bat' -a "${1}" != "-c" ; then echo shar: Will not clobber existing file \"'matchtst.bat'\" else echo shar: Extracting \"'matchtst.bat'\" \(3776 characters\) sed "s/^X//" >'matchtst.bat' <<'END_OF_FILE' X@echo off X Xecho . Xecho Beginning MATCHTST Xecho . Xecho Creating file TEST.OUT Xecho . X XREM The following tests should match X Xmatch test? testy > test.out Xmatch test* test >> test.out Xmatch tes*t test >> test.out Xmatch *test test >> test.out Xmatch t*s*t test >> test.out Xmatch t*s*t tesseract >> test.out Xmatch t?s? 
test >> test.out Xmatch ?s*t psyot >> test.out Xmatch [a-z]s*t asset >> test.out Xmatch s[!gh]t set >> test.out Xmatch t[a-ce]st test >> test.out Xmatch tea[ea-c]up teacup >> test.out Xmatch [a-fh-z]* jack >> test.out Xmatch \i\** i*hello >> test.out Xmatch [\[-\]] [ >> test.out Xmatch [a-z\\] \ >> test.out Xmatch [a-z%_] b >> test.out Xmatch [\]] ] >> test.out Xmatch \i?* itch >> test.out Xmatch \i?* it >> test.out Xmatch ?*?*?t test >> test.out Xmatch ?*?*?*?* test >> test.out Xmatch *\]*\**\?*\[ ]this*is?atest[ >> test.out Xmatch [a-\\]* at >> test.out Xmatch [a-d\\-/] c >> test.out Xmatch *t?l*his bright-land-high-and-his >> test.out X X XREM The following tests should fail X Xmatch test test >> test.out Xmatch \ test >> test.out Xmatch tes\ test >> test.out Xmatch t*s*t texxeract >> test.out Xmatch t?st tst >> test.out Xmatch test? test >> test.out Xmatch s[!e]t set >> test.out Xmatch [] ] >> test.out Xmatch [ [ >> test.out Xmatch [\[-\] [ >> test.out Xmatch [a atest >> test.out Xmatch [a- atest >> test.out Xmatch [a-z atest >> test.out Xmatch [a-]* atest >> test.out Xmatch [a-fh-z jack >> test.out Xmatch [a-fh-z\] jack >> test.out Xmatch [a-fh-z] jack >> test.out Xmatch ?*?*?t*? test >> test.out Xmatch *????? test >> test.out Xmatch *\ test >> test.out Xmatch [a-e\ atest >> test.out Xmatch [a-\ atest >> test.out Xmatch [a-bd-\ atest >> test.out Xmatch *?*?t? test >> test.out Xmatch t? t >> test.out Xmatch ??*t step >> test.out Xmatch [a-e]*[!t eel >> test.out Xmatch [\ test >> test.out Xmatch \t[est test >> test.out Xmatch ?*[] hello >> test.out X Xecho MATCHTST Complete Xecho . END_OF_FILE if test 3776 -ne `wc -c <'matchtst.bat'`; then echo shar: \"'matchtst.bat'\" unpacked with wrong size! 
fi # end of 'matchtst.bat' fi if test -f 'readme.doc' -a "${1}" != "-c" ; then echo shar: Will not clobber existing file \"'readme.doc'\" else echo shar: Extracting \"'readme.doc'\" \(6018 characters\) sed "s/^X//" >'readme.doc' <<'END_OF_FILE' X03-12-91 X XThis is V1.1 of REGEX Globber. X X X03-12-91 X XI have made a few changes to the match module which do several Xthings. The first change is an increase in bad pattern detection Xduring a match. It was possible, in some very unlikely cases, to Xcook up a pattern which should result in an early bad match but which Xwould actually cause problems for the parser. In particular, the Xsubcase where the literal escape '\' within an open [..] construct at Xthe end of a pattern would end up with incorrect results. I Xproceeded to create some of these patterns, added them to my test Xbattery and dove straight in. X XIn the interim I came across a posting to CompuServe (SMATCH by Stan XAderman) which attempted to create a completely non-recursive Ximplementation of match (I am not sure this is possible without Xexplicitly creating your own stack or it's equivelant, like a binary Xtree :-{ ). While the code could not correctly handle multiple '*' Xcharacters in the pattern, there was a few interesting ideas in the Xposting. On some occasions, running match over and over would be Xcounter productive, especially and in particular when you have a bad Xpattern. I have added a fast routine, is_valid_pattern(), to Xdetermine if the current pattern is well formed which should address Xthis situation. X XOne other idea which I unceremoniously lifted from SMATCH was (in Xhindsight a pretty obvious feature) the return of a meaningful error Xcode from both the pattern validity routine and from match() (which I Xrenamed to matche()). X XI also took some time to experiment with some ways to cut some time Xoff the routine. 
Since this is a SH pattern matcher whose intent is Xprimarily for shell functions, the changes could not be algorithmic Xchanges which relied on speedup over large input. The differences in Xexecution time were not very significant, but I did manage to gain Xapproximately 5%-10% speedup when I removed the literal escape ('\') Xparsing and pattern error checking. For those of you who want to use Xthis for filename wildcard usage, I would recommend doing this since Xyou should use is_valid_pattern and is_pattern before going out and Xfinding filenames and the dos path delimiter defaults to the Xcharacter used for the literal escape ('\') anyway (Note: I will be Xsoon be releasing a *IX style file parser in the FINDFILE, FINDNEXT Xflavor soon to a Public Domain archive near you :-) ). X XI also briefly toyed with adding a non-SH regex character '+' to this Xmodule but removed it again. It was a performance hit of a few Xpercent and would be mostly unused in any event. For those Xinterested in such a feature, the changes are truly minimal. The Xrequired extra work is: X X 1) One case statement each in is_pattern() and is_valid_pattern() X 2) One case statement in matche() X 3) One addition to a while conditional in matche_after_star() X 4) One addition to an if conditional in matche_after_star() X XHint: The case statements are all "case '+'" and the conditionals X have "|| *p == '+' " added to them. X XI have also included a file (MATCH.DOC) which describes matches use and Xbackground as well as a little about regular expressions. X X jbk X X02-24-91 X XThis is V1.01 of REGEX Globber. X X X02-22-91 Seattle, WA X XHmm. Choke. (Foot in mouth). After griping about buggy routines and Xliterally seconds after posting this code the first time, I received Xa wonderful new test evaluation tool which allows you to perform Xcoverage analysis during testing. Sure enough I found that about X25% of the paths in the program were never traversed in my current Xtest battery. 
After swallowing my (overly large) pride and coming Xup with a test battery which covered the entire path of the program XI found a couple of minor logic bugs involving literal escapes (\) Xwithin other patterns (ie [..] and * sequences). I have repackaged Xthese routines and included also the makefile I use and the test Xbattery I use to make things a bit easier. X X jbk X X02-20-91 Seattle, WA X XHere is a *IX wildcard globber I butchered, hacked and cajoled together Xafter seeing and hearing about and becoming disgusted with several similar Xroutines which had one or more of the following attributes: slow, buggy, Xrequired large levels of recursion on matches, required grotesque levels Xof recursion on failing matches using '*', full of caveats about usability Xor copyrights. X XI submit this without copyright and with the clear understanding that Xthis code may be used by anyone, for any reason, with any modifications Xand without any guarantees, warrantee or statements of usability of any Xsort. X XHaving gotten those cow chips out of the way, these routines are fairly Xwell tested and reasonably fast. I have made an effort to fail on all Xbad patterns and to quickly determine failing '*' patterns. This parser Xwill also do quite a bit of the '*' matching via quick linear loops versus Xthe standard blind recursive descent. X XThis parser has been submitted to profilers at various stages of development Xand has come through quite well. If the last millisecond is important to Xyou then some time can be shaved by using stack allocated variables in Xplace of many of the pointer follows (which may be done fairly often) found Xin regex_match and regex_match_after_star (ie *p, *t). X XNo attempt is made to provide general [pat,pat] comparisons. The specific Xsubcases supplied by these routines is [pat,text] which is sufficient Xfor the large majority of cases (should you care). 
X XSince regex_match may return one of three different values depending upon Xthe pattern and text I have made a simple shell for convenience (match()). XAlso included is an is_pattern routine to quickly check a potential pattern Xfor regex special characters. I even placed this all in a header file for Xyou lazy folks! X XHaving said all that, here is my own reinvention of the wheel. Please Xenjoy it's use and I hope it is of some help to those with need .... X X X jbk X END_OF_FILE if test 6018 -ne `wc -c <'readme.doc'`; then echo shar: \"'readme.doc'\" unpacked with wrong size! fi # end of 'readme.doc' fi echo shar: End of archive 1 \(of 1\). cp /dev/null ark1isdone MISSING="" for I in 1 ; do if test ! -f ark${I}isdone ; then MISSING="${MISSING} ${I}" fi done if test "${MISSING}" = "" ; then echo You have the archive. rm -f ark[1-9]isdone else echo You still must unpack the following archives: echo " " ${MISSING} fi exit 0 exit 0 # Just in case... -- Kent Landfield INTERNET: kent@sparky.IMD.Sterling.COM Sterling Software, IMD UUCP: uunet!sparky!kent Phone: (402) 291-8300 FAX: (402) 291-4362 Please send comp.sources.misc-related mail to kent@uunet.uu.net.
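For readers who only want to see the pattern rules from match.doc in action without unpacking the archive, here is a minimal self-contained sketch. It is not the code from match.c. The names glob_match() and set_match() are hypothetical, it returns a plain 1 or 0 rather than the MATCH_* codes, and it assumes that every malformed pattern should simply count as a non-match. Instead of recursing after a '*', it uses the common "remember the last star and retry one character later" loop.

```c
#include <assert.h>
#include <stddef.h>

/* Test text character c against a [..] construct.  *pp points just past
   the '['.  Returns 1 if c is accepted, 0 if not, -1 if the construct is
   malformed (empty, unterminated, or a range with no end).  On a 0/1
   return, *pp is advanced past the closing ']'. */
static int set_match(const char **pp, char c)
{
    const char *p = *pp;
    int invert = 0, found = 0;

    if (*p == '!' || *p == '^') {           /* exclusion set */
        invert = 1;
        p++;
    }
    if (*p == ']' || *p == '\0')            /* [] or pattern ends here */
        return -1;

    while (*p != ']') {
        char lo, hi;

        if (*p == '\\')                     /* escaped set member */
            p++;
        if (*p == '\0')
            return -1;
        lo = hi = *p++;

        if (*p == '-') {                    /* range such as a-z */
            p++;
            if (*p == ']' || *p == '\0')    /* [a-] has no range end */
                return -1;
            if (*p == '\\')
                p++;
            if (*p == '\0')
                return -1;
            hi = *p++;
        }
        if (lo > hi) {                      /* accept reversed ranges */
            char tmp = lo; lo = hi; hi = tmp;
        }
        if (c >= lo && c <= hi)
            found = 1;
        if (*p == '\0')                     /* missing ']' */
            return -1;
    }
    *pp = p + 1;
    return invert ? !found : found;
}

/* Return 1 if pattern p matches all of text t, 0 otherwise. */
int glob_match(const char *p, const char *t)
{
    const char *star_p = NULL;  /* pattern position after the last '*' */
    const char *star_t = NULL;  /* text position where that '*' stops  */

    assert(p != NULL && t != NULL);

    while (*t != '\0') {
        const char *next = p;
        int ok = 0;

        if (*p == '*') {                    /* record star, match empty */
            star_p = ++p;
            star_t = t;
            continue;
        }
        if (*p == '?') {
            ok = 1;
            next = p + 1;
        } else if (*p == '[') {
            next = p + 1;
            ok = set_match(&next, *t);
            if (ok < 0)
                return 0;                   /* malformed [..] */
        } else if (*p != '\0') {
            if (*p == '\\' && *++p == '\0')
                return 0;                   /* '\' at end of pattern */
            ok = (*p == *t);
            next = p + 1;
        }

        if (ok) {
            p = next;
            t++;
        } else if (star_p != NULL) {        /* let the star eat one more */
            p = star_p;
            t = ++star_t;
        } else {
            return 0;
        }
    }

    while (*p == '*')                       /* trailing stars match empty */
        p++;
    return *p == '\0';
}
```

Under these assumptions glob_match("[a-e]*t", "apple.crt") accepts, while glob_match("t?st", "tst") and the malformed glob_match("[a-]*", "atest") reject, in line with the corresponding MATCHTST.BAT cases. Code that needs to know why a match failed should use matche() and is_valid_pattern() from the archive instead.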