mhb@hcx.uucp (MARK H BOTNER) (05/18/89)
Can anybody help me with pattern matching? I'm trying to match a string against a pattern such as '*' or '*.*' in the same manner that the 'ls' command uses. I've tried regex and regexp and I've even downloaded regex.Z from uunet.uu.net, but none of these work like I want. Actually I probably only need the wildcard characters '*' and '?', but it wouldn't hurt to have the full expression syntax that the 'ls' command uses. Thanks! P.S. if you have any answer or a subroutine for me, please mail it to me. My address is: mhb@cseg.uucp
gwyn@smoke.BRL.MIL (Doug Gwyn) (05/18/89)
In article <2414@cveg.uucp> mhb@hcx.uucp (MARK H BOTNER) writes:
-Can anybody help me with pattern matching? I'm trying to match
-a string against a pattern such as '*' or '*.*' in the same manner that
-the 'ls' command uses.
That should be extremely easy, as "ls" does no pattern matching.
-I've tried regex and regexp and I've even downloaded regex.Z from
-uunet.uu.net, but none of these work like I want.
Gee, they ought to.
abcscnge@csuna.csun.edu (Scott "The Pseudo-Hacker" Neugroschl) (05/18/89)
In article <2414@cveg.uucp> mhb@hcx.uucp (MARK H BOTNER) writes: } }Can anybody help me with pattern matching? I'm trying to match }a string against a pattern such as '*' or '*.*' in the same manner that }the 'ls' command uses. I've tried regex and regexp and I've even downloaded }regex.Z from uunet.uu.net, but none of these work like I want. Actually }I probably only need the wildcard characters '*' and '?', but it wouldn't }hurt to have the full expression syntax that the 'ls' command uses. Thanks! } Your problem is that reg. exprs and the shell wildcarding are different. Regular expressions: . -- match any single character * -- 0 or more of the preceding regex. Shell Wildcarding ? -- match any single character * -- match 0 or more characters. The wildcarding that "ls uses" is not done by ls. It is performed by the shell prior to executing the ls command. If I were in your shoes, I would take the shell wildcard, and translate it to regex form: ? --> . * --> .* and then use the regex stuff. -- Scott "The Pseudo-Hacker" Neugroschl UUCP: ...!sm.unisys.com!csun!csuna.csun.edu!abcscnge -- Beat me, Whip me, make me code in Ada -- Disclaimers? We don't need no stinking disclaimers!!!
pokey@well.UUCP (Jef Poskanzer) (05/19/89)
In the referenced message, mhb@hcx.uucp (MARK H BOTNER) wrote: }Can anybody help me with pattern matching? }P.S. if you have any answer or a subroutine for me, please mail it to me. }My address is: } mhb@cseg.uucp That address is useless, so I'm posting this. This is free software, no copyright or trade secret restrictions whatsoever. --- Jef Jef Poskanzer jef@helios.ee.lbl.gov ...well!pokey /* ** Simple pattern matcher. Does ?, *, and [], much like various shells. ** ** by Jef Poskanzer */ amatch( s, p ) register char *s, *p; { for ( ; *p != '\0'; p++, s++ ) { if ( *p == '*' ) { if ( *++p == '\0' ) return 1; for ( ; *s != '\0'; s++ ) if ( amatch( s, p ) ) return 1; return 0; } if ( *s == '\0' ) return 0; if ( *p == '?' ) continue; if ( *p == '[' ) { int negcc = 0, ccmatch = 0; char prevc; if ( *++p == '!' ) { negcc = 1; p++; } for ( ; *p != ']'; p++ ) { if ( *p == '\0' ) { fprintf( stderr, "amatch: missing ']'\n" ); return 0; } if ( *p == '-' ) { if ( prevc <= *s && *++p >= *s ) ccmatch = 1; } else if ( *p == *s ) ccmatch = 1; prevc = *p; } if ( ( ccmatch && ! negcc ) || ( negcc && ! ccmatch ) ) continue; return 0; } if ( *p != *s ) return 0; } return *s == '\0'; }
doug@xdos.UUCP (Doug Merritt) (05/19/89)
In article <10288@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes: >In article <2414@cveg.uucp> mhb@hcx.uucp (MARK H BOTNER) writes: >>Can anybody help me with pattern matching? I'm trying to match >>a string against a pattern such as '*' or '*.*' in the same manner that >>the 'ls' command uses. >That should be extremely easy, as "ls" does no pattern matching. Nitpick. Where's your ":-)" ? Substitute "sh" for "ls". Besides, maybe he's using MS-DOS, in which case it might really be "ls" that's doing the matching he's used to. >>I've tried regex and regexp and I've even downloaded regex.Z from >>uunet.uu.net, but none of these work like I want. >Gee, they ought to. Not really...as I recall, they implement ed/ex/vi/sed-syntax patterns such as "." and ".*", where he wants "?" and "*". Though you'd think it'd be easy to modify that regexp.Z source to use the desired syntax. I have a C routine that does pattern matching, but it, too, is not strictly compatible with "sh" so that probably wouldn't help. Doug -- Doug Merritt {pyramid,apple}!xdos!doug Member, Crusaders for a Better Tomorrow Professional Wildeyed Visionary
koblas@mips.COM (David Koblas) (05/19/89)
In article <11733@well.UUCP> Jef Poskanzer <jef@helios.ee.lbl.gov> writes: >In the referenced message, mhb@hcx.uucp (MARK H BOTNER) wrote: >}Can anybody help me with pattern matching? >}P.S. if you have any answer or a subroutine for me, please mail it to me. >}My address is: >} mhb@cseg.uucp > >That address is useless, so I'm posting this. This is free software, no >copyright or trade secret restrictions whatsoever. Similar to Jef Posaknzer's but understands '{' ... '}' syntax. #! /bin/sh # This is a shell archive. Remove anything before this line, then feed it # into a shell via "sh file" or similar. To overwrite existing files, # type "sh file -c". # The tool that generated this appeared in the comp.sources.unix newsgroup; # send mail to comp-sources-unix@uunet.uu.net if you want that tool. # If this archive is complete, you will see the following message at the end: # "End of shell archive." # Contents: glob.c # Wrapped by koblas@yoyodyne.mips.com on Fri May 19 09:04:13 1989 PATH=/bin:/usr/bin:/usr/ucb ; export PATH if test -f 'glob.c' -a "${1}" != "-c" ; then echo shar: Will not clobber existing file \"'glob.c'\" else echo shar: Extracting \"'glob.c'\" \(3831 characters\) sed "s/^X//" >'glob.c' <<'END_OF_FILE' X/* X * input: "str" string will attempted to be matched X * X * "pattern" string with wildcards that will match against "str". X * X * wild: X * '*' = match 0 or more occurances of anything X * "[abc]" = match anyof "abc" (ranges supported) X * "{xx,yy,zz}" = match anyof "xx", "yy", or "zz" X * '?' = match any character X */ X X#define FALSE 0 X#define TRUE 1 X X#ifdef TEST X X#define TESTGLOB(str1,str2) \ X printf("%s %s = %s\n",str1,str2,glob(str1,str2)?"TRUE":"FALSE") X Xmain() X{ X TESTGLOB("abcdefg","abcdefg"); X TESTGLOB("abcdefg","a?cd?fg"); X TESTGLOB("abcdefg","ab[cde]defg"); X TESTGLOB("abcdefg","ab[a-z]defg"); X TESTGLOB("abcdefg","ab[a-z]defg"); X TESTGLOB("ab]defg","ab[a]c]defg"); X TESTGLOB("ab]defg","ab[a\\]c]defg"); X TESTGLOB("abcdefg","ab*fg"); X TESTGLOB("./bc/def/gh/ij","*de*"); X TESTGLOB("./der/den/deq/der/","*deq*"); X TESTGLOB("./bc/def/gh/ij","*ij"); X TESTGLOB("./bc/def/gh/ij","./*"); X TESTGLOB("abcdef","*def"); X TESTGLOB("abcdef","*abcdef"); X TESTGLOB("abcdef","abc*"); X TESTGLOB("abcdef","abcdef*"); X TESTGLOB("abcdef","*?*{xxx,,yy}"); X TESTGLOB("abcdef","abcde{f}"); X TESTGLOB("abcdef","abcdef{xxx,,yyy}"); X TESTGLOB("abcdef","abc{def,qwrx}"); X TESTGLOB("abcdef","abc{ab,def,qwrx}"); X TESTGLOB("abcdef","{naqrwer,fuwnwer,as,abc,a}{ab,def,qwrx}"); X} X X#endif X Xglob(str,pattern) Xchar *str,*pattern; X{ X char c,*cp; X int done=FALSE,ret_code,ok; X X while ((*pattern != '\0') && (!done) && (((*str=='\0') && X ((*pattern=='{') || (*pattern=='*'))) || (*str!='\0'))) { X switch (*pattern) { X case '\\': X pattern++; X if (*pattern != '\0') X pattern++; X break; X case '*': X pattern++; X ret_code=FALSE; X while ((*str != '\0') && (!(ret_code=glob(str++,pattern)))); X if (ret_code) { X while (*str != '\0') str++; X while (*pattern != '\0') pattern++; X } X break; X case '[': X pattern++; Xrepeat: X if ((*pattern == '\0') || (*pattern == ']')) { X done=TRUE; X break; X } X if (*pattern == '\\') { X pattern++; X if (*pattern == '\0') { X done=TRUE; X break; X } X } X if (*(pattern+1) == '-') { X c = *pattern; X pattern+=2; X if (*pattern == ']') { X done=TRUE; X break; X } X if (*pattern == '\\') { X pattern++; X if (*pattern == '\0') { X done=TRUE; X break; X } X } X if ((*str < c) || (*str > *pattern)) { X pattern++; X goto repeat; X } X } else if (*pattern != *str) { X pattern++; X goto repeat; X } X pattern++; X while ((*pattern != ']') && (*pattern != '\0')) { X if ((*pattern == '\\') && (*(pattern+1) != '\0')) X pattern++; X pattern++; X } X if (*pattern != '\0') { X pattern++; X str++; X } X break; X case '?': X pattern++; X str++; X break; X case '{': /*}*/ X pattern++; X/*{*/ while ((*pattern != '}') && (*pattern!='\0')) { X cp = str; X ok = TRUE; X while (ok && (*cp != '\0') && (*pattern!='\0') && X/*{*/ (*pattern!=',') && (*pattern!='}')) { X if (*pattern == '\\') X pattern++; X ok=(*pattern == *cp); X cp++; X pattern++; X } X if (*pattern=='\0') { X ok=FALSE; X done=TRUE; X break; X } else if (ok) { X str=cp; X/*{*/ while ((*pattern!='}') && (*pattern!='\0')) { X pattern++; X if (*pattern=='\\') { X pattern++; X/*{*/ if (*pattern=='}') X pattern++; X } X } X } else { X/*{*/ while ((*pattern!='}') && (*pattern!=',') && X (*pattern!='\0')) { X pattern++; X if (*pattern=='\\') { X pattern++; X/*{*/ if ((*pattern=='}') || (*pattern==',')) X pattern++; X } X } X } X if (*pattern!='\0') X pattern++; X } X break; X default: X if (*str == *pattern) { X str++; X pattern++; X } else { X done=TRUE; X } X } X } X while (*pattern == '*') pattern++; X return ((*str == '\0') && (*pattern == '\0')); X} END_OF_FILE if test 3831 -ne `wc -c <'glob.c'`; then echo shar: \"'glob.c'\" unpacked with wrong size! fi # end of 'glob.c' fi echo shar: End of shell archive. exit 0 -- name : David Koblas uucp : {ames,decwrl}!mips!koblas place: MIPS Computers Systems domain: koblas@mips.com quote: "Time has little to do with infinity and jelly donuts."
faulkner@jmullins.harvard.edu (Don Faulkner) (05/20/89)
Any one thinking about regular expression handlers should really take a look at the one from GNU EMACS --- it is VERY powerful, fast, and very flexible (you can set it up with a translation table to ignore case ...) Works very well stand alone --- need only two files: "regex.c" and "regex.h" Best part: syntax and capabilities are identical to those from GNU EMACS (golly!) -- Don Faulkner Building 1, Room 803 Harvard University, School of Public Health 665 Huntington Avenue Boston, MA 02115 ARPA: faulkner%jmullins@harvard.harvard.edu BITNET: faulkner@harvard Telephone: (617) 732-2297
ksb@j.cc.purdue.edu (Kevin Braunsdorf) (05/20/89)
In <mhb@hcx.uucp> (MARK H BOTNER) wrote: >Can anybody help me with pattern matching? In <11733@well.UUCP> jef@helios.ee.lbl.gov (Jef Poskanzer) wrote: <some C source> In <20021@yoyodyne.mips.com.mips.COM> koblas@mips.COM (David Koblas) wrote: >Similar to Jef Posaknzer's but understands '{' ... '}' syntax. <more C sourse> Sadly neither of these two routines really emulate the shell filename globbing. Here is a list of (some) failed tests. # pat file ans comment ???? /foo no ? doesn't match a / ?* /foo no ? doesn't match a / /??????? /usr/foo no ? doesn't match a / ? / no ? doesn't match a / .? ./ no ? doesn't match a non-leading / ? . no ? doesn't match a leading . * / no * doesn't match a / *? /foo no * doesn't match a / /u/* /u/a/foo no * doesn't match a / /u* /u/foo no * doesn't match a / * . no * doesn't match a . [/,]foo /foo no ranges do not match / * .ksb no * doesn't match leading . /*b /.ksb no * doesn't match leading . ?ksb .ksb no ? doesn't match leading . /?ksb /.ksb no ? doesn't match leading . # and one questionable rule.... //ksb /ksb yes any # glob /'s matches a / /a//b /a/b yes any # glob /'s matches a / For exmple: glob("/bin/mail", "*") would return true under both of these; (IMHO) this is wrong. [ We need at least: glob("/bin/mail", "/*/*") to really match this file. ] BTW: (noting the ! lines below,) does anyone know why the shell does this? $ touch ga g. .a $ ls -a ./ ../ .a g. ga $ echo g? g. ga $ echo "g"? g. ga $ echo .? .. .a ! $ echo "."? ! .? $ exit The `?' is not quoted in the last case... ? I will post my glob routines (which I think work) soon. -- "Time, time, time, see what's become of me..." kayessbee, Kevin Braunsdorf, ksb@j.cc.purdue.edu, pur-ee!ksb, purdue!ksb
lance@hermix.UUCP (Lance Ellinghouse) (05/20/89)
In article <10288@smoke.BRL.MIL>, gwyn@smoke.BRL.MIL (Doug Gwyn) writes: > In article <2414@cveg.uucp> mhb@hcx.uucp (MARK H BOTNER) writes: > -Can anybody help me with pattern matching? I'm trying to match > -a string against a pattern such as '*' or '*.*' in the same manner that > -the 'ls' command uses. > > That should be extremely easy, as "ls" does no pattern matching. > This is true. The shell does the expansion of wild card characters. the program just gets a list. -- Lance Ellinghouse Mark V Systems, Ltd. UUCP: ...!hermix!lance ARPA: ucla-an!hermix!lance@ee.UCLA.EDU
dws@cseg.uucp (David W. Summers) (05/20/89)
In article <11733@well.UUCP>, pokey@well.UUCP (Jef Poskanzer) writes: > In the referenced message, mhb@hcx.uucp (MARK H BOTNER) wrote: > }My address is: > } mhb@cseg.uucp > > That address is useless, so I'm posting this. This is free software, no > copyright or trade secret restrictions whatsoever. > --- > Jef > > Jef Poskanzer jef@helios.ee.lbl.gov ...well!pokey Thank you for replying to Mark Botner. He and I are working on some routines to allow a BBS to do file matching on user entered patterns. My question is: Why is this addresses useless? I noticed that you had the the same return address in your 'From: ' header (specifically, "pokey@well.uucp"). This is the same *type* of address as ours. Even the "well!pokey" is the same type of address, although your other address in your signature is obviously a domain name type which is different. I don't think our address is really useless as I can mail to ANYONE with that type of address (as long as the routing maps are correct). - David Summers (dws@cseg.uucp) - The "curious about E-Mail" and "wanting to get it right" guy.
khera@juliet.cs.duke.edu (Vick Khera) (05/20/89)
In article <2428@cveg.uucp> dws@cseg.uucp (David W. Summers) writes: > >In article <11733@well.UUCP>, pokey@well.UUCP (Jef Poskanzer) writes: >> In the referenced message, mhb@hcx.uucp (MARK H BOTNER) wrote: >> }My address is: >> } mhb@cseg.uucp >> >> That address is useless, so I'm posting this. This is free software, no >> ... >> Jef > ... >My question is: Why is this addresses useless? I noticed that you had the >... I don't think >our address is really useless as I can mail to ANYONE with that type of >address (as long as the routing maps are correct). > >- David Summers well, if you try sending a reply to somebody with mail address of linus!user, then where will it go? will it go to some machine in princeton or at the university of maryland? with a uucp address of just the machine name, there will likely be duplicates of names such as linus, juliet, and other literary names. for instance, i am on juliet.cs.duke.edu, and to get uucp traffic here, it needs to be qualified with something like mcnc!duke!juliet!khera since mcnc is a well known system. mail to just juliet!khera could end up on a different machine depending on where it is mailed from. vick. =-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-= ARPA: khera@cs.duke.edu Department of Computer Science CSNET: khera@duke Duke University UUCP: {mcnc,decvax}!duke!khera Durham, NC 27706
880716a@aucs.uucp (Dave Astels) (05/29/90)
I am about to start developing pattern matching software using C++. I have done something similar to what I plan a while ago using LISP. I would like an algorithm that would be more efficient than the heavily recursive LISP implementation. It is to be used in a real-time situation, to analyse user input. Input is seen as a list of strings, each being one word. The patterns are lists of either strings (requiring a literal match) or sub-lists (commands). Commands include: match any word (0 or one occurance) match any words (0 or more) and (1 or more) match the 3 cases above, but to match, input must be in a specified list. etc. I would like any information, including pointers to references. Example code would be appreciated as well. -- - Dave Internet: 880716a@AcadiaU.CA Bitnet: 880716a@Acadia