Tony.Bielobockie@urchin.fidonet.org (Tony Bielobockie) (12/29/90)
To: Alex Martelli AM> Optimizing this, and in particular eliminating the recursive call, AM> is, on the other hand, a rather interesting exercise, but one I'd AM> undertake only if wildcard-matching was proven-by-profiling to be AM> a bottleneck in my application... which is rather unlikely, I think. I think that the recursive call is unnessicary. Consider that anything trailing the '*' character is ignored anyway, at least in MS-DOS, and OS/2. What need is there for further checking if a '*' character is encountered? Does that hold true in the UNIX would too? The following function will return the same results, with the exceptional case being when the '*' character is not the last character in the wildcard string. Plus it is a little speedier, due to the fact that no data is copied from the original string. int wildmatch( char *pat, char *str ) { while ( (*pat != '*') && (*pat != 0) && (*str != 0) ) { if ( (*pat != '?') && (*str != *pat) ) { return 0; } pat++; str++; if ( (!*pat && *str) || (!*str && *pat) ) { return 0; } } return( 1 ); }
mat@mole-end.UUCP (Mark A Terribile) (01/01/91)
(On the subject of wildcard patterns whose match is pathalogical in time ...) > I think that the recursive call is unnessicary. Consider that anything > trailing the '*' character is ignored anyway, at least in MS-DOS, and > OS/2. What need is there for further checking if a '*' character is > encountered? Does that hold true in the UNIX would too? In UNIX, the `*' is NOT limited to matching a whole name component, but instead any sequence of characters, anywhere. Thus, one can write a*b*c If you want to guarantee efficient matching, then I think you have to generate a DFSA. If you want to come close, you can put some pathology checks in your code (like `*' after `*'). And there are publicly available regex-like matchers, most of which will want .* for a closure on an arbitrary character. Some of them might not be too hard to change. -- (This man's opinions are his own.) From mole-end Mark Terribile
staff@cadlab.sublink.ORG (Alex Martelli) (01/02/91)
Tony.Bielobockie@urchin.fidonet.org (Tony Bielobockie) writes: >To: Alex Martelli > >AM> Optimizing this, and in particular eliminating the recursive call, >AM> is, on the other hand, a rather interesting exercise, but one I'd >AM> undertake only if wildcard-matching was proven-by-profiling to be >AM> a bottleneck in my application... which is rather unlikely, I think. > >I think that the recursive call is unnessicary. Consider that anything >trailing the '*' character is ignored anyway, at least in MS-DOS, and >OS/2. What need is there for further checking if a '*' character is >encountered? Does that hold true in the UNIX would too? I know that MSDOS and OS/2 consider that, when typing "a*b" or "a*c", the user was just being funny, and he _actually_ meant just "a*" in either case... presumably the extra letter, or letters, having been typed just to keep his or her fingers in exercise?-) I'm also happy to relate, though, that _other_ operating systems DO have a modicum of common sense in their wildcars semantics! That is, one can use, for example, "a*b" to stand for all files with names beginning with a and ending with b, with anything or nothing in between. This is true at least in Dec's VMS and IBM's VM/SP (CMS), as well, of course, as in Unix. If you write your own subroutines for matching with wildcards, you can of course offer whatever semantics strike your fancy. I can hardly believe anybody would care to defend the absurd semantics of MessyDOS in this respect, though! -- Alex Martelli - CAD.LAB s.p.a., v. Stalingrado 53, Bologna, Italia Email: (work:) staff@cadlab.sublink.org, (home:) alex@am.sublink.org Phone: (work:) ++39 (51) 371099, (home:) ++39 (51) 250434; Fax: ++39 (51) 366964 (work only), Fidonet: 332/401.3 (home only).
scs@adam.mit.edu (Steve Summit) (01/02/91)
In article <4739.277BA2FB@urchin.fidonet.org> Tony.Bielobockie@urchin.fidonet.org (Tony Bielobockie) writes: >I think that the recursive call is unnessicary. Consider that anything >trailing the '*' character is ignored anyway, at least in MS-DOS, and >OS/2. What need is there for further checking if a '*' character is >encountered? Does that hold true in the UNIX would too? Oh, dear. While I don't believe that computer "science" is a hard enough science that those without formal training in it should be deprecated, it is nevertheless the case that those whose only exposure to it is through MS-DOS or other Microsoft "operating systems" should realize that those "systems" are so mind-bogglingly behind the state-of-the-art that extrapolation based on them is bound to be embarrassingly futile. For a long time I had assumed that DOS's notion that "anything trailing the '*' character is ignored anyway" was an outright bug in its implementation, because most DOS manuals imply that a * matches anything anywhere, not just at the end. I'll keep a firm rein on myself and not let this turn into an all-out DOS flame. Its wildcard matching "algorithm" is only one of a bewildering number of appalling misfeatures. I have never even tried to figure out exactly how DOS fails to implement wildcards correctly. (Obviously whoever implemented them didn't really know what they were good for, and was limiting himself to assembly language programming on a spineless processor, where a correct implementation might have been somewhat more difficult.) In classic regular expressions, and in every computer system I have ever used (needless to say this includes Unix), a pattern can certainly continue past a '*'; the utility of doing so would only be mysterious to someone with the misfortune to have first encountered wildcards under a "system" with as trivialized and impotent an implementation as our beloved DOS. >The following function will return the same results, with the exceptional >case being when the '*' character is not the last character in the >wildcard string. Please, don't _ever_ implement a wildcard matcher with this "exception." It is not an "exceptional case" for the '*' not to be the last character in the pattern; it is an absolute requirement, and to leave it out should never have been contemplated. Steve Summit scs@adam.mit.edu
slh@wolf.cs.washington.edu (Scott Heyano) (01/03/91)
In article <579@cadlab.sublink.ORG> staff@cadlab.sublink.ORG (Alex Martelli) writes: |Tony.Bielobockie@urchin.fidonet.org (Tony Bielobockie) writes: | [...] |>trailing the '*' character is ignored anyway, at least in MS-DOS, and |>OS/2. What need is there for further checking if a '*' character is |>encountered? Does that hold true in the UNIX would too? | |I know that MSDOS and OS/2 consider that, when typing "a*b" or "a*c", |the user was just being funny, and he _actually_ meant just "a*" in |either case... presumably the extra letter, or letters, having been |typed just to keep his or her fingers in exercise?-) | This is not true of OS/2.
staff@cadlab.sublink.ORG (Alex Martelli) (01/03/91)
slh@wolf.cs.washington.edu (Scott Heyano) writes: :In article <579@cadlab.sublink.ORG> staff@cadlab.sublink.ORG (Alex Martelli) writes: :|Tony.Bielobockie@urchin.fidonet.org (Tony Bielobockie) writes: :| :[...] :|>trailing the '*' character is ignored anyway, at least in MS-DOS, and :|>OS/2. What need is there for further checking if a '*' character is :|>encountered? Does that hold true in the UNIX would too? :| :|I know that MSDOS and OS/2 consider that, when typing "a*b" or "a*c", :|the user was just being funny, and he _actually_ meant just "a*" in :|either case... presumably the extra letter, or letters, having been :|typed just to keep his or her fingers in exercise?-) :| : This is not true of OS/2. Oops, sorry, Scott is right, Tony was wrong, and I was silly for accepting Tony's "and OS/2" bit at face value instead of checking! I *did* check, now, and at least CMD.EXE gets it right - DIR T*P, in particular, only lists files whose names begin with T ***AND*** END WITH P. One more reson to do wildmatch() right, then! -- Alex Martelli - CAD.LAB s.p.a., v. Stalingrado 53, Bologna, Italia Email: (work:) staff@cadlab.sublink.org, (home:) alex@am.sublink.org Phone: (work:) ++39 (51) 371099, (home:) ++39 (51) 250434; Fax: ++39 (51) 366964 (work only), Fidonet: 332/401.3 (home only).
Tony.Bielobockie@urchin.fidonet.org (Tony Bielobockie) (01/04/91)
Steve Summit writes: >Oh, dear. While I don't believe that computer "science" is a >hard enough science that those without formal training in it >should be deprecated, it is nevertheless the case that those >whose only exposure to it is through MS-DOS or other Microsoft >"operating systems" should realize that those "systems" are so >mind-bogglingly behind the state-of-the-art that extrapolation >based on them is bound to be embarrassingly futile. For a long >time I had assumed that DOS's notion that "anything trailing the >'*' character is ignored anyway" was an outright bug in its >implementation, because most DOS manuals imply that a * matches >anything anywhere, not just at the end. The only persons that I discount are those that have reached the level of pomposity that you have. MS-DOS does not implement wildcard matching properly (noticed in retrospect), OS/2 does. I can see your distaste for MS-DOS, which is more of a file loader than an operating system. What is your beef with OS/2? Are you perhaps just an anti-MicroSoft bigot? Finally, why are you so rude? This is to be a discussion group where an exchange of ideas and knowledge takes place. It is not your personal message arena where you are free to insult, and attempt to intimidate your fellow man. Intimidation may have worked for you in the past, it however, will not work on me, it only goes further in proving what a egotistical jackass that you are.
domo@tsa.co.uk (Dominic Dunlop) (01/04/91)
In article <579@cadlab.sublink.ORG> staff@cadlab.sublink.ORG (Alex Martelli) writes: > I know that MSDOS and OS/2 consider that, when typing "a*b" or "a*c", > the user was just being funny, and he _actually_ meant just "a*" in > either case... presumably the extra letter, or letters, having been > typed just to keep his or her fingers in exercise?-) > > I'm also happy to relate, though, that _other_ operating systems DO > have a modicum of common sense in their wildcars semantics! > That is, one can use, for example, "a*b" to stand for all files with > names beginning with a and ending with b, with anything or nothing in > between. This is true at least in Dec's VMS and IBM's VM/SP (CMS), as > well, of course, as in Unix. For what it's worth (all contributions gratefully received), the forthcoming POSIX 1003.2 shell and tools standard codifies UNIX-style filename ``globbing'', including the System V-ish ``*[!c]'' to match, in this example, any filename not ending in ``c''. The standard will specify two functions, glob() and globfree() which can be called from the C language, ending today's unsatisfactory situation, where one must either recode globbing or spawn a shell to do it. How much all this helps users of other operating systems is open to question... -- Dominic Dunlop
karl@ima.isc.com (Karl Heuer) (01/05/91)
Ins <584@cadlab.sublink.ORG> staff@cadlab.sublink.ORG (Alex Martelli) writes: >One more reson to do wildmatch() right, then! But first, check whether it needs to be done at all; and if so, implement it using the standard interface. Many libraries, and at least one influential standard, already contain "int gmatch(char *str, char *pat)". It handles not only "*" and "?" but also "[xyz]" and "[!xyz]" (yes, that's a bang, not a circumflex; glob vs regexp strikes again). Karl W. Z. Heuer (karl@ima.isc.com or uunet!ima!karl), The Walking Lint
mcdonald@aries.scs.uiuc.edu (Doug McDonald) (01/05/91)
>In article <579@cadlab.sublink.ORG> staff@cadlab.sublink.ORG (Alex Martelli) >writes: >> I know that MSDOS and OS/2 consider that, when typing "a*b" or "a*c", >> the user was just being funny, and he _actually_ meant just "a*" in >> either case... presumably the extra letter, or letters, having been >> typed just to keep his or her fingers in exercise?-) >> >> I'm also happy to relate, though, that _other_ operating systems DO >> have a modicum of common sense in their wildcars semantics! >> That is, one can use, for example, "a*b" to stand for all files with >> names beginning with a and ending with b, with anything or nothing in >> between. This is true at least in Dec's VMS and IBM's VM/SP (CMS), as >> well, of course, as in Unix. > Apparently the above author does not understand MS-DOS wildcards. MS-DOS DOES have more than a "modicum" of common sense in its wild card system. It is very simple: A filename has the form abcdefgh.ijk , i.e. one to 8 characters followed by an optional dot and one to three characters. Wildcards operate separately in the first and second parts. A "*" is a wildcard that means "any character in that and any succeeding position". A "?" is a wildcard that means "any character in just that one position". That is what the manual says, and that is what MS-DOS itself does. For example, a?b means a file with three letters, no extension, and a and b in the first and third positions. a?b*.* means a file with a and b in the first and third places, anything in the second place, and anything at all in the fourth through 8th positions, and any extension, or no extension at all. This of course has nothing to do with the C language, except that some people apparently can't write C programs for MS-DOS that do wildcards as they are defined for it. Doug McDonald
scs@adam.mit.edu (Steve Summit) (01/05/91)
Flames, of course, beget flames, and I expected someone might take issue with my recent bout of gratuitous DOS-bashing. However, someone apparently thought I was flaming _him_, and ragged on me rather publicly, so I suppose I sorta have to respond publicly, too. Sorry about this. From article <5012.2783BE37@urchin.fidonet.org>: > ...why are you so rude? This is to be a discussion group > where an exchange of ideas and knowledge takes place. It is not > your personal message arena where you are free to insult, and attempt > to intimidate your fellow man. Yes, it is a discussion group, and a fairly sophisticated and literate one at that, so sometimes one has to read carefully to avoid misunderstanding. What I wrote was > While I don't believe that computer "science" is a > hard enough science that those without formal training in it > should be deprecated, it is nevertheless the case that those > whose only exposure to it is through MS-DOS or other Microsoft > "operating systems" should realize that those "systems" are so > mind-bogglingly behind the state-of-the-art that extrapolation > based on them is bound to be embarrassingly futile. That sentence is a bit on the long side, but it is not meant to disparage anyone (outside of Microsoft). I realized when I wrote it that it might be read as a put-down of "those without formal training," and I see now that I should have worried a bit more. I _don't_ look down on people who use computers without having had formal training. In fact, the quotes around "science" are a clue to my real attitude: since formal training in computer "science" is no prerequisite for success, it is the "science" which I look down on. All I was saying was that if someone's entire exposure to computers is through a PC, there is a lot that person is missing out on. This is not to demean that person; I would be naive if I did not realize that there are orders of magnitude more people whose only exposure to computers is through PC's than there are people who are familiar with the "real" computers I happen to prefer. ("Embarrassingly futile" was admittedly unnecessarily harsh. However: if you try to extrapolate, based on MS-DOS, how a "real" operating system might do something, you can easily embarrass yourself, such as by suggesting "simplified" wildcard matching algorithms patterned after DOS's simpleminded semantics. I'm not disparaging someone who makes such a mistake; the words "embarrassingly futile" were, believe it or not, intended to be basically sympathetic.) > MS-DOS does not implement wildcard > matching properly (noticed in retrospect), OS/2 does. > I can see your distaste for MS-DOS, which is more of a file loader > than an operating system. What is your beef with OS/2? I have never used OS/2 and know little about it. I implied that it shared DOS's crippled wildcard algorithm only because the complainant originally said it did (in <4739.277BA2FB@urchin.fidonet.org>: "Consider that anything trailing the '*' character is ignored anyway, at least in MS-DOS, and OS/2"). > Are you perhaps just an anti-MicroSoft bigot? That much I'll admit to. I'd expound on this, but it'd be more flamage having nothing to do with C. To everyone else: I have been appalled at how easily silly, pointless flames seem to be cropping up lately, in all manner of newsgroups, and I apologize for having been part of one here. Steve Summit scs@adam.mit.edu