[comp.lang.c] string comparisons in C

covertr@gtephx.UUCP (Richard E. Covert) (07/14/89)

	This is not a flame.

	Mark Williams C is a wonder. The manual is so full of GEM/AES/VDI
hints that it still amazes me after two years! Anyway, C has a lack of good
string handling operations, so you need to use library functions. I was writing
a program in which I needed to determine whether a certain file type was a 
member of a desired set. So, how do you do string comparisons in C?? The
normal, portable, way is to do a char search and match. Slow and ugly.

	But, I was browsing thru the MWC manual, and lo and behold I see
pnmatch(). Now pnmatch is a wonderful little function which does string
comparisons. And it even accepts wildcards, so I was in business. I just
build an array of strings such as "*.PI1", and then by looping thru the
array I can string compare a user inputted filename against the list of
legal filetypes. Pretty neat solution.

	So, the moral of this little ditty, is READ YOUR MANUAL!!

P.S. Does anyone know if pnmatch() is implemented on other C compilers??

Richard (gtephx!covertr) Covert

leo@philmds.UUCP (Leo de Wit) (07/14/89)

In article <44672745.14a1f@gtephx.UUCP> covertr@gtephx.UUCP (Richard E. Covert) writes:
|	But, I was browsing thru the MWC manual, and lo and behold I see
|pnmatch(). Now pnmatch is a wonderful little function which does string
|comparisons. And it even accepts wildcards, so I was in business. I just
|build an array of strings such as "*.PI1", and then by looping thru the
|array I can string compare a user inputted filename against the list of
|legal filetypes. Pretty neat solution.
|
|	So, the moral of this little ditty, is READ YOUR MANUAL!!
|
|P.S. Does anyone know if pnmatch() is implemented on other C compilers??

Lattice C has stcpm() and stcpma() for unanchored and anchored pattern
matching. The BSD C libraries have regcomp() and regex() for regular
expression pattern matching (which probably goes a lot further than
any of pnmatch(), stcpm() or stcpma()).

The drawback of all these wonderful functions is that they are hardly
standardized, so you lose if portability is at stake (and it is more
often than you'd hope for). For maximum portability, use the functions
that are in the ANSI draft, and create your own library for functions
like pnmatch() that aren't there (it is very easy to make a general
wildcard pattern matcher). You'll be grateful for this advice when
you switch to another compiler, or your vendor doesn't support this
neat little function in the next release, or you're porting to a
different system, or ...

    Leo.
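
A minimal sketch of the kind of general wildcard matcher mentioned above,
for readers who want a starting point. It is not pnmatch(), stcpm() or any
vendor's routine: the name wild_match() and its argument order are
hypothetical, and it assumes only two metacharacters, '*' (any run of
characters, including none) and '?' (exactly one character). Matching is
case-sensitive.

```c
#include <stddef.h>

/*
 * wild_match (hypothetical name): return 1 if text matches pattern,
 * 0 otherwise.  '*' matches any run of characters, '?' matches one.
 */
int wild_match(const char *pattern, const char *text)
{
    for (; *pattern != '\0'; pattern++, text++) {
        if (*pattern == '*') {
            /* Treat a run of consecutive stars as a single star. */
            while (*pattern == '*')
                pattern++;
            /* A trailing star matches whatever is left of the text. */
            if (*pattern == '\0')
                return 1;
            /* Try the rest of the pattern at every remaining position. */
            for (; *text != '\0'; text++)
                if (wild_match(pattern, text))
                    return 1;
            return 0;
        }
        /* Any other pattern character needs a text character to consume. */
        if (*text == '\0')
            return 0;
        if (*pattern != '?' && *pattern != *text)
            return 0;
    }
    /* Pattern exhausted: it is a match only if the text is exhausted too. */
    return *text == '\0';
}
```

With this sketch, wild_match("*.PI1", "PICTURE.PI1") succeeds and
wild_match("*.PI1", "PICTURE.PI2") fails. The recursion on '*' is simple
but can be slow on patterns with many stars, so a production version might
want an iterative backtracking scheme instead.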

sabbagh@acf3.NYU.EDU (sabbagh) (07/14/89)

In article <44672745.14a1f@gtephx.UUCP> covertr@gtephx.UUCP (Richard E. Covert) writes:
	[...Misc comments about string matching in C...]

>	But, I was browsing thru the MWC manual, and lo and behold I see
>pnmatch(). Now pnmatch is a wonderful little function which does string
>comparisons. And it even accepts wildcards, so I was in business. I just
>build an array of strings such as "*.PI1", and then by looping thru the
>array I can string compare a user inputted filename against the list of
>legal filetypes. Pretty neat solution.
>
>	So, the moral of this little ditty, is READ YOUR MANUAL!!
>
>P.S. Does anyone know if pnmatch() is implemented on other C compilers??

I can't answer the question about pnmatch; however, in the May or June issue
of Dr. Dobb's Journal there is an article entitled "Adding Awk-like
extensions to C".  This gave a description and SOURCE CODE for C routines
that would perform AWK functions.  For those non-Eunuchs (:-)) users out
there, AWK is a "little" language whose primary usage is string processing.
These extensions are very powerful and worth looking into.

Hadil G. Sabbagh
E-mail:		sabbagh@csd27.nyu.edu
Voice:		(212) 998-3285
Snail:		Courant Institute of Math. Sci.
		251 Mercer St.
		New York,NY 10012

186,282 miles per second -- it's not just a good idea, it's the law!

scs@adam.pika.mit.edu (Steve Summit) (07/15/89)

In article <44672745.14a1f@gtephx.UUCP> covertr@gtephx.UUCP (Richard E. Covert) writes:
>P.S. Does anyone know if pnmatch() is implemented on other C compilers??

No vendor should provide a routine named "pnmatch."  Vendors are
not supposed to pollute the namespace with "convenient" (but
invariably unportable and system-specific) routines.  ("Then why
do so many vendors do so?" you ask.)  Vendor-supplied routines not
mentioned in the standard are supposed to have names beginning
with at least one underscore (e.g. "_pnmatch").

Similarly, portable programs cannot really use these extensions,
no matter how convenient they may be.  When extensions are used,
they should be hidden behind at least one function call; that is,
don't call pnmatch directly, but rather invent your own routine --
"match_filenames" or something, which calls pnmatch.  Then, when
you port your code to a different system that doesn't have
pnmatch or uses some wildly different wildcard mechanism, you
only have to rewrite match_filenames().

                                            Steve Summit
                                            scs@adam.pika.mit.edu
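
A sketch of the wrapper idea above, under stated assumptions. The
HAVE_PNMATCH macro, the helper match_one(), and the legal_types[] table are
all hypothetical names invented for illustration; "*.PI2" and "*.PI3" are
made-up table entries alongside the "*.PI1" from the original post. The
pnmatch() prototype shown is an assumption about the Mark Williams routine
and should be checked against the vendor manual before use. The fallback
branch is deliberately limited: it understands only a literal name or a
pattern of the form "*SUFFIX".

```c
#include <string.h>

#ifdef HAVE_PNMATCH
/* ASSUMED prototype for the vendor routine; verify against your manual. */
extern int pnmatch(char *string, char *pattern, int flag);
#endif

/* Illustrative table of legal file types (entries are examples only). */
static const char *legal_types[] = { "*.PI1", "*.PI2", "*.PI3" };

/* The only place that knows which matching mechanism is in use. */
static int match_one(const char *name, const char *pattern)
{
#ifdef HAVE_PNMATCH
    return pnmatch((char *)name, (char *)pattern, 0) != 0;
#else
    /* Portable fallback: literal names and "*SUFFIX" patterns only. */
    size_t nlen, slen;

    if (pattern[0] != '*')
        return strcmp(name, pattern) == 0;
    nlen = strlen(name);
    slen = strlen(pattern + 1);
    return nlen >= slen && strcmp(name + nlen - slen, pattern + 1) == 0;
#endif
}

/* Return 1 if name matches any entry in legal_types, 0 otherwise. */
int match_filenames(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof legal_types / sizeof legal_types[0]; i++)
        if (match_one(name, legal_types[i]))
            return 1;
    return 0;
}
```

The rest of the program calls only match_filenames(), so a port to a
system without pnmatch() means touching match_one() and nothing else.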

gwyn@smoke.BRL.MIL (Doug Gwyn) (07/15/89)

In article <12689@bloom-beacon.MIT.EDU> scs@adam.pika.mit.edu (Steve Summit) writes:
>In article <44672745.14a1f@gtephx.UUCP> covertr@gtephx.UUCP (Richard E. Covert) writes:
>>P.S. Does anyone know if pnmatch() is implemented on other C compilers??
>No vendor should provide a routine named "pnmatch."

That's based on a misunderstanding.  The actual constraint is that a
vendor is not supposed to interfere with an application's having its
own function (or external variable, or whatever) named "pnmatch".
Such a vendor-supplied function can be included in the standard C
library if it is not invoked by any standard library routines and
if it is not declared in any standard header.

>Similarly, portable programs cannot really use these extensions,
>no matter how convenient they may be.

A portable program cannot rely on the existence of a vendor-specific
function such as "pnmatch", but only because it doesn't exist in some
environments -- no other reason.

scs@adam.pika.mit.edu (Steve Summit) (07/16/89)

In article <10533@smoke.BRL.MIL> gwyn@brl.arpa (Doug Gwyn) writes:
>In article <12689@bloom-beacon.MIT.EDU> scs@adam.pika.mit.edu (Steve Summit) writes:
>>No vendor should provide a routine named "pnmatch."
>That's based on a misunderstanding.  The actual constraint is that a
>vendor is not supposed to interfere with an application's having its
>own function (or external variable, or whatever) named "pnmatch".

Indeed.  I should know better than to post assertions about a
standard I've never actually read.

                                            Steve Summit
                                            scs@adam.pika.mit.edu

henry@utzoo.uucp (Henry Spencer) (07/16/89)

In article <1055@acf3.NYU.EDU> sabbagh@acf3.UUCP () writes:
>... in the May or June issue
>of Dr. Dobb's Journal there is an article entitled "Adding Awk-like
>extensions to C".  This gave a description and SOURCE CODE for C routines
>that would perform AWK functions...

Or you can pick up my freely-redistributable regular-expression package
from the comp.sources.unix archives.  It's a complete implementation of
egrep/awk regular expressions and is highly portable.
-- 
$10 million equals 18 PM       |     Henry Spencer at U of Toronto Zoology
(Pentagon-Minutes). -Tom Neff  | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

Bob.Stout@p6.f506.n106.z1.fidonet.org (Bob Stout) (07/17/89)

In an article of <13 Jul 89 22:30:56 GMT>, (Richard E. Covert) writes:

 >P.S. Does anyone know if pnmatch() is implemented on other C compilers??

I don't believe so, but it's available in many 3rd party libraries. For  
example, one library (for TC and ZTC) that I contributed to has a similar  
function I wrote called wildname() which does the same thing and optionally  
allows either or both DOS or Unix style match-all patterns (i.e. either "." or  
"*" in addition to "*.*"). For DOS programmers, it also offers smarter pattern  
matching since it understands embedded '*' characters. 

  But the real issue is whether vendors should sprinkle these tempting little  
non-standard functions in their "standard" libraries. It seems unlikely that  
the practice can be curtailed since it offers a marketing advantage to the  
uninitiated, but at least they should be identified as non-portable and  
isolated via separate header files, etc.

bright@Data-IO.COM (Walter Bright) (07/18/89)

In article <12689@bloom-beacon.MIT.EDU> scs@adam.pika.mit.edu (Steve Summit) writes:
<No vendor should provide a routine named "pnmatch."  Vendors are
<not supposed to pollute the namespace with "convenient" (but
<invariably unportable and system-specific) routines.  ("Then why
<do so many vendors do so?" you ask.)

I'll tell you why: because customers want them. Here's a transcript of a
not uncommon telephone call:

Customer:	Compiler X has 427 library functions. Your compiler has
		only 387 library functions. When are you going to fix that?
Me:		Which library functions does X have that we don't that you
		need?
Customer:	But there are more library functions with X, therefore
		X is better.

Some people tend to rate a compiler by:
1.	The number of library functions (What they are is irrelevant).
2.	The number of pages in the manual (Content is irrelevant).

If you disagree with this, pick up some magazine reviews of C compilers.
To their credit, the reviews *have* improved on these points in the last
2 years, though some are still impressed by the *heft* of the package!

Compiler vendors have responded to this pressure by creating vast quantities
of library functions. A surprisingly large percentage of these are totally
trivial (< 10 instructions). For example, routines that merely interface
with BIOS functions. What's wrong with these functions is that:
1. They clutter up the library.
2. They clutter up the manual with descriptions that are essentially
   duplicated from the BIOS manual.
3. They foster an illusion of writing portable code.

Manual writers (and reviewers) ought to read Strunk and White. A good
reference manual should contain exactly enough words to describe its subject,
no more, no less. The bloat properly belongs in a physically separate tutorial.

covertr@gtephx.UUCP (Richard E. Covert) (07/18/89)

In article <10533@smoke.BRL.MIL>, gwyn@smoke.BRL.MIL (Doug Gwyn) writes:
> 
> A portable program cannot rely on the existence of a vendor-specific
> function such as "pnmatch", but only because it doesn't exist in some
> environments -- no other reason.

	This is an interesting stream of messages about pnmatch(). I first
posted it because I thought that MWC had written a clever little piece of
code. I have found out that pnmatch() once existed in Version 7 (MINIX is
Version 7) UNIX.

	Anyway, the talk is now centering around portability issues. I have been
programming for quite a few years and I realize just how important portability
is, BUT, a good software engineer decides when portability is not an issue, 
and doesn't always stick to the book.

	My application involves operating system and hardware specific issues
which make my program non-portable to other non-GEM computers anyway. Possibly,
I could port my program to an IBM PC running DRI GEMDOS, but it wouldn't have
all of the different types of picture files that the ST does anyway. So, I
maintain that portability is a moot point in my application. Furthermore,
my application is non-portable due to the fact that AES/VDI headers are not included
in the ANSI Draft. And I have a copy of the ANSI Draft for C at my desk at work
and at home.

	I did look up pnmatch() and it is not in the ANSI Draft C. But neither is the 
regexp() function mentioned elsewhere.

	In any case, I made the decision to use pnmatch() because it fulfills
my need, and it is unlikely that I will ever need to port this program. Also,
and finally, if portability is an issue, then you can purchase the source code to
the Mark Williams C compiler directly from Mark Williams Corp for $149.00.
And then you can use pnmatch() to your heart's content!!

P.S. Try porting the ST fsel_input() to some other computer!!! And people are
worried about pnmatch() not being portable :-).

Richard (just trying to write some GEMs) Covert

henry@utzoo.uucp (Henry Spencer) (07/19/89)

In article <447ec923.14a1f@gtephx.UUCP> covertr@gtephx.UUCP (Richard E. Covert) writes:
>...I have found out that pnmatch() once existed in Version 7 (MINIX is
>Version 7) UNIX.

Can you cite a reference for this?  It wasn't in the V7 that utzoo ran
until a year ago -- and our distribution tape came direct from Bell Labs.

>I did look up pnmatch() and it is not in the ANSI Draft C. But neither is the 
>regexp() function mentioned elsewhere.

However, something very much like my regexp functions is likely to appear
in Son of Posix (1003.2, that is -- standardizing commands turned out to
be a good time to make some additions to the libraries).
-- 
$10 million equals 18 PM       |     Henry Spencer at U of Toronto Zoology
(Pentagon-Minutes). -Tom Neff  | uunet!attcan!utzoo!henry henry@zoo.toronto.edu
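
For readers on systems that provide the POSIX <regex.h> interface
(regcomp(), regexec(), regfree()), here is a minimal sketch of a
regular-expression match. It is not Henry Spencer's regexp package and does
not use its API; the wrapper name regex_match() and its return convention
are hypothetical, and the availability of <regex.h> on your system is an
assumption.

```c
#include <regex.h>
#include <stddef.h>

/*
 * regex_match (hypothetical name): return 1 if text matches the extended
 * regular expression, 0 if it does not, -1 if the pattern fails to compile.
 */
int regex_match(const char *pattern, const char *text)
{
    regex_t re;
    int rc;

    /* REG_NOSUB: only a yes/no answer is needed, not match offsets. */
    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, text, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

For the file-type check that started this thread, a single expression such
as "\\.PI[123]$" (written as a C string literal) could stand in for a list
of wildcard patterns. Compiling the pattern on every call keeps the sketch
short; a program that matches many names would compile it once and reuse it.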