[comp.lang.c] Comparing strings...

v8902058@cc.nu.oz.au (10/13/90)

Hello...

	I'm sorry if this is a trival task, but I would like a function that
compares two strings.  I would like to be able to find if a string is equal to,
less than, or greater than another string.  I tried writing a routinue myself,
but I just can't seem to do it, so I thought I'd ask.

	I wouldn't mind some code, or just the basic algorithm maybe.

	If anyone can help me, please email or post...

Thanks.

Bernard.

gordon@osiris.cso.uiuc.edu (John Gordon) (10/14/90)

	Hmmm... Well, here's what I came up with:

int compare_strings(char s1[], char s2[])
{
  int i;

  for(i = 0; s1[i] && s2[i]; i++)
  {
    if(s1[i] > s2[i])
      return(S1_BIGGER);
    else if(s1[i] < s2[i])
      return(S2_BIGGER)
  }

  /* if we got to here, s1 and s2 are either equal or equal up to a NULL */

  if(strlen(s1) > strlen(s2))
    return(S1_BIGGER);
  else if(strlen(s1) < strlen(s2))
    return(S2_BIGGER);
  else
    return(EQUAL); 
}

  If there are stupid syntactical errors in the code, I apologize.  I just
sat down and wrote it.


---
John Gordon
Internet: gordon@osiris.cso.uiuc.edu        #include <disclaimer.h>
          gordon@cerl.cecer.army.mil       #include <clever_saying.h>
GEnie:    j.gordon14

stanley@fozzie.pheonix.com (John Stanley) (10/14/90)

gordon@osiris.cso.uiuc.edu (John Gordon) writes:

> 
> 	Hmmm... Well, here's what I came up with:

   Did I miss something, or is everyone overlooking the library functions
strcmp and strncmp that are already provided to do this?

This is my signature. It doesn't contain my name at all!

pfalstad@phoenix.Princeton.EDU (Paul John Falstad) (10/14/90)

In article <qDwZq1w161w@fozzie.pheonix.com> stanley@fozzie.pheonix.com (John Stanley) writes:
>gordon@osiris.cso.uiuc.edu (John Gordon) writes:
>> 	Hmmm... Well, here's what I came up with:
>   Did I miss something, or is everyone overlooking the library functions
>strcmp and strncmp that are already provided to do this?

You're not missing anything.  You're just the only one who bothered to
post the obvious.

Here's a better implementation of strcmp, by the way: (not that it
matters, since it's in the library)

int strcmp(char *s,char *t)
{
	for (; *s && *s == *t; s++,t++);
	return *s-*t;
}

--
pfalstad@phoenix.princeton.edu   The Germans are disputing it! Hegel is
arguing that the reality is merely an a priori adjunct of non-absolutistic
ethics; Kant, by the categorical imperative, is holding it ontologically
exists only in the imagination; and Marx is claiming it was off sides.

ark@alice.att.com (Andrew Koenig) (10/14/90)

In article <2205.271700c2@cc.nu.oz.au>, v8902058@cc.nu.oz.au writes:

> 	I'm sorry if this is a trival task, but I would like a function that
> compares two strings.  I would like to be able to find if a string is equal to,
> less than, or greater than another string.

Usually I don't answer questions that look like homework assignments.
I can't resist this one, though, seeing that someone else has already
posted an answer and also that it gives a nice opportunity to explore
some programming techniques that some people may find unfamiliar.

The first thing to do is understand the problem thoroughly.  As stated,
we do not know what a string is, or what it means for one string to be
less than another.

First let's define a string as a null-terminated character array
represented by a pointer to the initial element of the array.  Note
that this definition precludes treating a null pointer as a string;
the program we write will therefore fail if either of its arguments
is a null pointer or if either of the strings is not null-terminated.

Next, we define an ordering relation on strings.  Two strings must
be equal, of course, if all their characters are equal.  What if
they aren't?  Then there are two cases:

	(a) one string is a prefix of the other.  In this case
	we will define the shorter string as smaller.

	(b) neither string is a prefix of the other.  In this case
	we can scan the strings left to right until we find a
	character in one string that is not equal to the corresponding
	character in the other; comparing these two characters
	will determine which string is smaller.

This definition corresponds to the usual notion of lexical ordering.
For example, the null string is the smallest string of all, "a" is
less than "aa", which in turn is less than "ab", and so on.

This definition has an important property: if you pick a character
and append that same character to the BEGINNING of each of two
strings, you will not affect how the strings compare.  So, for
example, because "a" is less than "aa", "xa" is less than "xaa".

This fact is easy to prove.  If the strings started out equal,
they will still be equal after the transformation.  If not, you
can go through both cases (a) and (b) above and easily convince
yourself that whichever one applies, adding the extra character
will not change the result.

That suggests the following algorithm:

	(1) If both strings are null, they're equal.

	(2) Otherwise, if one string is null and the other one
	    isn't, the null string is shorter.

	(3) Otherwise, both strings are non-null; if their initial
	    characters are unequal, the result of that comparison
	    is the result of the algorithm.

	(4) Otherwise, we chop the initial character off both strings
	    and recursively apply the algorithm on the remainders.

This is easy to translate into a C program.  We assume the constants
LESS, EQUAL, and GREATER are defined elsewhere.  We return LESS if
the first argument is less than the second, GREATER if the first argument
is greater, and EQUAL if they're equal:

	int compare (char *p, char *q)
	{
		if (*p == '\0' && *q == '\0')
			return EQUAL;
		if (*p == '\0' && *q != '\0')
			return LESS;
		if (*p != '\0' && *q == '\0')
			return GREATER;
		if (*p < *q)
			return LESS;
		if (*p > *q)
			return GREATER;
		return compare(p+1, q+1);
	}

We now have a program that accurately reflects the definition, and
that may be good enough!  However, let's assume that we want the
program to run faster.  A dangerous assumption, but it's a wretched,
miserable, rainy night out and I want to have a little fun.

First, let's look at the first two comparisons.  We're comparing
p to '\0' twice.  That's silly; we already know the answer from
the first time.  Let's factor those two comparisons:

		if (*p == '\0') {
			if (*q == '\0')
				return EQUAL;
			return LESS;
		}
		/* ... */

Next we look at the third comparison and realize we know half the
answer to that two: we can only get there if *p != '\0'.  The first
three comparisons now look like this:

		if (*p == '\0') {
			if (*q == '\0')
				return EQUAL;
			return LESS;
		}
		if (*q == '\0')
			return GREATER;

The next trick is to eliminate the recursive function call.  Because
the function call is the last thing in our function, we can simply
replace it by a set of assignments to establish the right initial
conditions followed by a goto up to the top of the function!  Some
C compilers will do it for us, but it's more fun this way:

	int compare (char *p, char *q)
	{
	top:	if (*p == '\0') {
			if (*q == '\0')
				return EQUAL;
			return LESS;
		}
		if (*q == '\0')
			return GREATER;
		if (*p < *q)
			return LESS;
		if (*p > *q)
			return GREATER;
		p++;
		q++;
		goto top;
	}

Nobody likes goto statements these days, so let's replace this
one by a `while' loop.  It's easy to do this by moving the body
of the first `if' outside the loop and making *p == '\0' the
subject of the loop.  Of course, we have to remember to invert
the sense of this test.  While we're at it, we can swap the order
of the to comparisons of *p and *q to merge the two tests
that might result in returning GREATER:

	int compare (char *p, char *q)
	{
		while (*p != '\0') {
			if (*q == '\0' || *p > *q)
				return GREATER;
			if (*p < *q)
				return LESS;
			p++;
			q++;
		}
		if (*q == '\0')
			return EQUAL;
		return LESS;
	}

We can take advantage of the fact that if we use an integral
value by itself in the comparison, that implicitly compares it
to zero (which is identical to '\0').  That shortens the program
slightly.  We will also make p and q into register variables, and
finally note that the comparison `*p > *q' can be merged with the
increment operations on p and q (because if the `return' is
executed, the values of p and q no longer matter).  This makes the
program significantly faster on some machine architectures and
compilers:

	int compare (register char *p, register char *q)
	{
		while (*p) {
			if (!*q || *p > *q)
				return GREATER;
			if (*p++ < *q++)
				return LESS;
		}
		if (!*q)
			return EQUAL;
		return LESS;
	}

Finally, we can merge the two `return' statements at the end into
a single ?: operator:

	int compare (register char *p, register char *q)
	{
		while (*p) {
			if (!*q || *p > *q)
				return GREATER;
			if (*p++ < *q++)
				return LESS;
		}
		return *q? LESS: EQUAL;
	}

By this time, the program should be thoroughly incomprehensible,
even though it was obtained by a series of straightforward steps
from the original problem definition!  As an exercise, you might
try proving that the program works in its current form, or finding
out where I goofed if it doesn't work.

In `Structured Programming with Goto Statements,' Don Knuth said
he hoped for a system that he would tentatively name UTOPIA84
(because he thought someone might be able to get it done by
1984 -- this article was published in 1974) that would allow
people to perform these kinds of program transformations mechanically
while having the system check that correctness was being maintained.
He didn't want the system to do these things automatically --
he felt (and I agree) that it is only worth doing this kind of
optimization after you've measured a working program to see that
it is actually worthwhile.

I imagine that further optimizations are possible.  Anyone else
want to give it a try?
-- 
				--Andrew Koenig
				  ark@europa.att.com

henry@zoo.toronto.edu (Henry Spencer) (10/14/90)

In article <11485@alice.att.com> ark@alice.att.com (Andrew Koenig) writes:
>	int compare (register char *p, register char *q)
>	{
>		while (*p) {
>			if (!*q || *p > *q)
>				return GREATER;
>			if (*p++ < *q++)
>				return LESS;
>		}
>		return *q? LESS: EQUAL;
>	}
>
>By this time, the program should be thoroughly incomprehensible...

Some (for example, me) would say that this was aided by your reliance on
implicit comparisons to zero, which hurt readability and do nothing for
your stated goal of greater efficiency.  Why did you bother doing that?

>I imagine that further optimizations are possible...

Definitely.  For one thing, when coding seriously-tight versions of such
things, we never make one comparison after another *inside* a loop to
decide *how* to exit it; we use the shortest possible test to decide
*whether* to exit it, so that the go-around-again path is as quick as
possible.  Examining the various tests in your loop together, the loop
continuation condition is simply `*p == *q'.  So we get (putting back
the explicit comparison while I'm at it):

	while (*p != '\0' && *p == *q)
		p++, q++;

However, we are still doing *p twice, which costs us on many machines
unless our compiler is smart enough to do common-subexpression merging.
We can do this explicitly, at the cost of declaring another (register!)
variable:

	while ((c = *p) != '\0' && c == *q)
		p++, q++;

The one obvious trick we haven't applied to this is merging the increment
operators with the fetches, as in your earlier code.  That actually can't
be done in my first loop because of problems in the after-loop code (which
I am omitting because it's 0530 and I want to wrap this up), but is
viable in the second one:

	while ((c = *p++) != '\0' && c == *q++)
		continue;

(the `continue' being present for readability, normally at no cost to
efficiency).  This is about as greased up as it's going to get without
a radical change in approach.  The after-loop code will need to check
`c' to find out which condition terminated the loop, and will need to
back up pointers and make the detailed comparisons to decide what value
to return.  (If one were tasteless enough to use a goto :-), the `c'
re-check could be avoided, but since it's outside the loop the efficiency
gain is slight.)  Completing it is left as an exercise to the reader.

A further optimization, details left as another optional exercise, is
to define LESS, EQUAL, and GREATER in terms of sign rather than precise
values, and use character subtraction to derive a suitably-signed value,
avoiding a series of tests that slow most machines down somewhat and
do bad things to pipelined machines, which really hate conditional
branches.  For extra credit, discuss how current implementations by
both Berkeley and AT&T attempt this optimization and thereby introduce
a bug (hint:  consider the fact that char is signed on most machines).

Project for the really determined reader:  I said things weren't going
to get much faster without a radical change in approach.  Find one.
(Yes, it exists.)  Discuss tradeoffs and assumptions.
-- 
"...the i860 is a wonderful source     | Henry Spencer at U of Toronto Zoology
of thesis topics."    --Preston Briggs |  henry@zoo.toronto.edu   utzoo!henry

ark@alice.att.com (Andrew Koenig) (10/14/90)

In article <3330@idunno.Princeton.EDU>, pfalstad@phoenix.Princeton.EDU (Paul John Falstad) writes:

> int strcmp(char *s,char *t)
> {
> 	for (; *s && *s == *t; s++,t++);
> 	return *s-*t;
> }

Whether or not this works depends on your definition of the problem.
If you really want to follow the normal lexical convention that
the null string compares <= anything else, and you're on a machine
on which chars can be negative, then it doesn't work.
-- 
				--Andrew Koenig
				  ark@europa.att.com

ark@alice.att.com (Andrew Koenig) (10/14/90)

In article <1990Oct14.095336.2819@zoo.toronto.edu>, henry@zoo.toronto.edu (Henry Spencer) writes:

> Some (for example, me) would say that this was aided by your reliance on
> implicit comparisons to zero, which hurt readability and do nothing for
> your stated goal of greater efficiency.  Why did you bother doing that?

Because for me it increases readability.

> >I imagine that further optimizations are possible...

> Definitely. ...
> [optimization description deleted]

Neat!  And you haven't even unrolled any loops!

> For extra credit, discuss how current implementations by
> both Berkeley and AT&T attempt this optimization and thereby introduce
> a bug (hint:  consider the fact that char is signed on most machines).

I'm afraid I may have given it away in another posting, which I made
before I saw this one.
-- 
				--Andrew Koenig
				  ark@europa.att.com

pfalstad@drops.Princeton.EDU (Paul John Falstad) (10/15/90)

In article <11486@alice.att.com> ark@alice.att.com (Andrew Koenig) writes:
>In article <3330@idunno.Princeton.EDU>, pfalstad@phoenix.Princeton.EDU (Paul John Falstad) writes:
>> int strcmp(char *s,char *t)
>> {
>> 	for (; *s && *s == *t; s++,t++);
>> 	return *s-*t;
>> }
>Whether or not this works depends on your definition of the problem.
>If you really want to follow the normal lexical convention that
>the null string compares <= anything else, and you're on a machine
>on which chars can be negative, then it doesn't work.

OK.  Declare the chars unsigned then.  (I copied this with minor changes
from K&R2, so I assumed it would be right.)

Other optimizations could be made:

int strcmp(char *s,char *t)  /* declared register if you like */
{
unsigned char c,d;

	while((c = (unsigned char) *s++) == (d == (unsigned char) *t++) && c);
	return c-d;
}

I've heard of faster ways that take advantage of parallelism by
comparing a word or longword at a time.  Can't remember if it was
worth it or not.

--
pfalstad@phoenix.princeton.edu   The Germans are disputing it! Hegel is
arguing that the reality is merely an a priori adjunct of non-absolutistic
ethics; Kant, by the categorical imperative, is holding it ontologically
exists only in the imagination; and Marx is claiming it was off sides.

henry@zoo.toronto.edu (Henry Spencer) (10/15/90)

In article <11488@alice.att.com> ark@alice.att.com (Andrew Koenig) writes:
>> >I imagine that further optimizations are possible...
>
>> Definitely. ...
>> [optimization description deleted]
>
>Neat!  And you haven't even unrolled any loops!

If I find time in the next few days, I may post a followup going further!
It has occurred to me that if you look at it from the right angle, even
the radically-different approach I alluded to can be derived by source
transformations.  (The required angle is about 137 + 56i degrees. :-))
-- 
"...the i860 is a wonderful source     | Henry Spencer at U of Toronto Zoology
of thesis topics."    --Preston Briggs |  henry@zoo.toronto.edu   utzoo!henry

ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (10/15/90)

In article <1990Oct13.190106.15615@ux1.cso.uiuc.edu>, gordon@osiris.cso.uiuc.edu (John Gordon) writes:
> 	Hmmm... Well, here's what I came up with:

Er, what happened to strcmp()?  Or did the original poster mean by "string"
something other than "pointer to NUL-terminated array of char"?

-- 
Fear most of all to be in error.	-- Kierkegaard, quoting Socrates.

userAKDU@mts.ucs.UAlberta.CA (Al Dunbar) (10/15/90)

In article <2205.271700c2@cc.nu.oz.au>, v8902058@cc.nu.oz.au writes:
>
>
>Hello...
>
>        I'm sorry if this is a trival task, but I would like a function that
>compares two strings.  I would like to be able to find if a string is equal to,
>less than, or greater than another string.  I tried writing a routinue myself,
>but I just can't seem to do it, so I thought I'd ask.
>
>        I wouldn't mind some code, or just the basic algorithm maybe.
>
>        If anyone can help me, please email or post...
>
>Thanks.
>
>Bernard.
 
If you have a C compiler, you probably already have such a
function; just #include <string.h> and then use routines
such as strcmp, stricmp, and strncmp as probably defined
in the manual.
 
-------------------+-------------------------------------------
Al Dunbar          |
Edmonton, Alberta  |   this space for rent
CANADA             |
-------------------+-------------------------------------------

henry@zoo.toronto.edu (Henry Spencer) (10/15/90)

In article <3343@idunno.Princeton.EDU> pfalstad@drops.Princeton.EDU (Paul John Falstad) writes:
>	while((c = (unsigned char) *s++) == (d == (unsigned char) *t++) && c);

Uh-uh, you don't want to do the conversions inside the loop; they can add
significant expense, and are irrelevant to an equality comparison.  You
do the conversions -- and conversions *back* to int so the difference
is signed -- in the post-loop wrapup.

By the way, tacking the ";" on the end of the while is a great way to
trip up your readers, which is thought amusing by amateurs and is avoided
by professionals.
-- 
"...the i860 is a wonderful source     | Henry Spencer at U of Toronto Zoology
of thesis topics."    --Preston Briggs |  henry@zoo.toronto.edu   utzoo!henry

stelmack@screamer.csee.usf.edu (Gregory M. Stelmack) (10/15/90)

One reason why you might want your own code is if you need a specific string
comparison that your library does not have. Including specific string
comparison functions may make your code non-portable, as it did with a chunk
of code I wrote. SAS/C for the Amiga has a nice stricmp() function which does
case-insensitive comparisons (A==a). A friend of mine had it bomb on compile
with Manx, and had to change this call to another. So, I wrote my own string
compare to do what I want.

Just had to show a situation in which you may want your own code...

-- Greg Stelmack
-- Email: stelmack@sol.csee.usf.edu
-- USmail: USF Box 1510, Tampa, FL 33620-1510
-- Amiga: the only way to compute!
-- IRAQNOPHOBIA: Nothing a little RAID wouldn't cure!

ark@alice.att.com (Andrew Koenig) (10/15/90)

After seeing all the comments about comparing strings,
I couldn't resist adding yet another solution to the pile.

Here it is.  It's not completely portable, in that it depends
on the `popen' library function.  Some operating systems may
not have this particular function.  It also depends on certain
other characteristics of the host machine that should become
clear after you read it.

For the obvious reasons, I have not tested this program completely.


------------------------------------
#include <stdio.h>

extern FILE *popen(const char *, const char *);

void printhex(const char *, FILE *);

/*
 * This function compares two strings by sending their hex
 * representations to comp.lang.c and letting responses come
 * back to the user.
 */

void compare(const char *p, const char *q)
{
	FILE *fd = popen("EDITOR=ed postnews", "w");
	
	/* Is this message in response to some other message? */
	fprintf(fd, "n\n");

	/* Subject: */
	fprintf(fd, "string comparison request\n");

	/* Keywords: */
	fprintf(fd, "\n");

	/* Newsgroups (enter one at a time, end with a blank line): */

	/* The most relevant newsgroup should be the first, you should */
	/* add others only if your article really MUST be read by people */
	/* who choose not to read the appropriate group for your article. */
	/* But DO use multiple newsgroups rather than posting many times. */

	/* For a list of newsgroups, type ? */
	fprintf(fd, "comp.lang.c\n\n");

	/* Distribution (default='comp', '?' for help) : */
	fprintf(fd, "\n");

	/* We're now inside `ed'.  First explain the purpose of this */
	/* message, then call our hex function to print */
	/* each string in hex. */

	fprintf(fd, "Which of the following two strings is greater?\n");
	fprintf(fd, "Please answer by email and I will summarize for the net.\n");
	fprintf(fd, "\n");

	fprintf(fd, "String 1:\n");
	printhex(p, fd);
	fprintf(fd, "\n\nString 2:\n");
	printhex(q, fd);

	/* Now get out of ed */
	fprintf(fd, "\n.\nw\nq\n");

	/* What now?  [send, edit, list, quit, write] */
	fprintf(fd, "s\n");

	fclose(fd); sleep(2);
}

/* Print the string in hex, 32 chars (64 hex digits) to a line */
/* The first character, if any, is preceded by a newline. */
void printhex(const char *p, FILE* fd)
{
	unsigned count = 0;
	static char hextab[] = "0123456789abcdef";

	while (*p) {
		if (count % 32 == 0)
			putc('\n', fd);
		count++;
		putc(hextab[(*p & 0xf0) >> 4], fd);
		putc(hextab[*p & 0x0f], fd);
		p++;
	}
}
------------------------------------
-- 
				--Andrew Koenig
				  ark@europa.att.com

pfalstad@dry.Princeton.EDU (Paul John Falstad) (10/15/90)

In article <1990Oct15.042851.18595@zoo.toronto.edu> henry@zoo.toronto.edu (Henry Spencer) writes:
>In article <3343@idunno.Princeton.EDU> pfalstad@drops.Princeton.EDU (Paul John Falstad) writes:
>>	while((c = (unsigned char) *s++) == (d == (unsigned char) *t++) && c);
>Uh-uh, you don't want to do the conversions inside the loop; they can add
>significant expense, and are irrelevant to an equality comparison.  You
>do the conversions -- and conversions *back* to int so the difference
>is signed -- in the post-loop wrapup.

You're right, I forgot to convert back to int; but no, it doesn't
matter where you do the `conversions,' as you call them, though that
word implies that the compiler is generating instructions to actually
convert the data from signed to unsigned.  What it is really doing is
generating different instructions to handle c and d as unsigned chars.
In fact, I tried both ways, compiling with gcc -O -S; doing the
conversions outside the loop actually generated more instructions on my
SPARCstation.  Here's a diff:

12c12,13
< 	and %l0,0xff,%o5
---
> 	sll %l0,0x18,%o5
> 	sra %o5,0x18,%o5
15,16c16,18
< 	and %o4,0xff,%o1
< 	ldub [%o2],%o0
---
> 	sll %o4,0x18,%o1
> 	sra %o1,0x18,%o1
> 	ldsb [%o2],%o0

When I did the conversion inside the loop, gcc just generated an and
instruction to mask out the upper 24 bits, but when I did the conversion
outside the loop, gcc had to sign-extend the byte with a
shift-left-logical and a shift-right-arithmetic.

What I should have done was to declare s and t as pointers to
unsigned chars (since I hate signed chars), but then the compiler would
have complained, since the string functions are usually prototyped with
pointers to signed chars.

>By the way, tacking the ";" on the end of the while is a great way to
>trip up your readers, which is thought amusing by amateurs and is avoided
>by professionals.

I'm sorry if it confused you; I'm quite used to it.  I think the
indentation makes my intent clear in most cases.  We amateurs have to
get our amusement somehow.

--
Paul Falstad, pfalstad@phoenix.princeton.edu PLink:HYPNOS GEnie:P.FALSTAD
"And she's always on about men following her.  I don't know what she
thinks they're going to do to her.  Vomit on her, Basil, says."-Flowery Twats

gordon@osiris.cso.uiuc.edu (John Gordon) (10/16/90)

	I will agree, strcmp() is fine.  However, the original poster said
that they wanted SOURCE, so I wrote the routine.

henry@zoo.toronto.edu (Henry Spencer) (10/16/90)

In article <3353@idunno.Princeton.EDU> pfalstad@dry.Princeton.EDU (Paul John Falstad) writes:
>You're right, I forgot to convert back to int; but no, it doesn't
>matter where you do the `conversions,' as you call them, though that
>word implies that the compiler is generating instructions to actually
>convert the data from signed to unsigned.  What it is really doing is
>generating different instructions to handle c and d as unsigned chars.

On *some* machines, it generates different instructions, and if you're
lucky, there is no speed difference.  But on many machines, there is
only one kind of character-access instruction, and if what you ask
for isn't what it delivers, then yes, the compiler really does have
to generate instructions to do the conversion.  Since signed-char
machines considerably outnumber unsigned-char machines, and only the
latest architectures give you a choice, the odds are pretty good that
those conversions to unsigned will cost you.

The reason why there is a "char" type, with no promises made about whether
it is signed or unsigned, is precisely that making the right choice
*makes a big difference to speed* on a lot of machines.

>In fact, I tried both ways, compiling with gcc -O -S; doing the
>conversions outside the loop actually generated more instructions on my
>SPARCstation...

I'm afraid this sounds like you are either a believer in "proof by example"
or a victim of Sun propaganda.  Most machines are not SPARCs.
-- 
"...the i860 is a wonderful source     | Henry Spencer at U of Toronto Zoology
of thesis topics."    --Preston Briggs |  henry@zoo.toronto.edu   utzoo!henry

cdm@gem-hy.Berkeley.EDU (Dale Cook) (10/16/90)

In article <2205.271700c2@cc.nu.oz.au>, v8902058@cc.nu.oz.au writes:

|> Hello...
|> 
|> 	I'm sorry if this is a trival task, but I would like a function that
|> compares two strings.  I would like to be able to find if a string is
equal to,
|> less than, or greater than another string.  I tried writing a
routinue myself,
|> but I just can't seem to do it, so I thought I'd ask.
|> 
|> 	I wouldn't mind some code, or just the basic algorithm maybe.
|> 
|> 	If anyone can help me, please email or post...
|> 
|> Thanks.
|> 
|> Bernard.

Try "strcmp".

--- Dale Cook

Neither the United States Government nor the Idaho National
Engineering Laboratory nor any of their employees, makes any warranty,
expressed or implied, or assumes any legal liability or responsibility
for the accuracy, completeness, or usefulness of any information,
product, or process disclosed, or represents that its use would not
infringe privately owned rights.  Reference herein to any specific
commercial products, process, or service by trade name, trademark
manufacturer, or otherwise, does not necessarily constitute or imply
its endorsement, recommendation, or favoring by the United States
Government or the Idaho National Engineering Laboratory.  The views and
opinions of authors expressed herein do not necessarily state or reflect
those of the United States Government nor the Idaho National Engineering
Laboratory, and shall not be used for advertising or product endorsement
purposes.

twpierce@amherst.bitnet (Tim Pierce) (10/16/90)

In article <1990Oct13.190106.15615@ux1.cso.uiuc.edu>,
gordon@osiris.cso.uiuc.edu (John Gordon) writes:

> 	Hmmm... Well, here's what I came up with:
> 
> int compare_strings(char s1[], char s2[])
> {
>   int i;
> 
>   for(i = 0; s1[i] && s2[i]; i++)
>   {
>     if(s1[i] > s2[i])
>       return(S1_BIGGER);
>     else if(s1[i] < s2[i])
>       return(S2_BIGGER)
>   }

Correct me if I'm wrong here (being a C neophyte myself), but shouldn't you be
using a while loop rather than a for loop? It looks to me as if this code will
return only the relative sizes of the last characters in the string, not
stopping once an inequality is hit. Or does the return() statement imply a
break?

----------------------------------------------------------------------------
Tim Pierce		| "They call television a medium. That is because
twpierce@amherst.bitnet	|  it is neither rare nor well done." - Ernie Kovacs
----------------------------------------------------------------------------

gordon@osiris.cso.uiuc.edu (John Gordon) (10/17/90)

	Yes, the return() statement (keyword?) means that execution of the
current function is stopped, and control passes to the calling procedure 
(pending evaluation of return()'s argument, of course).

gordon@osiris.cso.uiuc.edu (John Gordon) (10/17/90)

	Also, one other thing: for() and while() loops are essentially 
identical.  The following loops are exactly the same:

	#1)
 	    for(i = 0; i < 100; i++)
	    {
		body of loop
	    }


	#2)
	   i = 0;
	   while(i < 100)
	   {
	     body of loop
	     i++;
	   }


	Hope it helps.

mcdaniel@adi.com (Tim McDaniel) (10/17/90)

gordon@osiris.cso.uiuc.edu (John Gordon) writes:

> Also, one other thing: for() and while() loops are essentially
> identical.

"Essentially identical" is not identical to "identical".  As Dad says,
"'Close' only counts in horseshoes and hand grenades".

> The following loops are exactly the same:

Unless "continue" is used: in a "while" loop, control passes
immediately to the conditional expression, but in a "for" loop, the
third control expression is done before it goes to the conditional
expression.  So

   for (i = 0; i < 100; i++) {
      <code #1>
         continue;
      <code #2>
   }

is identical to

   i = 0;
   while (i < 100) {
      <code #1>
         goto uniqueLabel;
      <code #2>
   uniqueLabel:
      i++;
   }

To the best of my knowledge, they are otherwise identical.
--
Tim McDaniel                 Applied Dynamics Int'l.; Ann Arbor, Michigan, USA
Work phone: +1 313 973 1300                        Home phone: +1 313 677 4386
Internet: mcdaniel@adi.com                UUCP: {uunet,sharkey}!amara!mcdaniel

userAKDU@mts.ucs.UAlberta.CA (Al Dunbar) (10/18/90)

In article <10678.271ade27@amherst.bitnet>, twpierce@amherst.bitnet (Tim Pierce) writes:
>In article <1990Oct13.190106.15615@ux1.cso.uiuc.edu>,
>gordon@osiris.cso.uiuc.edu (John Gordon) writes:
>
<<<deletions>>>
>>   for(i = 0; s1[i] && s2[i]; i++)
>>   {
>>     if(s1[i] > s2[i])
>>       return(S1_BIGGER);
>>     else if(s1[i] < s2[i])
>>       return(S2_BIGGER)
>>   }
>
>Correct me if I'm wrong here (being a C neophyte myself), but shouldn't you be
>using a while loop rather than a for loop? It looks to me as if this code will
>return only the relative sizes of the last characters in the string, not
>stopping once an inequality is hit. Or does the return() statement imply a
>break?
>
The return() statement (or just "return;") "implies a break"
in the same way that going outside implies that you have left
the room.
 
-------------------+-------------------------------------------
Al Dunbar          |
Edmonton, Alberta  |   this space for rent
CANADA             |
-------------------+-------------------------------------------
#! r

peter@ficc.ferranti.com (Peter da Silva) (10/18/90)

In article <11491@alice.att.com> ark@alice.att.com (Andrew Koenig) writes:
> 	/* We're now inside `ed'.  First explain the purpose of this */
> 	/* message, then call our hex function to print */
> 	/* each string in hex. */
> 
> 	fprintf(fd, "Which of the following two strings is greater?\n");

You forgot "$a" to go into insert mode.

You also forgot to cross-post to talk.bizzarre.
-- 
Peter da Silva.   `-_-'
+1 713 274 5180.   'U`
peter@ferranti.com

richard@iesd.auc.dk (Richard Flamsholt S0rensen) (10/18/90)

In article <1990Oct16.012056.17200@inel.gov> cdm@gem-hy.Berkeley.EDU (Dale Cook) writes:
> |> [stuff deleted]
>
> Try "strcmp".
>
> --- Dale Cook
>
> Neither the United States Government nor the Idaho National
> Engineering Laboratory nor any of their employees, makes any warranty,
> expressed or implied, or assumes any legal liability or responsibility
> for the accuracy, completeness, or usefulness of any information,
> product, or process disclosed, or represents that its use would not
> infringe privately owned rights.  Reference herein to any specific
> commercial products, process, or service by trade name, trademark
> manufacturer, or otherwise, does not necessarily constitute or imply
> its endorsement, recommendation, or favoring by the United States
> Government or the Idaho National Engineering Laboratory.  The views and
> opinions of authors expressed herein do not necessarily state or reflect
> those of the United States Government nor the Idaho National Engineering
> Laboratory, and shall not be used for advertising or product endorsement
> purposes.

  Eeehhmm - do you think, Dale, that you could persuade the Idaho
National Engineering Laboratory to loosen up a bit on this overkill of
a .signature ?
  Especially when all you want to say is 'Try "strcmp"' ... (no
personal offense.)

  -Richard

--
/Richard Flamsholt
richard@iesd.auc.dk

mat@mole-end.UUCP (Mark A Terribile) (10/21/90)

> > int strcmp(char *s,char *t)
> > {
> > 	for (; *s && *s == *t; s++,t++);
> > 	return *s-*t;
> > }

> 				--Andrew Koenig
> 				  ark@europa.att.com

Note the semicolon on the  for(;;) .  Note that it's hard to note.

Another article on this topic replaced a null statement for a loop body
with a  continue :

	while( . . . . . )
		continue;

IMnsHO, the semicolon null statement is one of the more irritating glitches
in C.  I prefer

	for( ; *s && *s == *t; s++, t++ )
		{}

or

	while( . . . )
		{}

It's quite clear that this is an empty, no-effect statement.  It's a brace-
thing where you expect a brace-thing, and it won't get missed on a faint
printout the way a semicolon can be.
-- 

 (This man's opinions are his own.)
 From mole-end				Mark Terribile

bhoughto@cmdnfs.intel.com (Blair P. Houghton) (10/23/90)

In article <444@mole-end.UUCP> mat@mole-end.UUCP (Mark A Terribile) writes:
Andrew Koenig (ark@europa.att.com) writes
>> > 	for (; *s && *s == *t; s++,t++);
>
>Note the semicolon on the  for(;;) .  Note that it's hard to note.
>
[Mark prefers]
>	while( . . . )
>		{}
>
>It's quite clear that this is an empty, no-effect statement.  It's a brace-
>thing where you expect a brace-thing, and it won't get missed on a faint
>printout the way a semicolon can be.

But Mark, I, and the Indian Hills cats, and cb(1)[*],
all expect our braces attached to our pants thusly:

	while (...) {
	}

which smells of something forgotten (or preempted).

Much more exact:

	while (...)
	    ;	/* empty */

You'll never, ever misconstrue that sucker as an error-in-editing.

				--Blair
				  "'Dave?  Dave?
				   Scratch lower, Dave...'
				   -if HAL9000 had won..."

[*] Actually, vanilla cb(1) will leave		while(){
						}

alone, while it'll split the braces in		while()
						    {}

and move them back under the initial:		while()
						{
						}

ttobler@unislc.uucp (Trent Tobler) (10/24/90)

From article <1990Oct17.030157.460@ux1.cso.uiuc.edu>, by gordon@osiris.cso.uiuc.edu (John Gordon):
> 
> 	Also, one other thing: for() and while() loops are essentially 
> identical.  The following loops are exactly the same:
> 
> 	#1)
>  	    for(i = 0; i < 100; i++)
> 	    {
> 		body of loop
> 	    }
> 
> 
> 	#2)
> 	   i = 0;
> 	   while(i < 100)
> 	   {
> 	     body of loop
> 	     i++;
> 	   }
> 
> 
> 	Hope it helps.

Please note that for() and while() are identical, except when using the
continue within the body of the loop.  In the while() statement, the
i++ is not performed whereas in the for() statement, it is.


    Trent Tobler

floyd@hayes.ims.alaska.edu (Floyd Davidson) (10/24/90)

In article <444@mole-end.UUCP> mat@mole-end.UUCP (Mark A Terribile) writes:
>IMnsHO, the semicolon null statement is one of the more irritating glitches
>in C.  I prefer
>
>	for( ; *s && *s == *t; s++, t++ )
>		{}
>
>or
>
>	while( . . . )
>		{}
>
>It's quite clear that this is an empty, no-effect statement.  It's a brace-
>thing where you expect a brace-thing, and it won't get missed on a faint
>printout the way a semicolon can be.

A simple alternative:

	for ( ... ) {
		/* null statement */;
	}

Reformat to whatever style of indents or braces pleases your taste.

Floyd


-- 
Floyd L. Davidson      floyd@hayes.ims.alaska.edu
8347 Richardson Hwy.   floydd@chinet.chi.il.us
Salcha, AK 99714       [and related to Alascom, Inc. by a pay check, only]

trt@rti.rti.org (Thomas Truscott) (10/25/90)

Some years ago, while working on a fast DES implementation,
I discovered that:

	int i;
	unsigned char table[8];

	i = table[0];

ran faster than the obvious "char table[8]" on VAX, Gould,
MC680X0, and Convex.  I was rather surprised by that.
And it was as fast as "char table[8]" on all
the other machines I tried (e.g. MIPS, SPARC).

As a practical :-) example of this, the fastest VAX bitcount reported
by Chris Torek can be sped up 30% just by declaring nbits[] unsigned.
A strcmp() routine might take advantage of this with:
    while ((c = *(unsigned char *)p) != '\0' && c == *(unsigned char *)q)
Of course one must be careful about lexical ordering.

Perhaps there are exceptions, but I did not encounter any.
I am sure the PDP-11 would be an exception, but there wasn't one handy!
	Tom Truscott