[net.bugs.v7] bug in strcmp+strncmp

henry@utzoo.UUCP (Henry Spencer) (10/21/83)

There is a long-standing but obscure bug in strcmp() and strncmp() in
(at least) V7 and 4.1BSD.  To discover it, try the following:

	#include <stdio.h>
	#include <string.h>

	int main(void) {
		if (strcmp("a\203", "a") <= 0)
			printf("Oops.\n");
		return 0;
	}

Note that the two strings are equal up to the point where one of them
ends, therefore by definition of lexicographic ordering the longer one
is greater.  But strcmp() claims it's the lesser.  This "works" only
on machines where characters are signed.  The problem is obvious when
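you inspect the code:  strcmp() returns the plain difference between the
first pair of characters that differ, a shortcut that assumes the
end-of-string NUL collates low with respect to any other character.
This is not true on a signed-char machine, where any character with the
top bit set (such as \203, which is -125 there) is negative and so
compares below NUL.  To fix this, add the following before the routine: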

	/*
	 * CHARBITS should be defined only if the compiler lacks "unsigned char".
	 * It should be a mask, e.g. 0377 for an 8-bit machine.
	 */
	#ifndef CHARBITS
	#	define	UNSCHAR(c)	((unsigned char)(c))
	#else
	#	define	UNSCHAR(c)	((c)&CHARBITS)
	#endif

Change the return at the end to:

	return(UNSCHAR(*s1) - UNSCHAR(*--s2));

If your compiler lacks "unsigned char", define CHARBITS for the
compilation (say, -DCHARBITS=0377).  Then make the same changes to
strncmp(), which takes the same shortcut and has the same bug.

Please don't try to tell me that the note in the BUGS section of
string(3) about using native character comparison excuses this.  The
NUL is an end-marker, not a regular character of the string.
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry