[comp.lang.c] Finding NULL byte in a long

djs@nimbus3.uucp (Doug) (12/05/90)

I know this has been on the net before, since I thought I saved it,
but can't find it now.  Anyway, could someone tell me what the
C expression is that tells you if a long has a NULL byte in it.
This is without masking each byte and testing it for 0.  It is very
clever and non-obvious.  Thanks.


					Doug
-- 
Doug Scofea   Email: nimbus3!djs@cis.ohio-state.edu    Phone:+1 614 459-1889

bright@nazgul.UUCP (Walter Bright) (12/11/90)

In article <1990Dec5.033206.10463@nimbus3.uucp> djs@nimbus3.UUCP (Doug) writes:
/I know this has been on the net before, since I thought I saved it,
/but can't find it now.  Anyway, could someone tell me what the
/C expression is that tells you if a long has a NULL byte in it.
/This is without masking each byte and testing it for 0.  It is very
/clever and non-obvious.  Thanks.

Hmmm, how about:
	p = memchr(&x,0,sizeof(x));

wirzeniu@cs.Helsinki.FI (Lars Wirzenius) (12/11/90)

In article <198@nazgul.UUCP> bright@nazgul.UUCP (Walter Bright) writes:
>In article <1990Dec5.033206.10463@nimbus3.uucp> djs@nimbus3.UUCP (Doug) writes:
>/I know this has been on the net before, since I thought I saved it,
>/but can't find it now.  Anyway, could someone tell me what the
>/C expression is that tells you if a long has a NULL byte in it.
>/This is without masking each byte and testing it for 0.  It is very
>/clever and non-obvious.  Thanks.
>
>Hmmm, how about:
>	p = memchr(&x,0,sizeof(x));

How about (assuming sizeof(x) == 4, easily modified for other sizes):

	((char *) &x)[0] == 0 ||
	((char *) &x)[1] == 0 ||
	((char *) &x)[2] == 0 ||
	((char *) &x)[3] == 0

Lars Wirzenius    wirzeniu@cs.helsinki.fi    wirzenius@cc.helsinki.fi

hilfingr@rama.cs.cornell.edu (Paul N. Hilfinger) (12/12/90)

In article <1990Dec5.033206.10463@nimbus3.uucp> djs@nimbus3.UUCP (Doug) writes:
>
>I know this has been on the net before, since I thought I saved it,
>but can't find it now.  Anyway, could someone tell me what the
>C expression is that tells you if a long has a NULL byte in it.
>This is without masking each byte and testing it for 0.  It is very
>clever and non-obvious.  Thanks.

Normally, I'd reply by e-mail, but I can't believe that the formulae below
are the best one can do, and I'd like to hear of better solutions
myself.

Assuming 32-bit, 2's complement longs, I believe the following
expressions both have the property of being non-zero iff the long
value x contains at least one 0 byte:

	(~x & 0x7f7f7f7f) + 0x01010101 & ~x & 0x80808080
	((x - 0x01010101) ^ x) & ~x & 0x80808080

Paul Hilfinger

pmk@craycos.com (Peter Klausler) (12/13/90)

In article <198@nazgul.UUCP> bright@nazgul.UUCP (Walter Bright) writes:
>In article <1990Dec5.033206.10463@nimbus3.uucp> djs@nimbus3.UUCP (Doug) writes:
>/I know this has been on the net before, since I thought I saved it,
>/but can't find it now.  Anyway, could someone tell me what the
>/C expression is that tells you if a long has a NULL byte in it.
>/This is without masking each byte and testing it for 0.  It is very
>/clever and non-obvious.  Thanks.
>
>Hmmm, how about:
>	p = memchr(&x,0,sizeof(x));

Performance, however, is sometimes important.

If one can be unportable and depend on two's-complement integer arithmetic,
eight-bit bytes, and the size of a long being, say, 64, give this a try:

	#define UPPERS	   0x8080808080808080
	#define ONES	   0x0101010101010101
	#define	HASNULL(x) (~((((x) | UPPERS) - ONES) | (x)) & UPPERS)

The solution for a 32-bit microcomputer is similar.

Finding the *position* of the null byte is a little trickier. It helps a lot
if your machine has a "leading zero count"/"find first set bit" instruction
and your compiler lets you use it.

tony@mantis.co.uk (Tony Lezard) (12/14/90)

pmk@craycos.com (Peter Klausler) writes:

>Finding the *position* of the null byte is a little trickier. It helps a lot
>if your machine has a "leading zero count"/"find first set bit" instruction
>and your compiler lets you use it.

Assuming you're using 2's compliment arithmetic, there's always a "find
first set bit" instruction. For any x, the expression x & -x yields all zeros
except for a 1 in the position of the first set bit from the right.

For example, if x = ...000010011001000
               -x = ...111101100111000
        => x & -x = ...000000000001000
                                  ^
                                  `-- Voila!

I don't know how well known this is. There are even occasions where it is
actually quite useful!

==========================================================================
Tony Lezard.  E-mail: tony@mantis.UUCP, Snail-mail: Mantis Consultants,
Unit 56, St. John's Innovation Centre, Cambridge, CB4 4WS, United Kingdom.
Most appropriate anagram of name: Lazy Rodent.