[net.lang.c] soundex algorithm wanted

chris@umcp-cs.UUCP (Chris Torek) (09/04/86)

In article <1239@whuxl.UUCP> mike@whuxl.UUCP (BALDWIN) writes:
>	register char	c, lc, prev = '0';

All the compilers I have used ignore the `register' on `register
char' declarations.  In any case, an `int' will hold everything
that will fit in a `char', and is (usually) the `natural word size'
of the machine.  Is there ever any reason to declare a variable
`register char' rather than `register int'?  Are there any extant
compilers for which the latter will generate *worse* code?
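
A minimal sketch of the claim that an `int' will hold everything that
fits in a `char'.  It assumes CHAR_MAX is smaller than INT_MAX, which
holds wherever `char' is narrower than `int'.  The function name
int_holds_every_char is made up for illustration.

```c
#include <limits.h>

/* Hypothetical check: returns 1 if every char value survives a round
   trip through int, 0 otherwise.  Assumes CHAR_MAX < INT_MAX so the
   loop counter cannot overflow. */
int int_holds_every_char(void)
{
    int i;

    for (i = CHAR_MIN; i <= CHAR_MAX; i++) {
        char c = (char)i;      /* i is in range, so this is exact */
        int widened = c;       /* char -> int never loses information */

        if (widened != i)
            return 0;
        if ((char)widened != c)
            return 0;
    }
    return 1;
}
```

The check says nothing about code quality; it only shows that switching
a declaration from `char' to `int' cannot lose values.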
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 1516)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@mimsy.umd.edu

guy@sun.uucp (Guy Harris) (09/04/86)

> All the compilers I have used ignore the `register' on `register
> char' declarations.

This was a change made to the System III compiler for 4(3?)BSD.  The claim
was that the compiler produced "poor, and sometimes incorrect, code" for
register variables less than 32 bits long.  The System V compiler, like the
System III compiler, puts "char" and "short" data into registers.  I have no
idea whether the S5 compiler produces the "poor, and sometimes incorrect,
code" mentioned or not, as I don't know what the code in question was.

> Are there any extant compilers for which the latter will generate *worse*
> code?

Script started on Thu Sep  4 13:41:45 1986
gorodish$ cat foo.c
foo()
{
	register char c;
	register int i;

	if (i == 'c')
		bar();
	if (c == 'c')
		bar();
}
gorodish$ cc -S -O foo.c
gorodish$ cat foo.s
	...(boring preamble deleted)...
	moveq	#99,d1
	cmpl	d1,d6
	jne	L14
	jbsr	_bar
L14:
	cmpb	#99,d7
	jne	LE12
	jbsr	_bar
LE12:
	moveml	a6@(-8),#192
	unlk	a6
	rts
gorodish$ 

Script done on Thu Sep  4 13:42:01 1986

As the M68000 Programmer's Reference Manual says for CMPI, "The size of the
immediate data matches the operation size", so a "cmpi.l" would carry a full
32-bit immediate.  The compiler avoids that by loading the constant into a
scratch register with "moveq" and comparing with "cmp.l".  Even given that,
one would rather compare 8 bits of immediate data against 8 bits in a
register than compare 32 bits of data against 32 bits in a register,
especially on the M68XXX for XXX < 020.
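
The foo.c in the script reads its register variables uninitialized,
which is fine for inspecting the assembly but not for running.  A hedged
variant that can actually be executed is sketched here; the names
foo_with_values and bar_calls are assumptions, not part of the original
test, and whether a given compiler emits a byte compare for the `char'
case is entirely up to that compiler.

```c
/* Hypothetical counter so the effect of each comparison is observable. */
int bar_calls = 0;

void bar(void)
{
    bar_calls++;
}

/* Same two comparisons as foo.c, but with defined values. */
void foo_with_values(int ival, char cval)
{
    register int i = ival;
    register char c = cval;

    if (i == 'c')       /* full-width compare against the constant 99 */
        bar();
    if (c == 'c')       /* c is promoted to int in C; a compiler may
                           still narrow this to a byte compare */
        bar();
}
```

Calling foo_with_values('c', 'c') invokes bar() twice; passing 'x' for
either argument skips the corresponding call.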
-- 
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com (or guy@sun.arpa)

karl@haddock (09/09/86)

umcp-cs!chris (Chris Torek) writes:
>In article <1239@whuxl.UUCP> mike@whuxl.UUCP (BALDWIN) writes:
>>       register char   c, lc, prev = '0';
>All the compilers I have used ignore the `register' on `register char'
>declarations.

Well, the compilers I've used will put it in a register, but do extra work
(usually unnecessary) to clear the higher bits.  This is unfortunate.

>Is there ever any reason to declare a variable `register char' rather than
>`register int'?

If the range of the variable really is char (so EOF is excluded), then it
logically should be declared char.  I find it distasteful to write something
I don't mean just because it's more efficient than what I'd like*.  If the
machine has some one-byte registers available, `register char' is clearly a
good idea (assuming the compiler has any brains).  If the machine has only
full-word registers but allows byte access to them, `register char c0, c1,
c2, c3' could be packed into a single register; even on a VAX there are some
situations where the packing/unpacking cost is negligible (for non-numerical
usage, e.g.), and it saves on registers.  I don't know of any machines that
do this.
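
A minimal sketch of the packing idea, done by hand rather than by the
compiler.  It assumes 8-bit chars and a 32-bit word, and the helper
names pack4, unpack4 and store4 are hypothetical.

```c
#include <stdint.h>

/* Four 8-bit values share one 32-bit word: c0 in the low byte,
   c3 in the high byte. */
uint32_t pack4(unsigned char c0, unsigned char c1,
               unsigned char c2, unsigned char c3)
{
    return (uint32_t)c0
         | ((uint32_t)c1 << 8)
         | ((uint32_t)c2 << 16)
         | ((uint32_t)c3 << 24);
}

/* Extract byte number index (0..3) from the packed word. */
unsigned char unpack4(uint32_t word, int index)
{
    return (unsigned char)((word >> (8 * index)) & 0xFFu);
}

/* Return a copy of word with byte number index (0..3) replaced. */
uint32_t store4(uint32_t word, int index, unsigned char value)
{
    uint32_t mask = (uint32_t)0xFFu << (8 * index);

    return (word & ~mask) | ((uint32_t)value << (8 * index));
}
```

Each access costs a shift and a mask, which is the packing/unpacking
overhead in question; a machine with byte access to registers would do
the same job without the explicit shifts.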

Karl W. Z. Heuer (ima!haddock!karl; karl@haddock.isc.com), The Walking Lint
*Like the keyword `register' itself, which is a crutch for dumb compilers.

mike@whuxl.UUCP (BALDWIN) (09/11/86)

> In article <1239@whuxl.UUCP> mike@whuxl.UUCP (BALDWIN) writes:
> >	register char	c, lc, prev = '0';
> 
> All the compilers I have used ignore the `register' on `register
> char' declarations.  In any case, an `int' will hold everything
> that will fit in a `char', and is (usually) the `natural word size'
> of the machine.  Is there ever any reason to declare a variable
> `register char' rather than `register int'?  Are there any extant
> compilers for which the latter will generate *worse* code?

On a 3B20, `register char c' is indeed put in a register, and
generates exactly the same code as `register int c', except
the instruction is `cmpb' instead of `cmpw'.  Of course there's
a reason for declaring a variable as char instead of int; if it
is used as a char, it should be declared char.
-- 
						Michael Baldwin
			(not the opinions of)	AT&T Bell Laboratories
						{at&t}!whuxl!mike

chris@umcp-cs.UUCP (Chris Torek) (09/14/86)

>In article <3266@umcp-cs.UUCP> I wrote:
>>All the compilers I have used ignore the `register' on `register
>>char' declarations. ... Is there ever any reason to declare a variable
>>`register char' rather than `register int'?  Are there any extant
>>compilers for which the latter will generate *worse* code?

In article <1244@whuxl.UUCP> mike@whuxl.UUCP (BALDWIN) writes:
>On a 3B20, `register char c' is indeed put in a register ....
>Of course there's a reason for declaring a variable as char
>instead of int; if it is used as a char, it should be declared char.

I agree, in principle; this is just a part of saying what you mean.
I meant to ask `is there ever any reason, other than saying what
you mean'.  Ever since I discovered that the 4BSD compiler ignores
`register' on `char's, I have given in to expediency.  I suppose
I was really looking for an excuse to work on the compiler.  Well,
now I have one:  Sun's compiler does indeed generate better code,
in some cases, for `register char's than for `register int's.
-- 
In-Real-Life: Chris Torek, Univ of MD Comp Sci Dept (+1 301 454 1516)
UUCP:	seismo!umcp-cs!chris
CSNet:	chris@umcp-cs		ARPA:	chris@mimsy.umd.edu