[comp.sources.wanted] SOUNDEX

gordon@prls.UUCP (Gordon Vickers) (12/12/89)

  TO ing@hades.OZ

    Tried to email you, but it bounced.  Please send me your email address
 relative to a well-known host, or a surface mail address.  I believe I 
 have just what you want. File size (shar format) is 8.2 Kbytes.

Gordon Vickers 408/991-5370 (Sunnyvale,Ca); {mips|pyramid|philabs}!prls!gordon
------------------------------------------------------------------------------
Earth is a complex array of symbiotic relationships:
Every extinction, whether animal, mineral, or vegetable, hastens our own demise.

ing@hades.OZ (Ian Gold) (12/12/89)

I am looking for a 'soundex' routine in C (or C++).  That is a routine 
capable of finding a substring of a target string that sounds like a given 
string.  The call would look something like this.

		char *soundex(char *target, char *given);


P.S. If the routine you have is NOT in C that's fine.  I can always convert it.
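A minimal sketch of the interface described above. It assumes that a "substring" means a whole word of the target (a run of letters) and that two words sound alike when their 4-character Soundex codes are equal. The names soundex_find() and sx_code() are hypothetical, chosen so they do not clash with a plain soundex(str) coding routine. The coding keeps the first letter, codes the rest with the usual six digit groups (Q in group 2), skips letters with no group, drops a repeated digit, and zero-pads to four characters.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Soundex digit for each letter A..Z; '0' means "no group" (A E H I O U W Y). */
static const char sx_codes[] = "01230120022455012623010202";

/* Hypothetical helper: 4-character Soundex key for the first len bytes of s. */
static void sx_code(const char *s, size_t len, char key[5])
{
    int n = 0;
    char prev = 0;                      /* last digit written (or first letter's code) */

    for (size_t i = 0; i < len && n < 4; i++) {
        int ch = toupper((unsigned char)s[i]);
        if (ch < 'A' || ch > 'Z')
            continue;                   /* ignore non-letters (assumes ASCII) */
        char code = sx_codes[ch - 'A'];
        if (n == 0) {
            key[n++] = (char)ch;        /* keep the first letter itself */
            prev = code;
        } else if (code != '0' && code != prev) {
            key[n++] = code;            /* a new digit: write it */
            prev = code;
        }
    }
    while (n < 4)
        key[n++] = '0';                 /* zero-pad short codes */
    key[4] = '\0';
}

/*
 * Hypothetical soundex_find(): return a pointer to the first word in target
 * whose Soundex key equals that of given, or NULL if there is none.
 * A "word" is assumed to be a maximal run of letters.
 */
char *soundex_find(char *target, const char *given)
{
    char want[5], got[5];
    char *p = target;

    sx_code(given, strlen(given), want);
    while (*p != '\0') {
        while (*p != '\0' && !isalpha((unsigned char)*p))
            p++;                        /* skip to the start of the next word */
        char *start = p;
        while (isalpha((unsigned char)*p))
            p++;                        /* advance to the end of the word */
        if (p > start) {
            sx_code(start, (size_t)(p - start), got);
            if (strcmp(want, got) == 0)
                return start;
        }
    }
    return NULL;
}

#ifdef TEST
int main(void)
{
    char text[] = "Letters from Mr Smyth and Mr Robinson";
    char *hit = soundex_find(text, "Smith");

    assert(hit != NULL);
    printf("Found: %s\n", hit);         /* the rest of text, from the match on */
    return 0;
}
#endif
```

Compiled with -DTEST, this matches "Smyth" (S530, the same code as Smith) and prints `Found: Smyth and Mr Robinson`. Returning a pointer into target, rather than a copy, keeps the call shape close to the one asked for.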

wew@naucse.UUCP (Bill Wilson) (12/13/89)

From article <488@hades.OZ>, by ing@hades.OZ (Ian Gold):
> 
> I am looking for a 'soundex' routine in C (or C++).  That is a routine 
> capable of finding a substring of a target string that sounds like a given 
> string.  The call would look something like this.
> 
> 		char *soundex(char *target, char *given);
>
I would be interested in the code as well...
 
-- 
Let sleeping dragons lie........               | The Bit Chaser
----------------------------------------------------------------
Bill Wilson             (Bitnet: ucc2wew@nauvm | wilson@nauvax)
Northern AZ Univ  Flagstaff, AZ 86011

john@riddle.UUCP (Jonathan Leffler) (12/16/89)

In article <1842@naucse.UUCP> wew@naucse.UUCP (Bill Wilson) writes:
>From article <488@hades.OZ>, by ing@hades.OZ (Ian Gold):
>> I am looking for a 'soundex' routine in C (or C++).
>> 
>> 		char *soundex(char *target, char *given);
>>
>I would be interested in the code as well...

Will this do?

:	"@(#)shar2.c	1.5"
#!/bin/sh
# shar:	Shell Archiver (v1.22)
#
#	This is a shell archive.
#	Remove everything above this line and run sh on the resulting file
#	If this archive is complete, you will see this message at the end
#	"All files extracted"
#
#	Created: Fri Dec 15 21:33:37 1989 by john at Sphinx Ltd.
#	Files archived in this archive:
#	  soundex.c
#
if test -f soundex.c; then echo "File soundex.c exists"; else
echo "x - soundex.c"
sed 's/^X//' << 'SHAR_EOF' > soundex.c &&
X/*
X**	SOUNDEX CODING
X**
X**	Rules:
X**	1.	Retain the first letter; ignore non-alphabetic characters.
X**	2.	Replace second and subsequent characters by a group code.
X**		Group	Letters
X**		1		BFPV
X**		2		CGJKQSXZ
X**		3		DT
X**		4		L
X**		5		MN
X**		6		R
X**	3.	Do not repeat the previous digit; skip letters not listed above.
X**	4.	Truncate or zero-pad to a 4-character result.
X**
X**	Originally formatted with tabstops set at 4 spaces -- you were warned!
X**
X**	Code by: Jonathan Leffler (john@sphinx.co.uk)
X**	This code is shareware -- I wrote it; you can have it for free
X**	if you supply it to anyone else who wants it for free.
X**
X**	BUGS: Assumes ASCII
X*/
X
X#include <ctype.h>
Xstatic char	lookup[] = {
X	'0',	/* A */
X	'1',	/* B */
X	'2',	/* C */
X	'3',	/* D */
X	'0',	/* E */
X	'1',	/* F */
X	'2',	/* G */
X	'0',	/* H */
X	'0',	/* I */
X	'2',	/* J */
X	'2',	/* K */
X	'4',	/* L */
X	'5',	/* M */
X	'5',	/* N */
X	'0',	/* O */
X	'1',	/* P */
X	'2',	/* Q */
X	'6',	/* R */
X	'2',	/* S */
X	'3',	/* T */
X	'0',	/* U */
X	'1',	/* V */
X	'0',	/* W */
X	'2',	/* X */
X	'0',	/* Y */
X	'2',	/* Z */
X};
X
X/*
X**	Soundex for arbitrary number of characters of information
X*/
Xchar	*nsoundex(str, n)
Xchar	*str;	/* In: String to be converted */
Xint		 n;		/* In: Number of characters in result string */
X{
X	static	char	buff[10];
X	register char	*s;
X	register char	*t;
X	char	c;
X	char	l;
X
X	if (n <= 0)
X		n = 4;	/* Default */
X	if (n > sizeof(buff) - 1)
X		n = sizeof(buff) - 1;
X	t = &buff[0];
X
X	for (s = str; ((c = *s) != '\0') && t < &buff[n]; s++)
X	{
X		if (!isascii(c))
X			continue;
X		if (!isalpha(c))
X			continue;
X		c = toupper(c);
X		if (t == &buff[0])
X		{
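X			*t++ = c;			/* Keep the first letter itself... */
X			l = lookup[c-'A'];	/* ...but not a following letter with the same code */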
X			continue;
X		}
X		c = lookup[c-'A'];
X		if (c != '0' && c != l)
X			l = *t++ = c;
X	}
X	while (t < &buff[n])
X		*t++ = '0';
X	*t = '\0';
X	return(&buff[0]);
X}
X
X/* Normal external interface */
Xchar	*soundex(str)
Xchar	*str;
X{
X	return(nsoundex(str, 4));
X}
X
X/*
X**	Alternative interface:
X**	void	soundex(given, gets)
X**	char	*given;
X**	char	*gets;
X**	{
X**		strcpy(gets, nsoundex(given, 4));
X**	}
X*/
X
X
X#ifdef TEST
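X#include <stdio.h>
X#include <string.h>
Xint main()
X{
X	char	buff[30];
X	char	*nl;
X
X	while (fgets(buff, sizeof(buff), stdin) != (char *)0)
X	{
X		if ((nl = strchr(buff, '\n')) != (char *)0)
X			*nl = '\0';	/* Drop the newline that fgets() keeps */
X		printf("Given: %s; Soundex produces %s\n", buff, soundex(buff));
X	}
X	return(0);
X}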
X#endif
SHAR_EOF
chmod 0640 soundex.c || echo "$0: failed to restore soundex.c"
fi
echo All files extracted
exit 0
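Jonathan's code skips vowels without resetting the previous digit, so, for example, Tymczak comes out as T520. The coding rules published by the US National Archives for census indexing differ here: a vowel between two letters with the same code lets the second one be coded again (Tymczak is T522), while H and W do not (Ashcraft is A261). A minimal sketch of that variant follows; the function name american_soundex() is hypothetical, and treating Y like a vowel is an assumption, since those rules only name A, E, I, O and U as separators.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Soundex digit for each letter A..Z; '0' means "no group" (A E H I O U W Y). */
static const char codes[] = "01230120022455012623010202";

/*
 * Hypothetical american_soundex(): 4-character code following the US
 * National Archives rules.  A vowel between two letters with the same code
 * lets the second be written again; H and W do not.  Y is treated like a
 * vowel here (an assumption).  out needs room for 5 bytes.
 */
char *american_soundex(const char *name, char out[5])
{
    int len = 0;
    char prev = 0;                      /* code that must not be repeated next */

    for (; *name != '\0' && len < 4; name++) {
        int ch = toupper((unsigned char)*name);
        if (ch < 'A' || ch > 'Z')
            continue;                   /* ignore non-letters (assumes ASCII) */
        char code = codes[ch - 'A'];
        if (len == 0) {
            out[len++] = (char)ch;      /* keep the first letter itself... */
            prev = code;                /* ...but remember its code (Pfister) */
        } else if (ch == 'H' || ch == 'W') {
            continue;                   /* transparent: prev is unchanged */
        } else if (code == '0') {
            prev = 0;                   /* vowel or Y: a repeat is allowed */
        } else if (code != prev) {
            out[len++] = code;
            prev = code;
        }
    }
    while (len < 4)
        out[len++] = '0';               /* zero-pad short codes */
    out[4] = '\0';
    return out;
}

#ifdef TEST
int main(void)
{
    char key[5];

    assert(strcmp(american_soundex("Tymczak", key), "T522") == 0);
    assert(strcmp(american_soundex("Ashcraft", key), "A261") == 0);
    assert(strcmp(american_soundex("Pfister", key), "P236") == 0);
    printf("Tymczak %s\n", american_soundex("Tymczak", key));
    return 0;
}
#endif
```

The only behavioural difference from the simpler rule is the reset of prev on a vowel; which variant is "right" depends on whose index you need to match.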

exnirad@brolga.cc.uq.oz.au (Nirad Sharma) (08/31/90)

I have been using the SOUNDEX function supplied with Oracle V5 and have
found it to be very convenient except that it may be too ambiguous. I
noticed that SOUNDEX only returns a 5 (or 4 - I forget) character string.
Is it possible that other soundex algorithms allow less ambiguity by making
use of more characters? If so, and if the source (preferably in C) exists, could
someone tell me how to get it, please?
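Jonathan's nsoundex() in this thread already takes the result length as an argument. A minimal sketch of the same idea, assuming the simple rules (first letter kept, letters with no group skipped, a repeated digit dropped, zero padding) with Q in group 2; the name soundex_n() is hypothetical. It shows how a longer key can split names that share a 4-character code.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Soundex digit for each letter A..Z; '0' means "no group" (A E H I O U W Y). */
static const char sx_codes[] = "01230120022455012623010202";

/*
 * Hypothetical soundex_n(): write an n-character Soundex key for name into
 * key, which must have room for n + 1 bytes (5 when n <= 0, meaning 4).
 */
char *soundex_n(const char *name, int n, char *key)
{
    int len = 0;
    char prev = 0;                      /* last digit written (or first letter's code) */

    if (n <= 0)
        n = 4;                          /* the usual 4-character code */
    for (; *name != '\0' && len < n; name++) {
        int ch = toupper((unsigned char)*name);
        if (ch < 'A' || ch > 'Z')
            continue;                   /* ignore non-letters (assumes ASCII) */
        char code = sx_codes[ch - 'A'];
        if (len == 0) {
            key[len++] = (char)ch;      /* keep the first letter itself */
            prev = code;
        } else if (code != '0' && code != prev) {
            key[len++] = code;
            prev = code;
        }
    }
    while (len < n)
        key[len++] = '0';               /* zero-pad short codes */
    key[len] = '\0';
    return key;
}

#ifdef TEST
int main(void)
{
    const char *names[] = { "Robert", "Robertson", "Rupert" };
    char key4[5], key6[7];

    for (int i = 0; i < 3; i++)
        printf("%-10s %s %s\n", names[i],
               soundex_n(names[i], 4, key4), soundex_n(names[i], 6, key6));
    assert(strcmp(key6, "R16300") == 0);    /* Rupert, the last name coded */
    return 0;
}
#endif
```

Built with -DTEST, all three names share R163 at four characters, while at six Robertson (R16325) separates from Robert and Rupert (both R16300). Short names gain only trailing zeros, so a longer key mainly helps with longer names.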

While I'm at it, are there any FTP sites holding various Oracle bits, e.g.
forms, scripts and the like?

Thanks for any help

Nirad Sharma  (exnirad@brolga.cc.uq.oz.au)
Continuing Education Unit
The University of Queensland
AUSTRALIA

buckland@cheddar.ucs.ubc.ca (Tony Buckland) (08/31/90)

In article <1990Aug31.020725.6451@brolga.cc.uq.oz.au> exnirad@brolga.cc.uq.oz.au (Nirad Sharma) writes:
>I have been using the SOUNDEX function supplied with Oracle V5 and have
>found it to be very convenient except that it may be too ambiguous. I
>noticed that SOUNDEX only returns a 5 (or 4 - I forget) character string.
>Is it possible that other soundex algorithms allow less ambiguity by making
>use of more characters ?

 This goes *way* back, about 30 years, to when I did payroll work,
 but if I recall correctly, the Soundex algorithm we used then
 (pre-computer, of course) always produced the same small number of
 characters so that codes could be compared.  Varying the length would
 defeat this purpose, and the longer the fixed length, the more names
 would need their short codes padded out uselessly.