[comp.databases] Duplicate elimination

mdf@tut.cis.ohio-state.edu (Mark D. Freeman) (04/14/88)

I am looking for algorithms to detect duplicate addresses.  We have
several databases that all share the following subset of fields:
	
	First name
	Last name
	Address1
	Address2
	City
	State
	Zip

We would like some way of determining whether a new record duplicates
an address already on file, taking into account variations in the
addressing (e.g. 201 Test Street and 201 Test St., 201-B Foo Ave and
201 Foo Ave. Apt. B, etc.).

An algorithm to standardize addresses would be great too.  The post
office uses one for their free 9-digit-zip encoding service, but I
don't know how it works.
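
To make the question concrete, here is a naive Python sketch of the kind
of standardization meant above.  It is only an illustration under stated
assumptions: the abbreviation table, the function names
(normalize_address, match_key) and the dictionary keys are all made up
for this example, and it is certainly not the post office's method.  It
handles just the variations listed above.

```python
import re

# Hypothetical abbreviation table; a real one would be far larger.
ABBREVIATIONS = {
    "street": "st",
    "avenue": "ave",
    "road": "rd",
    "apartment": "apt",
}


def normalize_address(line):
    """Return a crude canonical form of a street address line."""
    text = line.lower()
    # Split a unit letter fused to the house number: "201-b" -> "201 apt b".
    text = re.sub(r"\b(\d+)-([a-z])\b", r"\1 apt \2", text)
    # Replace punctuation with spaces.
    text = re.sub(r"[^a-z0-9\s]", " ", text)
    # Map long forms to their abbreviations.
    words = [ABBREVIATIONS.get(w, w) for w in text.split()]
    # Move "apt X" to the end so its position in the line does not matter.
    if "apt" in words[:-1]:
        i = words.index("apt")
        unit = words[i:i + 2]
        words = words[:i] + words[i + 2:] + unit
    return " ".join(words)


def match_key(record):
    """Build a comparison key from a record (a dict with assumed keys)."""
    street = normalize_address(record["address1"] + " " + record["address2"])
    # Compare on the 5-digit zip so "43202-3014" matches "43202".
    return (record["last"].lower(), street, record["zip"][:5])


a = {"last": "Doe", "address1": "201-B Foo Ave", "address2": "", "zip": "43202-3014"}
b = {"last": "DOE", "address1": "201 Foo Ave.", "address2": "Apt. B", "zip": "43202"}
print(match_key(a) == match_key(b))
```

Two records would be treated as probable duplicates when their keys are
equal.  Exact key matching like this misses misspellings and transposed
digits, so it is a starting point rather than an answer.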

Thanks!

-- 
Mark D. Freeman						  (614) 262-1418
					      mdf@tut.cis.ohio-state.edu
2440 Medary Avenue	   ...!cbosgd!osu-cis!tut.cis.ohio-state.edu!mdf
Columbus, OH  43202-3014      Guest account at The Ohio State University