[bit.listserv.sas-l] Parsing Surnames

HIS@NIHCU.BITNET (Howard Schreier) (02/05/90)

> Edwin Hart brought up the issue of alphabetizing names. As someone who has
> had uncountable problems (personally) due to this problem, let me expound on
> just one way collating sequences can cause problems with "unusual" names.
>
> My surname is "St Sauver" without a period after the "St". No big deal, right?
> Maybe not as common as "McDonald" but still, "St" names aren't that uncommon
> (Jill St. John, etc).
>
> Nonetheless, computer systems have forced me to virtually carry a card to hand
> to befuddled clerks unable to find my computerized records at the airport,
> doctor's office, or wherever:
>
>    "NOTE: My last name is 'St Sauver' but your
>    computer system may have me 'filed' as:
>
>    CORRECT FILING [unlikely]:
>    St Sauver
>
>    "ALMOST" CORRECT [typically due to human "assistance"]:
>    St. Sauver
>    Saint Sauver
>
>    RUN TOGETHER    [Often done to satisfy data validation
>    StSauver         routines which refuse to allow spaces]:
>    SaintSauver
>    Saint_Sauver
>    Saint-Sauver
>    St.Sauver
>    St-Sauver
>    St_Sauver
>
>    COMPUTER MUTILATED  [Often done by computers running
>    Sauver (only)       "personalized" letter generation programs
>    St (only)           drawing data from databases which don't
>    St. (only)          reject spaces during initial data validation,
>    Saint (only)"       but which then parse name "elements" on the
>                        basis of those uncontrolled spaces.]
>
> Since some of you are directly responsible for writing data entry and/or
> name parsing routines in SAS for major institutions, I beg of you: be
> sensitive to those of us who have non-canonical surnames (unlike my wife's
> maiden name: Hurley). You may have the people with "Mc" and "Mac" names more
> or less down pat, but I can tell you, much code still has a LONG ways to go on
> at least some of the other "odd" surnames.
>
> Pardon the tirade,
>
> Joe St Sauver (JOE@OREGON or JOE@OREGON.UOREGON.EDU)
> Statistical Programmer and Consultant
> University of Oregon Computing Center

I'm one of the guilty, having recently shortened the surname
of a Ms. De La Guardia to a simple "De". It's a pretty
tricky area, though. The file I'm using has some suffixes
like "Jr." following surnames, which I *do* want to strip
for some purposes. In other words, I want to leave "De La
Guardia" or "St Sauver" alone, but truncate "Wilson III" to
"Wilson". It seems to me that the only solution is to
enumerate the trailing substrings to be stripped (probably
in a format); but there are people with "Junior" and the
like as a family name.
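
The enumeration idea can be sketched roughly as follows. This is
only an illustration in Python rather than SAS; the suffix list
and the function name strip_suffix are hypothetical, and a real
file would need its own list. It strips at most one trailing
suffix and never strips a name down to nothing, so a bare
"Junior" survives.

```python
# Hypothetical list of trailing suffixes to strip; not taken from any real file.
SUFFIXES = {"JR", "SR", "II", "III", "IV"}

def strip_suffix(surname):
    """Drop one enumerated trailing suffix; leave every other word alone."""
    words = surname.split()
    # A one-word surname is never treated as a suffix.
    if len(words) < 2:
        return surname.strip()
    # Compare the last word without a trailing period, ignoring case.
    last = words[-1].rstrip(".").upper()
    if last in SUFFIXES:
        words = words[:-1]
        # Handle the "Wilson, Jr." style by dropping the dangling comma.
        words[-1] = words[-1].rstrip(",")
    # Note: joining on single spaces also collapses repeated blanks.
    return " ".join(words)

for name in ["Wilson III", "Wilson, Jr.", "De La Guardia", "St Sauver", "Junior"]:
    print(repr(name), "->", repr(strip_suffix(name)))
```

Because "Junior" is simply left off the list, it is kept even
when it follows other words; whether that is right for a given
file is exactly the judgment call described above.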

To be a little picky, the problem we are discussing here
really is unaffected by the peculiarities of collating
sequences. Even if Joe's name is properly recorded in a
file, it will be alphabetized before the Stanleys.
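
A quick check of that point, again as a Python sketch rather
than SAS. The sample names are made up, and code page cp037 is
assumed here as a stand-in for EBCDIC on the mainframe side.
The blank sorts ahead of the letters in both sequences, so the
"St" names land before the Stanleys either way.

```python
# Made-up sample names; cp037 is assumed as a representative EBCDIC code page.
names = ["Stanley", "St Sauver", "Stanton", "St. John"]

# Sort by the raw byte values each collating sequence assigns.
ascii_order = sorted(names, key=lambda s: s.encode("ascii"))
ebcdic_order = sorted(names, key=lambda s: s.encode("cp037"))

print(ascii_order)
print(ebcdic_order)
```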

/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\
\   Howard Schreier, U.S. Dept. of Commerce, Washington    /
|          (Using Version 5 under IBM OS MVS/XA)           |
/   BITNET: HIS@NIHCU          INTERNET: HIS@CU.NIH.GOV    \
\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/