HIS@NIHCU.BITNET (Howard Schreier) (02/05/90)
> Edwin Hart brought up the issue of alphabetizing names. As someone who has
> had uncountable problems (personally) due to this problem, let me expound on
> just one way collating sequences can cause problems with "unusual" names.
>
> My surname is "St Sauver" without a period after the "St". No big deal, right?
> Maybe not as common as "McDonald" but still, "St" names aren't that uncommon
> (Jill St. John, etc).
>
> Nonetheless, computer systems have forced me to virtually carry a card to hand
> to befuddled clerks unable to find my computerized records at the airport,
> doctor's office, or wherever:
>
>   "NOTE: My last name is 'St Sauver' but your
>    computer system may have me 'filed' as:
>
>    CORRECT FILING [unlikely]:
>       St Sauver
>
>    "ALMOST" CORRECT [typically due to human "assistance"]:
>       St. Sauver
>       Saint Sauver
>
>    RUN TOGETHER [Often done to satisfy data validation
>    routines which refuse to allow spaces]:
>       StSauver
>       SaintSauver
>       Saint_Sauver
>       Saint-Sauver
>       St.Sauver
>       St-Sauver
>       St_Sauver
>
>    COMPUTER MUTILATED [Often done by computers running
>    "personalized" letter generation programs drawing data
>    from databases which don't reject spaces during initial
>    data validation, but which then parse name "elements" on
>    the basis of those uncontrolled spaces.]:
>       Sauver (only)
>       St (only)
>       St. (only)
>       Saint (only)"
>
> Since some of you are directly responsible for writing data entry and/or
> name parsing routines in SAS for major institutions, I beg of you: be
> sensitive to those of us who have non-canonical surnames (unlike my wife's
> maiden name: Hurley). You may have the people with "Mc" and "Mac" names more or
> less down pat, but I can tell you, much code still has a LONG ways to go on
> at least some of the other "odd" surnames.
>
> Pardon the tirade,
>
> Joe St Sauver (JOE@OREGON or JOE@OREGON.UOREGON.EDU)
> Statistical Programmer and Consultant
> University of Oregon Computing Center

I'm one of the guilty, having recently shortened the surname of a
Ms. De La Guardia to a simple "De". It's a pretty tricky area, though.
The file I'm using has some suffixes like "Jr." following surnames, which
I *do* want to strip for some purposes. In other words, I want to leave
"De La Guardia" or "St Sauver" alone, but truncate "Wilson III" to "Wilson".

It seems to me that the only solution is to enumerate the trailing
substrings to be stripped (probably in a format); but there are people
with "Junior" and the like as a family name.

To be a little picky, the problem we are discussing here really is
unaffected by the peculiarities of collating sequences. Even if Joe's
name is properly recorded in a file, it will be alphabetized before the
Stanleys, since a blank sorts ahead of any letter in both EBCDIC and ASCII.

/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\
\   Howard Schreier, U.S. Dept. of Commerce, Washington    /
|          (Using Version 5 under IBM OS MVS/XA)           |
/   BITNET: HIS@NIHCU           INTERNET: HIS@CU.NIH.GOV   \
\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/
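
For illustration only, here is a minimal sketch of the "enumerate the trailing substrings" idea discussed above. It is written in Python rather than SAS so it can be run anywhere, and everything in it is an assumption: the suffix list, the function name `strip_suffix`, and the rule that a one-word surname is never touched are all hypothetical choices, not anything taken from the file or code mentioned in the post.

```python
# Hypothetical list of trailing suffixes to strip. A real file would need
# its own enumeration; "JUNIOR" is left out on purpose because it also
# occurs as a family name.
SUFFIXES = {"JR", "SR", "II", "III", "IV"}


def strip_suffix(surname: str) -> str:
    """Drop one trailing generational suffix from a surname field, if present."""
    tokens = surname.split()
    if len(tokens) < 2:
        # A lone token (even "Jr") is assumed to be the family name itself.
        return surname.strip()
    # Compare the last token without a trailing period, ignoring case.
    last = tokens[-1].rstrip(".").upper()
    if last in SUFFIXES:
        # Rejoin what is left and drop a comma as in "Wilson, Jr."
        return " ".join(tokens[:-1]).rstrip(",")
    # Multi-word surnames without a listed suffix are left alone.
    return surname.strip()


if __name__ == "__main__":
    for name in ["Wilson III", "Wilson, Jr.", "De La Guardia", "St Sauver", "Junior"]:
        print(repr(name), "->", repr(strip_suffix(name)))
```

Only the final word is ever compared against the list, so embedded blanks in names such as "De La Guardia" or "St Sauver" are never used as a place to cut. The sketch still cannot tell a suffix from a genuine multi-word surname that happens to end in a listed token, which is the ambiguity raised above. In SAS the same lookup table would more naturally live in a format, as the post suggests.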