UEPRCK@UNC.BITNET (Bob Kleckner) (02/01/90)
This is an expansion on Sally Muller's 01/30/90 comments on PROC SORT and the differences between the EBCDIC (IBM) and ASCII (everybody else?) internal collating sequences for character-variable comparisons.

For those unable to RTFM (p. 1040, V.5 Basics), the smallest-to-largest comparison sequence is:

    EBCDIC:  a to z  <  A to Z  <  0 to 9
    ASCII:   0 to 9  <  A to Z  <  a to z

As pointed out by Sally, this difference will affect the sort order of character-valued BY variables in PROC SORT. It will also affect character comparisons (p. 224, V.5 Basics), as follows:

    In EBCDIC:  NAME= 'a';  If NAME < '1';   /* Is true  */
    In ASCII:   NAME= 'a';  If NAME < '1';   /* Is false */

Warning: Keep this difference in mind if you are moving between operating systems. This may be especially important for those testing SAS programs on a PC and then doing production runs on an IBM MVS or CMS machine.

Because someone will ask: your IBM PC and PS/2 machines use the ASCII internal collating sequence.

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Bob Kleckner (UEPRCK@UNC.BITNET)
Applications Analyst Programmer
Dept. of Epidemiology, UNC
Chapel Hill, NC 27599-7400
919-966-2080
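For readers without access to both kinds of machine, the comparison above can be imitated outside SAS. The following is a minimal Python sketch, not SAS code; it assumes that Python's "cp037" codec is a fair stand-in for the EBCDIC code page on the IBM side, and the helper names ebcdic_key and ascii_key are made up for illustration.

```python
# Sketch only: "cp037" stands in for the mainframe's EBCDIC code page
# (an assumption; the code page actually in use at a site may differ).

def ebcdic_key(text: str) -> bytes:
    """Return the bytes an EBCDIC machine would compare."""
    return text.encode("cp037")

def ascii_key(text: str) -> bytes:
    """Return the bytes an ASCII machine would compare."""
    return text.encode("ascii")

name = "a"
# Mirrors the SAS test:  If NAME < '1';
print(ebcdic_key(name) < ebcdic_key("1"))  # True  (0x81 < 0xF1)
print(ascii_key(name) < ascii_key("1"))    # False (0x61 > 0x31)

# The same three values sort differently under each sequence.
values = ["b2", "B2", "22"]
print(sorted(values, key=ebcdic_key))  # ['b2', 'B2', '22']
print(sorted(values, key=ascii_key))   # ['22', 'B2', 'b2']
```

Comparing encoded bytes rather than the strings themselves is what makes the two results differ; the strings are identical in both cases.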
UPHILG@UNC.BITNET (Philip Gallagher) (02/02/90)
Bob Kleckner <UEPRCK@UNC> pointed out very nicely the catastrophes that may occur when a SAS program developed on a machine that uses the ASCII collating sequence is run on a machine that uses EBCDIC (or vice versa). He gives the example

    "... the smallest to largest comparison sequence for
    EBCDIC is a to z < A to Z < 0 to 9 and for
    ASCII is 0 to 9 < A to Z < a to z . "

Since one of my students correctly pointed out my ignorance last semester, I would like to tell you that the EBCDIC collating sequence contains what I choose to consider an oddity that makes me want to say "That EBCDIC collating sequence is even weirder than I realized!". I refer to my version 5 Basics manual, p. 1040:

    Under CMS, OS, & VSE a portion of the EBCDIC collating sequence is:
    abcdefghijklmnopq~stuvwxyz{ABCDEFGHI}J
    KLMNOPQR\STUVWXYZ0123456789

What idiot would have been naive enough to tell a student (without looking it up) that a tilde (~) would not appear in the middle of the small letter sequence and that a right brace (}) and a backslash (\) would not appear in the middle of the capital letter sequence? Unfortunately, I know such an idiot; he was very embarrassed when proven to be wrong. "I can't believe that ... ."

I suppose I should have realized it; I've used the IBM card/folder with the EBCDIC and ASCII codes on it enough to know about those strange patterns. Anyway, I trust you won't get fooled the way my idiot friend did.

Phil Gallagher
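The placement of the tilde, braces, and backslash can be checked without the manual. This is a hedged Python sketch, not SAS; it assumes the "cp037" codec approximates the EBCDIC table the manual describes, and ebcdic_order is a hypothetical helper name. It sorts the letters, digits, and the four special characters by their single-byte codes.

```python
import string

# Assumption: cp037 approximates the EBCDIC table quoted from the manual.
chars = string.ascii_letters + string.digits + "~{}\\"

def ebcdic_order(symbols: str) -> str:
    """Arrange the symbols by their single-byte cp037 code."""
    return "".join(sorted(symbols, key=lambda c: c.encode("cp037")))

print(ebcdic_order(chars))
# abcdefghijklmnopqr~stuvwxyz{ABCDEFGHI}JKLMNOPQR\STUVWXYZ0123456789
```

The interlopers land where they do because the EBCDIC letters sit in three separate blocks per case (a-i, j-r, s-z), and the special characters occupy codes in or next to the gaps between those blocks.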
HIS@NIHCU.BITNET (Howard Schreier) (02/02/90)
> From: Philip Gallagher <UPHILG@UNC.BITNET>
>
> > "... the smallest to largest comparison sequence for
> > EBCDIC is a to z < A to Z < 0 to 9 and for
> > ASCII is 0 to 9 < A to Z < a to z . "
>
> Since one of my students correctly pointed out my ignorance last
> semester, I would like to tell you that the EBCDIC collating sequence
> contains what I choose to consider an oddity that makes me want to say
> "That EBCDIC collating sequence is even weirder than I realized!". I
> refer to my version 5 Basics manual, p. 1040:
> Under CMS, OS, & VSE a portion of the EBCDIC collating sequence is:
> abcdefghijklmnopq~stuvwxyz{ABCDEFGHI}J
> KLMNOPQR\STUVWXYZ0123456789

Note: the EBCDIC sequence *does* include the lower case "r", following the "q" and preceding the tilde.

I see the following implications:

Where sorts are done strictly for internal purposes (such as MERGEing data sets), there shouldn't be much of a problem. A data set which has been transported, say from an EBCDIC environment to an ASCII one, may have to be re-sorted.

If a sort is done to alphabetize a list for external presentation, and the character variables contain a mixture of upper and lower case, it is a good idea to create one or more new variables using the UPCASE function and to actually sort by these. This is true for both ASCII and EBCDIC environments. It assures correct placement of names with embedded upper case letters (such as VanDyke).

If you need an alphabet string in a character expression (for example, to use with the VERIFY function), do not try to generate it with the COLLATE function. It would work with ASCII, but not with EBCDIC.

/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\
\ Howard Schreier, U.S. Dept. of Commerce, Washington /
| (Using Version 5 under IBM OS MVS/XA) |
/ BITNET: HIS@NIHCU INTERNET: HIS@CU.NIH.GOV \
\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/
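The last two implications can be illustrated with a rough Python sketch. It is not SAS: code_range only imitates the idea behind COLLATE (return every character between two code positions), upper() plays the role of UPCASE, and "cp037" is assumed to stand in for EBCDIC. The names other than VanDyke are invented test data.

```python
import string

def code_range(first: str, last: str, encoding: str) -> str:
    """Decode every code position from first to last (COLLATE-like idea)."""
    lo = first.encode(encoding)[0]
    hi = last.encode(encoding)[0]
    return bytes(range(lo, hi + 1)).decode(encoding)

# A to Z is one contiguous run in ASCII but not in EBCDIC.
print(code_range("A", "Z", "ascii") == string.ascii_uppercase)  # True
print(code_range("A", "Z", "cp037") == string.ascii_uppercase)  # False
print(len(code_range("A", "Z", "cp037")))                       # 41

def sort_names(names, encoding, fold_case=False):
    """Sort by encoded bytes, optionally on an upper-cased copy of each name."""
    def key(name):
        text = name.upper() if fold_case else name
        return text.encode(encoding)
    return sorted(names, key=key)

names = ["Vanzant", "VanDyke", "Vance"]
print(sort_names(names, "ascii"))        # ['VanDyke', 'Vance', 'Vanzant']
print(sort_names(names, "cp037"))        # ['Vance', 'Vanzant', 'VanDyke']
print(sort_names(names, "ascii", True))  # ['Vance', 'VanDyke', 'Vanzant']
print(sort_names(names, "cp037", True))  # ['Vance', 'VanDyke', 'Vanzant']
```

Sorting on the upper-cased copy gives the same alphabetical list under both encodings, while the raw values misplace VanDyke in each environment, in opposite directions.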