UEPRCK@UNC.BITNET (Bob Kleckner) (02/01/90)
This is an expansion on Sally Muller's 01/30/90 comments on PROC SORT
and differences between EBCDIC (IBM) and ASCII (everybody else ?)
internal collating sequences for 'character variable' comparisons.
For those unable to RTFM (p. 1040, V.5 Basics), the smallest-to-largest
comparison sequence for
   EBCDIC is   a to z < A to Z < 0 to 9   and for
   ASCII  is   0 to 9 < A to Z < a to z .
As pointed out by Sally, this difference will affect the sort order of
character value BY variables in PROC SORT.
This difference will also affect character comparisons (p. 224, V.5
Basics) as follows:
In EBCDIC
   NAME = 'a';
   IF NAME < '1';   /* Is true  */
In ASCII
   NAME = 'a';
   IF NAME < '1';   /* Is false */
Warning: Keep this difference in mind if you are moving between
         operating systems. This may be especially important for
         those testing SAS programs on a PC and then doing production
         runs on an IBM MVS or CMS machine.
Because someone will ask: Your IBM PC and PS/2 machines use the
ASCII internal collating sequence.
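
For anyone unsure which sequence a given installation uses, a minimal
sketch along the following lines may help. It assumes the RANK function
is available in your release, and the variable names (RA, R1) are made
up for illustration. RANK returns a character's position in the host
collating sequence, so the step simply reports the positions of 'a' and
'1' and repeats the comparison from the example above.

```sas
/* Sketch only: report how this host orders 'a' and '1'.          */
/* RA and R1 are hypothetical variable names.                      */
data _null_;
   ra = rank('a');   /* position of lowercase a in host sequence   */
   r1 = rank('1');   /* position of the digit 1 in host sequence   */
   put ra= r1=;
   if 'a' < '1' then
      put 'Lowercase letters compare low to digits (EBCDIC-style).';
   else
      put 'Digits compare low to lowercase letters (ASCII-style).';
run;
```

On an ASCII machine 'a' is at position 97 and '1' at 49; on an EBCDIC
machine they are at 129 and 241, which is why the comparison flips.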
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
Bob Kleckner (UEPRCK@UNC.BITNET)
Applications Analyst Programmer
Dept. of Epidemiology, UNC
Chapel Hill, NC 27599-7400
919-966-2080

UPHILG@UNC.BITNET (Philip Gallagher) (02/02/90)
Bob Kleckner <UEPRCK@UNC> pointed out very nicely the catastrophes that
may occur when running a SAS program developed on a machine that uses
the ASCII collating sequence on a machine that uses EBCDIC (or vice
versa). He gives the example
"... the smallest to largest comparison sequence for
EBCDIC is a to z < A to Z < 0 to 9 and for
ASCII is 0 to 9 < A to Z < a to z . "
Since one of my students correctly pointed out my ignorance last
semester, I would like to tell you that the EBCDIC collating sequence
contains what I choose to consider an oddity that makes me want to say
"That EBCDIC collating sequence is even weirder than I realized!". I
refer to my version 5 Basics manual, p. 1040:
Under CMS, OS, & VSE a portion of the EBCDIC collating sequence is:
abcdefghijklmnopq~stuvwxyz{ABCDEFGHI}J
KLMNOPQR\STUVWXYZ0123456789
What idiot would have been naive enough to tell a student (without
looking it up) that a tilde (~) would not appear in the middle of the
small letter sequence and that a right brace (}) and a backslash (\)
would not appear in the middle of the capital letter sequence?
Unfortunately, I know such an idiot; he was very embarrassed when proven
to be wrong. "I can't believe that ... ." I suppose I should have
realized it; I've used the IBM card/folder with the EBCDIC and ASCII
codes on it enough to know about those strange patterns. Anyway, I
trust you won't get fooled the way my idiot friend did.
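
Readers who would rather see the gaps than take the manual's word for
them could try a sketch like the one below. It is only an illustration:
the variable names (CHARS, CH, POS) are hypothetical, and it assumes
the RANK function is available. It prints the collating position of a
few letters together with the tilde, braces, and backslash, so the
interleaving (or, on an ASCII machine, the lack of it) shows up in the
log. The exact positions of the special characters can differ between
EBCDIC code pages, so no particular output is claimed here.

```sas
/* Sketch only: list host collating positions for a few characters. */
/* CHARS, CH and POS are hypothetical names.                         */
data _null_;
   length ch $ 1;
   chars = 'qr~s{AI}JR\S';          /* letters plus the odd ones out */
   do i = 1 to length(chars);
      ch  = substr(chars, i, 1);    /* pick out one character        */
      pos = rank(ch);               /* its position on this host     */
      put ch= pos=;
   end;
run;
```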
Phil Gallagher

HIS@NIHCU.BITNET (Howard Schreier) (02/02/90)
> From: Philip Gallagher <UPHILG@UNC.BITNET>
>
> > "... the smallest to largest comparison sequence for
> >  EBCDIC is a to z < A to Z < 0 to 9 and for
> >  ASCII is 0 to 9 < A to Z < a to z . "
>
> Since one of my students correctly pointed out my ignorance last
> semester, I would like to tell you that the EBCDIC collating sequence
> contains what I choose to consider an oddity that makes me want to say
> "That EBCDIC collating sequence is even weirder than I realized!". I
> refer to my version 5 Basics manual, p. 1040:
> Under CMS, OS, & VSE a portion of the EBCDIC collating sequence is:
> abcdefghijklmnopq~stuvwxyz{ABCDEFGHI}J
> KLMNOPQR\STUVWXYZ0123456789

Note: the EBCDIC sequence *does* include the lower case "r", following
the "q" and preceding the tilde.

I see the following implications:

Where sorts are done strictly for internal purposes (such as MERGEing
data sets), there shouldn't be much of a problem. A data set which has
been transported, say from an EBCDIC environment to an ASCII one, may
have to be re-sorted.

If a sort is done to alphabetize a list for external presentation, and
the character variables contain a mixture of upper and lower case, it
is a good idea to create one or more new variables using the UPCASE
function and to actually sort by these. This is true for both ASCII
and EBCDIC environments. It assures correct placement of names with
embedded upper case letters (such as VanDyke).

If you need an alphabet string in a character expression (for example,
to use with the VERIFY function), do not try to generate it with the
COLLATE function. It would work with ASCII, but not with EBCDIC.

/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\
\  Howard Schreier, U.S. Dept. of Commerce, Washington     /
|       (Using Version 5 under IBM OS MVS/XA)              |
/  BITNET: HIS@NIHCU        INTERNET: HIS@CU.NIH.GOV       \
\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/\/
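
The last two suggestions in the post above (sort on an UPCASE copy, and
spell the alphabet out rather than building it with COLLATE) can be
sketched roughly as follows. This is an illustration under stated
assumptions, not anyone's production code: the data set NAMES, the
variables SORTKEY and ALLALPHA, and the sample names are all made up.

```sas
/* Sketch only: NAMES, SORTKEY and ALLALPHA are hypothetical names. */
data names;
   length name $ 12;
   input name $;
   /* Case-folded copy, used only as the sort key */
   sortkey = upcase(name);
   /* Alphabet written as a literal so the test is the same under  */
   /* ASCII and EBCDIC. VERIFY returns 0 when every character of   */
   /* its first argument appears in the second, so ALLALPHA is 1   */
   /* for all-letter names and 0 otherwise.                        */
   allalpha = (verify(trim(sortkey), 'ABCDEFGHIJKLMNOPQRSTUVWXYZ') = 0);
   cards;
vanderbilt
VanDyke
Vance
O'Brien
;
run;

proc sort data=names;
   by sortkey;          /* sort on the folded key, not on NAME */
run;

proc print data=names;
   var name allalpha;
run;
```

Sorted on SORTKEY, the sample names should print in the order O'Brien,
Vance, vanderbilt, VanDyke on either kind of machine, with ALLALPHA
equal to 0 only for O'Brien (because of the apostrophe). Sorting on
NAME itself would instead put VanDyke ahead of Vance under ASCII, and
vanderbilt ahead of everything else under EBCDIC.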