[comp.lang.c] need EBCDIC to ASCII function

lalonde@einstein.misemi ( iccad) (10/04/89)

Sorry; this isn't too deep.

I need a C function that converts a given EBCDIC character
to its ASCII equiv. If you have such a routine, would
you please share it with me?

Thanks,
-- 
======================
Terry Lalonde
Usenet:  ...!uunet!mitel!lalonde
======================

henry@utzoo.uucp (Henry Spencer) (10/05/89)

In article <1060@einstein.misemi> lalonde@.UUCP (Terry Lalonde - iccad) writes:
>I need a C function that converts a given EBCDIC character
>to its ASCII equiv...

You will have to be more specific.  Which flavor of EBCDIC?  EBCDIC is
not a single well-defined character code, but a family of somewhat-similar
codes.  (Which is why the Unix `dd' command has two different conversions,
plus an entry in the BUGS section discussing this problem.)
-- 
Nature is blind; Man is merely |     Henry Spencer at U of Toronto Zoology
shortsighted (and improving).  | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

diamond@csl.sony.co.jp (Norman Diamond) (10/06/89)

In article <1989Oct4.203729.11700@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:

>You will have to be more specific.  Which flavor of EBCDIC?  EBCDIC is
>not a single well-defined character code, but a family of somewhat-similar
>codes.  (Which is why the Unix `dd' command has two different conversions,
>plus an entry in the BUGS section discussing this problem.)

For quite a while IBM pretended that ASCII was poorly defined too.
IBM played games with the parity bit in an effort to lock their
products out of a standard marketplace, though we all know that
they failed in this effort :-)

In fact EBCDIC is just as well-defined as ASCII.  Only some IBM print
trains did not use EBCDIC.  "dd" provides an alternative table so that
certain characters will print properly on those printers, but that
target code is not EBCDIC.  Also IBM terminals usually did not use
EBCDIC, so the operating system had to translate to and from the
device codes.

-- 
Norman Diamond, Sony Corp. (diamond%ws.sony.junet@uunet.uu.net seems to work)
  The above opinions are inherited by your machine's init process (pid 1),
  after being disowned and orphaned.  However, if you see this at Waterloo or
  Anterior, then their administrators must have approved of these opinions.

alanm@cognos.UUCP (Alan Myrvold) (10/07/89)

In article <1060@einstein.misemi> lalonde@.UUCP (Terry Lalonde - iccad) writes:
>I need a C function that converts a given EBCDIC character
>to its ASCII equiv...

Based on the Unix 'dd conv=ascii' conversions :

int ebcdic2ascii[256] = {
      0,   1,   2,   3, 156,   9, 134, 127, 151, 141, 142,  11,
     12,  13,  14,  15,  16,  17,  18,  19, 157, 133,   8, 135,
     24,  25, 146, 143,  28,  29,  30,  31, 128, 129, 130, 131,
    132,  10,  23,  27, 136, 137, 138, 139, 140,   5,   6,   7,
    144, 145,  22, 147, 148, 149, 150,   4, 152, 153, 154, 155,
     20,  21, 158,  26,  32, 160, 161, 162, 163, 164, 165, 166,
    167, 168,  91,  46,  60,  40,  43,  33,  38, 169, 170, 171,
    172, 173, 174, 175, 176, 177,  93,  36,  42,  41,  59,  94,
     45,  47, 178, 179, 180, 181, 182, 183, 184, 185, 124,  44,
     37,  95,  62,  63, 186, 187, 188, 189, 190, 191, 192, 193,
    194,  96,  58,  35,  64,  39,  61,  34, 195,  97,  98,  99,
    100, 101, 102, 103, 104, 105, 196, 197, 198, 199, 200, 201,
    202, 106, 107, 108, 109, 110, 111, 112, 113, 114, 203, 204,
    205, 206, 207, 208, 209, 126, 115, 116, 117, 118, 119, 120,
    121, 122, 210, 211, 212, 213, 214, 215, 216, 217, 218, 219,
    220, 221, 222, 223, 224, 225, 226, 227, 228, 229, 230, 231,
    123,  65,  66,  67,  68,  69,  70,  71,  72,  73, 232, 233,
    234, 235, 236, 237, 125,  74,  75,  76,  77,  78,  79,  80,
     81,  82, 238, 239, 240, 241, 242, 243,  92, 159,  83,  84,
     85,  86,  87,  88,  89,  90, 244, 245, 246, 247, 248, 249,
     48,  49,  50,  51,  52,  53,  54,  55,  56,  57, 250, 251,
    252, 253, 254, 255 };
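
Since the request was for a function rather than a table, here is a
minimal sketch of a wrapper around a 256-entry table like the one above.
The names ebcdic_to_ascii and ebcdic_buf_to_ascii are hypothetical, and
passing the table as a parameter is my own choice so that a different
flavor of table can be swapped in; none of this comes from dd itself.

```c
#include <stddef.h>

/* Hypothetical helper: translate one EBCDIC code using a 256-entry
 * table such as ebcdic2ascii[] above.  The cast to unsigned char
 * keeps a sign-extended char (e.g. -63 for 0xC1) inside the table. */
int ebcdic_to_ascii(const int table[256], int c)
{
    return table[(unsigned char)c];
}

/* Hypothetical helper: translate a whole buffer in place. */
void ebcdic_buf_to_ascii(const int table[256], unsigned char *buf, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        buf[i] = (unsigned char)table[buf[i]];
}
```

With the table above, ebcdic_to_ascii(ebcdic2ascii, 0xC1) looks up
entry 193, which is 65 (ASCII 'A').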


                                          - Alan

---
Men aren't pigs ... pigs are smarter
---
Alan Myrvold          3755 Riverside Dr.     uunet!mitel!sce!cognos!alanm
Cognos Incorporated   P.O. Box 9707          alanm@cognos.uucp
(613) 738-1440 x5530  Ottawa, Ontario       
                      CANADA  K1G 3Z4       

rcd@ico.ISC.COM (Dick Dunn) (10/13/89)

diamond@csl.sony.co.jp (Norman Diamond) writes:
> ...henry@utzoo.uucp (Henry Spencer) writes:
> >You will have to be more specific.  Which flavor of EBCDIC?  EBCDIC is
> >not a single well-defined character code, but a family of somewhat-similar
> >codes...
> In fact EBCDIC is just as well-defined as ASCII.  Only some IBM print
> trains did not use EBCDIC...

I think it's not quite this simple.  Haul out your trusty yellow card
(that's the successor to the green card, right?:-) and look at the "Code
Translation Table."  You will see a pair of columns labeled "EBCDIC(1)".
It is this pair of columns (at least) which give rise to Henry's comment
about "somewhat-similar codes" and Norman's comment about print trains.
However, if you read the footnote (1) referenced by the column heading, you
see:  "Two columns of EBCDIC graphics are shown.  The first gives standard
bit pattern assignments.  The second shows the T-11 and TN text printing."

In other words, there are two forms of EBCDIC here (Henry's point), but one
of them is standard (Norman's point).  Ouch!  Dumb!  Keep in mind that this
wasn't the result of some dispute among vendors; IBM didn't get it right
among themselves.

What does this mean to an implementor?  There are some interesting
implications here!  On input, you need to accept whatever you get.  On output,
you want to produce the codes that print right.  If you're parsing input
text (I fell into this while working on a Pascal compiler), you can simply
accept both codes for characters which differ.  (There aren't any conflicts
which matter; there are lots of holes in the codesets.)  But for character
and string constants, you gotta generate the codes you're given, which
means that the code for left bracket may not be equal to the code for left
bracket (if they were from different flavors of EBCDIC)!  Worse, the
character that prints correctly isn't the "standard" one--so you can either
have the program listing (remember those?:-) look right, or get the right
answer!
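
A minimal sketch of the "accept both codes on input" idea for a lexer
follows.  The four code values are assumptions for illustration only
(check them against the reference card for your own machine), and the
names classify and TOK_* are hypothetical, not taken from any real
compiler.

```c
/* Assumed code points, for illustration only. */
#define LBRACKET_STD 0x4A   /* assumption: "standard" column */
#define LBRACKET_TN  0xAD   /* assumption: TN print-train column */
#define RBRACKET_STD 0x5A   /* assumption: "standard" column */
#define RBRACKET_TN  0xBD   /* assumption: TN print-train column */

enum token { TOK_LBRACKET, TOK_RBRACKET, TOK_OTHER };

/* Map either flavor's bracket code to the same token when parsing.
 * The raw byte is left untouched, so a character constant still
 * carries whichever code the source file actually contained. */
enum token classify(unsigned char c)
{
    switch (c) {
    case LBRACKET_STD:
    case LBRACKET_TN:
        return TOK_LBRACKET;
    case RBRACKET_STD:
    case RBRACKET_TN:
        return TOK_RBRACKET;
    default:
        return TOK_OTHER;
    }
}
```

Both assumed left-bracket codes classify as TOK_LBRACKET, yet the two
bytes still compare unequal, which is exactly the character-constant
trap described above.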
-- 
Dick Dunn     rcd@ico.isc.com    uucp: {ncar,nbires}!ico!rcd     (303)449-2870
   ...No DOS.  UNIX.

ok@cs.mu.oz.au (Richard O'Keefe) (10/13/89)

In article <10946@riks.csl.sony.co.jp>, diamond@csl.sony.co.jp (Norman Diamond) writes:
> In fact EBCDIC is just as well-defined as ASCII.

This is not strictly true.  A fairer comparison would be with ISO 646
(the international standard of which ASCII is a local variant), but
even that would be stretching things a bit.

One of the IBM manuals (I think it is the one for 3270 controllers)
has charts for all the national variants of EBCDIC.  There is an
"EBCDIC" for Hebrew, for Russian, with kana, and several European versions;
it makes for a very thick manual.

We could say that only the version listed on the /370 reference card is
"real" EBCDIC, but anyone interested in writing software for international
use should at least be aware that the others exist.

And then there's DBCS (in PL/I, the GRAPHIC data type) ...

henry@utzoo.uucp (Henry Spencer) (10/13/89)

In article <10946@riks.csl.sony.co.jp> diamond@riks. (Norman Diamond) writes:
>In fact EBCDIC is just as well-defined as ASCII.  Only some IBM print
>trains did not use EBCDIC...

Most IBM devices are documented to use EBCDIC.  Not "a non-standard variant
of EBCDIC", not "a subset of EBCDIC", but "EBCDIC".   The trouble is, all
those devices accept slightly different character sets.  The EBCDIC terminals
don't agree with the EBCDIC printers, the printers don't agree with each
other, and none of them agrees with the so-called "standard".  EBCDIC may be
"just as well-defined as ASCII" in some theoretical sense, but that statement
has no practical relevance, because *nobody* uses that "well-defined" code.
-- 
A bit of tolerance is worth a  |     Henry Spencer at U of Toronto Zoology
megabyte of flaming.           | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

meissner@dg-rtp.dg.com (Michael Meissner) (10/15/89)

In article <10946@riks.csl.sony.co.jp> diamond@csl.sony.co.jp (Norman
Diamond) writes:

> In article <1989Oct4.203729.11700@utzoo.uucp> henry@utzoo.uucp
> (Henry Spencer) writes:
>  
>  >You will have to be more specific.  Which flavor of EBCDIC?  EBCDIC is
>  >not a single well-defined character code, but a family of somewhat-similar
>  >codes.  (Which is why the Unix `dd' command has two different conversions,
>  >plus an entry in the BUGS section discussing this problem.)
  
	...

>  In fact EBCDIC is just as well-defined as ASCII.  Only some IBM print
>  trains did not use EBCDIC.  "dd" provides an alternative table so that
>  certain characters will print properly on those printers, but that
>  target code is not EBCDIC.  Also IBM terminals usually did not use
>  EBCDIC, so the operating system had to translate to and from the
>  device codes.

I dunno, the IBM, NCR, SAS, SDRC, and Unisys representatives on the ANSI
C committee seemed to bring up the point often enough that there were
regional variations in EBCDIC, just like there are variations in ISO
646 (of which ASCII is the USA variant).  If anybody would know, it
would be three vendors of computers that use EBCDIC, as well as one
user.

--

Michael Meissner, Data General.				If compiles were much
Uucp:		...!mcnc!rti!xyzzy!meissner		faster, when would we
Internet:	meissner@dg-rtp.DG.COM			have time for netnews?

shore@mtxinu.COM (Melinda Shore) (10/16/89)

In article <10946@riks.csl.sony.co.jp> diamond@csl.sony.co.jp (Norman
Diamond) writes:

>  In fact EBCDIC is just as well-defined as ASCII.  Only some IBM print
>  trains did not use EBCDIC.  "dd" provides an alternative table so that
>  certain characters will print properly on those printers, but that
>  target code is not EBCDIC.  Also IBM terminals usually did not use
>  EBCDIC, so the operating system had to translate to and from the
>  device codes.

Not entirely true.  To the extent that EBCDIC is defined it is multiply
defined;  my handy-dandy 370 architecture reference card from IBM has
two EBCDIC translation charts.  Also, the protocol converters used to
allow the use of ASCII RS-232 terminals with IBM-ish mainframes typically
support several character translation tables, selectable by the user or
system programmer (ours were selected by typing ^a then some single-
digit integer from the user keyboard - imagine the problems THAT caused).

Also, I wouldn't say that IBM terminals "usually" did not use EBCDIC.
Perhaps you mean that most academic sites choose to use ASCII terminals
through protocol converters?  That's certainly not the way of the rest
of the world.
-- 
Melinda Shore                                     shore@mtxinu.com
Mt Xinu                                  ..!uunet!mtxinu.com!shore

mustard@sdrc.UUCP (Sandy Mustard) (10/17/89)

In article <16204@vail.ICO.ISC.COM>, rcd@ico.ISC.COM (Dick Dunn) writes:
> diamond@csl.sony.co.jp (Norman Diamond) writes:
> I think it's not quite this simple.  Haul out your trusty yellow card
> (that's the successor to the green card, right?:-) 

Actually, it's a yellow booklet for System 370 and a pinkish booklet for
System 370 Extended Architecture. :-)

Sandy Mustard

news@laas.laas.fr (USENET News System) (10/20/89)

In article <7214@cognos.UUCP> alanm@cognos.UUCP (Alan Myrvold) writes:
|  In article <1060@einstein.misemi> lalonde@.UUCP (Terry Lalonde - iccad) writes:
|  >I need a C function that converts a given EBCDIC character
|  >to its ASCII equiv...
|  
|  Based on the Unix 'dd conv=ascii' conversions :
|  
|  [...]
|  Men aren't pigs ... pigs are smarter

How true, but aren't there still TWO official translations between
ASCII and EBCDIC?  I believe that's one reason why uuencoded stuff
sometimes breaks going through BITNET.  Or is the one cited above the
one-and-only? 

Cheers,

Ralph P. Sobek			  Disclaimer: The above ruminations are my own.
ralph@laas.laas.fr			   Addresses are ordered by importance.
ralph@laas.uucp, or ...!uunet!mcvax!laas!ralph		If all else fails, try:
SOBEK@FRMOP11.BITNET				      sobek@eclair.Berkeley.EDU
===============================================================================
Upon the instruments of death the sunlight brightly gleams.   --   King Crimson

scjones@sdrc.UUCP (Larry Jones) (10/22/89)

In article <455@laas.laas.fr>, news@laas.laas.fr (USENET News System) writes:
> How true, but aren't there still TWO official translations between
> ASCII and EBCDIC?  I believe that's one reason why uuencoded stuff
> sometimes breaks going through BITNET.  Or is the one cited above the
> one-and-only? 

Well, it depends on what you mean by "official".  The "official"
translation comes from an ANSI standard (X3.26 if memory serves)
for (of all things!) punched cards.  Since everyone agrees on how
to convert from Hollerith to ASCII and EBCDIC, transitivity gives
the "official" ASCII to EBCDIC translation.  This is also the
translation that is used when reading ANSI standard tapes on an
IBM mainframe (unless the local system programmers have been
mucking about).

Unfortunately, ASCII and EBCDIC don't always agree on the graphic
characters that are represented by the card punches.  As I recall,
the troublesome translations are:

		ASCII			EBCDIC
	---------------------	--------------------
	left brace		left brace
	right brace		right brace
	left bracket		cent sign
	right bracket		exclamation point
	exclamation point	solid vertical bar
	vertical bar		split vertical bar
	caret			logical not sign

Further complications are provided by the various printers and
terminals which have already been mentioned as printing various
apparently random bit patterns as desirable characters.  In an
attempt to rationalize the translation, people with good
intentions have developed any number of alternative translations.
The most common change is to map the ASCII exclamation point and
vertical bar to the EBCDIC exclamation point and solid vertical
bar (particularly if you're thinking about transferring text
files or program source that uses those characters) and/or
mapping the braces and brackets to the bit patterns which are
commonly used by printers.  Although it is an admirable goal to
get your text to print out correctly (or your program source to
compile!), it quickly leads to the current situation where there
are nearly an infinite number of conversions in common use.
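
One way to picture these alternative translations is as a base
ASCII-to-EBCDIC table plus a short list of local patches.  The sketch
below assumes that model; struct override and apply_overrides are
hypothetical names, and any code values a site plugs in are its own
choice, not something defined by a standard.

```c
#include <stddef.h>

/* Hypothetical patch record: send input code `from` to output code `to`. */
struct override {
    unsigned char from;
    unsigned char to;
};

/* Apply a list of site-specific patches to a 256-entry table.
 * Nothing here stops two inputs from ending up on the same output,
 * so a patched table may no longer have an inverse. */
void apply_overrides(unsigned char table[256],
                     const struct override *ov, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        table[ov[i].from] = ov[i].to;
}
```

Every site that keeps its own patch list ends up with its own
conversion, which is how the number of variants grows.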
----
Larry Jones                         UUCP: uunet!sdrc!scjones
SDRC                                      scjones@SDRC.UU.NET
2000 Eastman Dr.                    BIX:  ltl
Milford, OH  45150-2789             AT&T: (513) 576-2070
"I have plenty of good sense.  I just choose to ignore it."
-Calvin

alanm@cognos.UUCP (Alan Myrvold) (10/23/89)

In article <455@laas.laas.fr> ralph@laas.laas.fr (Ralph P. Sobek) writes:
>In article <7214@cognos.UUCP> I writes (sic):
>>  In article <1060@einstein.misemi> lalonde@.UUCP (Terry Lalonde) writes:
>>>I need a C function that converts a given EBCDIC character
>>  Based on the Unix 'dd conv=ascii' conversions :
>>  Men aren't pigs ... pigs are smarter
 
>How true, but aren't there still TWO official translations between
>ASCII and EBCDIC?

Even the "official" translations sometimes break ... if the data started
off as ASCII, and is ending up that way, with only a brief sojourn as
EBCDIC, the only important thing is that the ASCII->EBCDIC translation
be the inverse of the EBCDIC->ASCII.
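
A minimal sketch of that inverse requirement: build the reverse table
from the forward one and refuse if the forward table is not one-to-one.
The name invert_table is hypothetical, and the tables here are unsigned
char rather than the int array posted earlier, purely for convenience.

```c
/* Build inv[] so that inv[fwd[i]] == i for every i.
 * Returns 1 if fwd is one-to-one (so inv is a true inverse),
 * or 0 if two inputs share an output and no inverse exists;
 * in the failure case inv[] is left only partly filled. */
int invert_table(const unsigned char fwd[256], unsigned char inv[256])
{
    unsigned char seen[256] = { 0 };
    int i;

    for (i = 0; i < 256; i++) {
        if (seen[fwd[i]])
            return 0;               /* collision: not invertible */
        seen[fwd[i]] = 1;
        inv[fwd[i]] = (unsigned char)i;
    }
    return 1;
}
```

If this returns 1, translating with fwd and then inv gives back every
byte unchanged, which is all a round trip through the other code needs.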

If the data is ending on a differently coded system from the one it 
started on, the print-train or compiler on the ending system must
be accommodated.

The Unix 'dd' command (here) has 2 different ASCII->EBCDIC conversions, 
but only one EBCDIC->ASCII. The translation that I supplied was the 
EBCDIC->ASCII one. It's pretty good as a starting point, and I imagine
Terry Lalonde, should she/he choose to use it, will be bright enough to
make minor modifications to fit his/her needs.

>>  Men aren't pigs ... pigs are smarter

Why is male bashing socially acceptable these days? I thought feminism
was all about equality! (Follow-ups on this point NOT to comp.lang.c!).

--
Chateau des Charmes Aligote is too delicate a wine for Libby's Zoodles
(Follow-ups on this point NOT to comp.lang.c either!).
--
Alan Myrvold          3755 Riverside Dr.     uunet!mitel!sce!cognos!alanm
Cognos Incorporated   P.O. Box 9707          alanm@cognos.uucp
(613) 738-1440 x5530  Ottawa, Ontario       
                      CANADA  K1G 3Z4