[comp.lang.c] EBCDIC <--> ASCII conversion

noren@dinl.uucp (Charles Noren) (10/03/90)

We are communicating between Sun 3 (with SunOS 4.0.3) and an
IBM Mainframe (don't know the model, we're not IBM jocks)
via TCP/IP.  Our question, is there any program in Netland
that converts back and forth between EBCDIC and ASCII
(preferably in C, but we will take any example)?

If you have some knowledge about the conversion process, such
as the bit/byte ordering of an IBM vs. a Sun, any comments
would be very helpful.

Thanks,


-- 
Chuck Noren
NET:     dinl!noren@ncar.ucar.edu
US-MAIL: Martin Marietta I&CS, MS XL8058, P.O. Box 1260,
         Denver, CO 80201-1260
Phone:   (303) 971-7930

jik@athena.mit.edu (Jonathan I. Kamens) (10/03/90)

In article <1756@dinl.mmc.UUCP>, noren@dinl.uucp (Charles Noren) writes:
|> We are communicating between Sun 3 (with SunOS 4.0.3) and an
|> IBM Mainframe (don't know the model, we're not IBM jocks)
|> via TCP/IP.  Our question, is there any program in Netland
|> that converts back and forth between EBCDIC and ASCII
|> (preferably in C, but we will take any example)?

  The Unix program "dd" does this.  In particular, the "conv=ascii" option
converts EBCDIC to ASCII, and the "conv=ebcdic" option goes the other way.

  See the man page for more information.

-- 
Jonathan Kamens			              USnail:
MIT Project Athena				11 Ashford Terrace
jik@Athena.MIT.EDU				Allston, MA  02134
Office: 617-253-8495			      Home: 617-782-0710

jeffb@blia.BLI.COM (Jeff Beard) (10/03/90)

dd conv=ebcdic is in error for some codes, which is why dd also supplies
   conv=ibm.  However, this too is frequently in error due to
   ambiguities in the EBCDIC table(s) ... it depends on which table you
   read and how one needs to map the ASCII to EBCDIC set difference.

The following two tables will allow you to define your own translation.

/* This routine contains only the two tables needed to convert
   ASCII to EBCDIC  and   EBCDIC to ASCII.
   The conversion is according to BTL character set standards.
 
   There are some anomalies in a one/one mapping:
        not all characters are in both character sets
           eg: PL/1 not       '~'
               ascii caret    '^'
               tilde          '~'
        not all devices can display all characters in the set
           eg: square braces  '[]'
               brackets       '{}'
 
   we accept ASCII data in a 1/1 mapping
 
   we translate EBCDIC ~ to ASCII ~
 
   Table used to convert ascii to ebcdic. ~
*/
 
static char dummy[] = {0};  /* for EOF index */
char atoe[] = {
   0x00          , 0x01          , 0x02          , 0x03          ,
   0x37          , 0x2d          , 0x2e          , 0x2f          ,
   0x16          , 0x05          , 0x25          , 0x0b          ,
   0x0c          , 0x0d          , 0x0e          , 0x0f          ,
   0x10          , 0x11          , 0x12          , 0x13          ,
   0x3c          , 0x3d          , 0x32          , 0x26          ,
   0x18          , 0x19          , 0x1a          , 0x27          ,
   0x1c          , 0x1d          , 0x1e          , 0x1f          ,
   0x40 /* ' ' */, 0x5a /* '!' */, 0x7f /* '"' */, 0x7b /* '#' */,
   0x5b /* '$' */, 0x6c /* '%' */, 0x50 /* '&' */, 0x7d /* ''' */,
   0x4d /* '(' */, 0x5d /* ')' */, 0x5c /* '*' */, 0x4e /* '+' */,
   0x6b /* ',' */, 0x60 /* '-' */, 0x4b /* '.' */, 0x61 /* '/' */,
   0xf0 /* '0' */, 0xf1 /* '1' */, 0xf2 /* '2' */, 0xf3 /* '3' */,
   0xf4 /* '4' */, 0xf5 /* '5' */, 0xf6 /* '6' */, 0xf7 /* '7' */,
   0xf8 /* '8' */, 0xf9 /* '9' */, 0x7a /* ':' */, 0x5e /* ';' */,
   0x4c /* '<' */, 0x7e /* '=' */, 0x6e /* '>' */, 0x6f /* '?' */,
   0x7c /* '@' */, 0xc1 /* 'A' */, 0xc2 /* 'B' */, 0xc3 /* 'C' */,
   0xc4 /* 'D' */, 0xc5 /* 'E' */, 0xc6 /* 'F' */, 0xc7 /* 'G' */,
   0xc8 /* 'H' */, 0xc9 /* 'I' */, 0xd1 /* 'J' */, 0xd2 /* 'K' */,
   0xd3 /* 'L' */, 0xd4 /* 'M' */, 0xd5 /* 'N' */, 0xd6 /* 'O' */,
   0xd7 /* 'P' */, 0xd8 /* 'Q' */, 0xd9 /* 'R' */, 0xe2 /* 'S' */,
   0xe3 /* 'T' */, 0xe4 /* 'U' */, 0xe5 /* 'V' */, 0xe6 /* 'W' */,
   0xe7 /* 'X' */, 0xe8 /* 'Y' */, 0xe9 /* 'Z' */, 0xad /* '[' */,
   0xe0 /* '\' */, 0xbd /* ']' */, 0x9a /* '^' */, 0x6d /* '_' */,
   0x79 /* '`' */, 0x81 /* 'a' */, 0x82 /* 'b' */, 0x83 /* 'c' */,
   0x84 /* 'd' */, 0x85 /* 'e' */, 0x86 /* 'f' */, 0x87 /* 'g' */,
   0x88 /* 'h' */, 0x89 /* 'i' */, 0x91 /* 'j' */, 0x92 /* 'k' */,
   0x93 /* 'l' */, 0x94 /* 'm' */, 0x95 /* 'n' */, 0x96 /* 'o' */,
   0x97 /* 'p' */, 0x98 /* 'q' */, 0x99 /* 'r' */, 0xa2 /* 's' */,
   0xa3 /* 't' */, 0xa4 /* 'u' */, 0xa5 /* 'v' */, 0xa6 /* 'w' */,
   0xa7 /* 'x' */, 0xa8 /* 'y' */, 0xa9 /* 'z' */, 0xc0 /* '{' */,
   0x4f /* '|' */, 0xd0 /* '}' */, 0xa1 /* '~' */, 0x07          ,
   0x04          , 0x06          , 0x08          , 0x09          ,
   0x0a          , 0x14          , 0x15          , 0x17          ,
   0x1b          , 0x20          , 0x21          , 0x22          ,
   0x23          , 0x24          , 0x28          , 0x29          ,
   0x2a          , 0x2b          , 0x2c          , 0x30          ,
   0x31          , 0x33          , 0x34          , 0x35          ,
   0x36          , 0x38          , 0x39          , 0x3a          ,
   0x3b          , 0x3e          , 0x3f          , 0x41          ,
   0x42          , 0x43          , 0x44          , 0x45          ,
   0x46          , 0x47          , 0x48          , 0x49          ,
   0x4a          , 0x51          , 0x52          , 0x53          ,
   0x54          , 0x55          , 0x56          , 0x57          ,
   0x58          , 0x59          , 0x62          , 0x63          ,
   0x64          , 0x65          , 0x66          , 0x67          ,
   0x68          , 0x69          , 0x6a          , 0x70          ,
   0x71          , 0x72          , 0x73          , 0x74          ,
   0x75          , 0x76          , 0x77          , 0x78          ,
   0x80          , 0x8a          , 0x8b          , 0x8c          ,
   0x8d          , 0x8e          , 0x8f          , 0x90          ,
   0x9b          , 0x9c          , 0x9d          , 0x9e          ,
   0x9f          , 0xa0          , 0xa1          , 0xaa          ,
   0xab          , 0xac          , 0xae          , 0xaf          ,
   0xb0          , 0xb1          , 0xb2          , 0xb3          ,
   0xb4          , 0xb5          , 0xb6          , 0xb7          ,
   0xb8          , 0xb9          , 0xba          , 0xbb          ,
   0xbc          , 0xbe          , 0xbf          , 0xca          ,
   0xcb          , 0xcc          , 0xcd          , 0xce          ,
   0xcf          , 0xda          , 0xdb          , 0xdc          ,
   0xdd          , 0xde          , 0xdf          , 0xe1          ,
   0xea          , 0xeb          , 0xec          , 0xed          ,
   0xee          , 0xef          , 0xfa          , 0xfb          ,
   0xfc          , 0xfd          , 0xfe          , 0xff          ,
   };
 
 
 
   /* Table used to convert ebcdic to ascii.
   */
 
static char dummy2[] = {0};  /* for EOF index */
char etoa[] = {
  /*  0               1               2               3           */
   0000          , 0001          , 0002          , 0003          ,
   0200          , 0011          , 0201          , 0177          ,
   0202          , 0203          , 0204          , 0013          ,
   0014          , 0015          , 0016          , 0017          ,
   0020          , 0021          , 0022          , 0023          ,
   0205          , 0206          , 0010          , 0207          ,
   0030          , 0031          , 0032          , 0210          ,
   0034          , 0035          , 0036          , 0037          ,
   0211          , 0212          , 0213          , 0214          ,
   0215          , 0012          , 0027          , 0033          ,
   0216          , 0217          , 0220          , 0221          ,
   0222          , 0005          , 0006          , 0007          ,
   0223          , 0224          , 0026          , 0225          ,
   0226          , 0227          , 0230          , 0004          ,
   0231          , 0232          , 0233          , 0234          ,
   0024          , 0025          , 0235          , 0236          ,
   0040 /* ' ' */, 0237          , 0240          , 0241          ,
   0242          , 0243          , 0244          , 0245          ,
   0246          , 0247          , 0250          , 0056 /* '.' */,
   0074 /* '<' */, 0050 /* '(' */, 0053 /* '+' */, 0174 /* '|' */,
   0046 /* '&' */, 0251          , 0252          , 0253          ,
   0254          , 0255          , 0256          , 0257          ,
   0260          , 0261          , 0041 /* '!' */, 0044 /* '$' */,
   0052 /* '*' */, 0051 /* ')' */, 0073 /* ';' */, 0176 /* '~' */,
   0055 /* '-' */, 0057 /* '/' */, 0262          , 0263          ,
   0264          , 0265          , 0266          , 0267          ,
   0270          , 0271          , 0272          , 0054 /* ',' */,
   0045 /* '%' */, 0137 /* '_' */, 0076 /* '>' */, 0077 /* '?' */,
   0273          , 0274          , 0275          , 0276          ,
   0277          , 0300          , 0301          , 0302          ,
   0303          , 0140 /* '`' */, 0072 /* ':' */, 0043 /* '#' */,
   0100 /* '@' */, 0047 /* ''' */, 0075 /* '=' */, 0042 /* '"' */,
   0304          , 0141 /* 'a' */, 0142 /* 'b' */, 0143 /* 'c' */,
   0144 /* 'd' */, 0145 /* 'e' */, 0146 /* 'f' */, 0147 /* 'g' */,
   0150 /* 'h' */, 0151 /* 'i' */, 0305          , 0306          ,
   0307          , 0310          , 0311          , 0312          ,
   0313          , 0152 /* 'j' */, 0153 /* 'k' */, 0154 /* 'l' */,
   0155 /* 'm' */, 0156 /* 'n' */, 0157 /* 'o' */, 0160 /* 'p' */,
   0161 /* 'q' */, 0162 /* 'r' */, 0136 /* '^' */, 0314          ,
   0315          , 0316          , 0317          , 0320          ,
#ifdef OLDC
   0321          , 0322 /*  ~  */, 0163 /* 's' */, 0164 /* 't' */,
#else  OLDC
   0321          , 0176 /*  ~  */, 0163 /* 's' */, 0164 /* 't' */,
#endif OLDC
   0165 /* 'u' */, 0166 /* 'v' */, 0167 /* 'w' */, 0170 /* 'x' */,
   0171 /* 'y' */, 0172 /* 'z' */, 0323          , 0324          ,
   0325          , 0133 /* '[' */, 0326          , 0327          ,
   0330          , 0331          , 0332          , 0333          ,
   0334          , 0335          , 0336          , 0337          ,
   0340          , 0341          , 0342          , 0343          ,
   0344          , 0135 /* ']' */, 0345          , 0346          ,
   0173 /* '{' */, 0101 /* 'A' */, 0102 /* 'B' */, 0103 /* 'C' */,
   0104 /* 'D' */, 0105 /* 'E' */, 0106 /* 'F' */, 0107 /* 'G' */,
   0110 /* 'H' */, 0111 /* 'I' */, 0347          , 0350          ,
   0351          , 0352          , 0353          , 0354          ,
   0175 /* '}' */, 0112 /* 'J' */, 0113 /* 'K' */, 0114 /* 'L' */,
   0115 /* 'M' */, 0116 /* 'N' */, 0117 /* 'O' */, 0120 /* 'P' */,
   0121 /* 'Q' */, 0122 /* 'R' */, 0355          , 0356          ,
   0357          , 0360          , 0361          , 0362          ,
   0134 /* '\' */, 0363          , 0123 /* 'S' */, 0124 /* 'T' */,
   0125 /* 'U' */, 0126 /* 'V' */, 0127 /* 'W' */, 0130 /* 'X' */,
   0131 /* 'Y' */, 0132 /* 'Z' */, 0364          , 0365          ,
   0366          , 0367          , 0370          , 0371          ,
   0060 /* '0' */, 0061 /* '1' */, 0062 /* '2' */, 0063 /* '3' */,
   0064 /* '4' */, 0065 /* '5' */, 0066 /* '6' */, 0067 /* '7' */,
   0070 /* '8' */, 0071 /* '9' */, 0372          , 0373          ,
   0374          , 0375          , 0376          , 0377          ,
   };

luke@modus.sublink.ORG (Luciano Mannucci) (10/04/90)

In article <1756@dinl.mmc.UUCP>, noren@dinl.uucp (Charles Noren) writes:
> We are communicating between Sun 3 (with SunOS 4.0.3) and an
> IBM Mainframe (don't know the model, we're not IBM jocks)
> via TCP/IP.  Our question, is there any program in Netland
> that converts back and forth between EBCDIC and ASCII
> (preferably in C, but we will take any example)?

Apologies for posting C code in the wrong newsgroup.

Here are two very simple functions that convert ASCII to EBCDIC and
vice versa; they have been working for many years in many programs:

--- cut --- cut --- cut --- cut --- cut --- cut --- cut --- cut ---
static char _tob[] = {
0x00,0x01,0x02,0x03,0x37,0x2d,0x2e,0x2f,0x16,0x05,0x25,0x0b,0x0c,0x0d,0x0e,0x0f,
0x10,0x11,0x12,0x13,0x3c,0x3d,0x32,0x26,0x18,0x19,0x3f,0x27,0x22,0x40,0x35,0x40,
0x40,0x5a,0x7f,0x7b,0x5b,0x6c,0x50,0x7d,0x4d,0x5d,0x5c,0x4e,0x6b,0x60,0x4b,0x61,
0xf0,0xf1,0xf2,0xf3,0xf4,0xf5,0xf6,0xf7,0xf8,0xf9,0x7a,0x5e,0x4c,0x7e,0x6e,0x6f,
0x7c,0xc1,0xc2,0xc3,0xc4,0xc5,0xc6,0xc7,0xc8,0xc9,0xd1,0xd2,0xd3,0xd4,0xd5,0xd6,
0xd7,0xd8,0xd9,0xe2,0xe3,0xe4,0xe5,0xe6,0xe7,0xe8,0xe9,0x4f,0xe1,0x5f,0x40,0x6d,
0x40,0x81,0x82,0x83,0x84,0x85,0x86,0x87,0x88,0x89,0x91,0x92,0x93,0x94,0x95,0x96,
0x97,0x98,0x99,0xa2,0xa3,0xa4,0xa5,0xa6,0xa7,0xa8,0xa9,0xc0,0x6a,0xd0,0xa1,0x07,
};
static char _toa[] = {
0x00,0x01,0x02,0x03,0x00,0x09,0x00,0x7f,0x00,0x00,0x00,0x0b,0x0c,0x0d,0x0e,0x0f,
0x10,0x11,0x12,0x13,0x00,0x0a,0x08,0x00,0x18,0x19,0x00,0x00,0x00,0x00,0x00,0x00,
0x00,0x00,0x1c,0x00,0x00,0x0a,0x17,0x1b,0x00,0x00,0x00,0x00,0x00,0x05,0x06,0x07,
0x00,0x00,0x16,0x00,0x00,0x1e,0x00,0x04,0x00,0x00,0x00,0x00,0x14,0x15,0x00,0x1a,
0x20,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x2e,0x3c,0x28,0x2b,0x5b,
0x26,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x21,0x24,0x2a,0x29,0x3b,0x5d,
0x2d,0x2f,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x7c,0x2c,0x25,0x5f,0x3e,0x3f,
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x3a,0x23,0x40,0x27,0x3d,0x22,
0x00,0x61,0x62,0x63,0x64,0x65,0x66,0x67,0x68,0x69,0x00,0x00,0x00,0x00,0x00,0x00,
0x00,0x6a,0x6b,0x6c,0x6d,0x6e,0x6f,0x70,0x71,0x72,0x00,0x00,0x00,0x00,0x00,0x00,
0x00,0x7e,0x73,0x74,0x75,0x76,0x77,0x78,0x79,0x7a,0x00,0x00,0x00,0x00,0x00,0x00,
0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,0x00,
0x7b,0x41,0x42,0x43,0x44,0x45,0x46,0x47,0x48,0x49,0x00,0x00,0x00,0x00,0x00,0x00,
0x7d,0x4a,0x4b,0x4c,0x4d,0x4e,0x4f,0x50,0x51,0x52,0x00,0x00,0x00,0x00,0x00,0x00,
0x5c,0x00,0x53,0x54,0x55,0x56,0x57,0x58,0x59,0x5a,0x00,0x00,0x00,0x00,0x00,0x00,
0x30,0x31,0x32,0x33,0x34,0x35,0x36,0x37,0x38,0x39,0x00,0x00,0x00,0x00,0x00,0x00,
};

char btoa(c)
char c;
{
	if (c < 0) return -1;
	return _toa[c];
}

char atob(c)
char c;
{
	if (c < 0) return -1;
	if (c <= 0x7f)
		return _tob[c];
	else
		return 0;
}
/*
	" @(#)btoa.c	1.1 (lkdb) - 87/02/11 \n";
	*/
--- cut --- cut --- cut --- cut --- cut --- cut --- cut --- cut ---

luke.
-
-- 
  _ _           __             Via Aleardo Aleardi, 12 - 20154 Milano (Italy)
 | | | _  _|   (__             PHONE : +39 2 3315328 FAX: +39 2 3315778
 | | |(_)(_||_|___) Srl        E-MAIL: luke@modus.sublink.ORG
______________________________ Software & Services for Advertising & Marketing

ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (10/09/90)

In article <12609@blia.BLI.COM>, jeffb@blia.BLI.COM (Jeff Beard) writes:
> static char dummy[] = {0};  /* for EOF index */
> char atoe[] = {
>    };

This isn't going to work.  A compiler may insert any amount of padding
after dummy[].  It may even put atoe[] at a lower address than dummy[].
(There is nothing to stop a compiler sorting top-level variables into
alphabetic order...)  'static' and 'extern' variables might well go into
different sections.  And so on.

Even
	struct {
	    char eof_code;
	    char atoe[256];
	} = { 0,
	    /* atoe values as before */
	}
isn't going to work in general because a compiler may insert padding
between fields.

The only method that is going to work is
	char RAWatoe[] =
	    {	0,
		/* atoe values as before */
	    };
	char *atoe = RAWatoe+1;
	/* OR
	#define atoe(x) RAWatoe[1+(x)]
	*/

> static char dummy2[] = {0};  /* for EOF index */
> char etoa[] = {
> #ifdef OLDC
>    0321          , 0322 /*  ~  */, 0163 /* 's' */, 0164 /* 't' */,
> #else  OLDC
>    0321          , 0176 /*  ~  */, 0163 /* 's' */, 0164 /* 't' */,
> #endif OLDC
>    };

This one isn't going to work for an additional reason:  the ANSI C
standard doesn't accept tokens after #else or #endif, and the ANSI
standard doesn't accept it because it wasn't universal practice.
For example, the C compiler for UNIX V.3 chokes on them.

The two following 'ed' commands may be useful to people who still
have to fix this in their code.  (My code used to be _full_ of this
stuff, but I don't blame ANSI, it really wasn't portable.)

	1,$ s:^\([ \t]*#[ \t]*else[ \t][ \t]*\)\([^ \t/].*\)$:\1/*\2*/:
	1,$ s:^\([ \t]*#[ \t]*endif[ \t][ \t]*\)\([^ \t/].*\)$:\1/*\2*/:

-- 
Fear most of all to be in error.	-- Kierkegaard, quoting Socrates.

bakke@plains.NoDak.edu (Jeffrey P. Bakke) (10/11/90)

In article <661@modus.sublink.ORG> luke@modus.sublink.ORG (Luciano Mannucci) writes:
> In article <1756@dinl.mmc.UUCP>, noren@dinl.uucp (Charles Noren) writes:
> > We are communicating between Sun 3 (with SunOS 4.0.3) and an
> > IBM Mainframe (don't know the model, we're not IBM jocks)
> > via TCP/IP.  Our question, is there any program in Netland
> > that converts back and forth between EBCDIC and ASCII
> > (preferably in C, but we will take any example)?
> 
> Apologies for posting C code in the wrong newsgroup.
That's alright, it's always interesting to me.  Anyway, if you're on a Sun
system, an easier way would be to send the file over from the IBM to the
Sun, and then use the dd program.  This program allows file copies with
translation.  There are options to translate from EBCDIC to ASCII and vice
versa.  You'd have to look through the man pages, but this will probably
do what you need.  The 'dd' program is part of the standard SunOS
installation tape, I believe.  It should be located in the /usr/bin directory.

No need to write your own conversion program if the utilities already
exist.

-- 
Jeffrey P. Bakke               Internet: bakke@plains.NoDak.edu 
                      UUCP    : ...!uunet!plains!bakke
           BITNET  : bakke@plains.bitnet  

jc@atcmp.nl (Jan Christiaan van Winkel) (10/12/90)

From article <661@modus.sublink.ORG>, by luke@modus.sublink.ORG (Luciano Mannucci):
| In article <1756@dinl.mmc.UUCP>, noren@dinl.uucp (Charles Noren) writes:
|> via TCP/IP.  Our question, is there any program in Netland
|> that converts back and forth between EBCDIC and ASCII
| 
| static char _tob[] = {
| 0x00,0x01,0x02,0x03,0x37,0x2d,0x2e,0x2f,0x16,0x05,0x25,0x0b,0x0c,0x0d,0x0e,0x0f,
.
.
| 0x97,0x98,0x99,0xa2,0xa3,0xa4,0xa5,0xa6,0xa7,0xa8,0xa9,0xc0,0x6a,0xd0,0xa1,0x07,
| };

| static char _toa[] = {
| 0x00,0x01,0x02,0x03,0x00,0x09,0x00,0x7f,0x00,0x00,0x00,0x0b,0x0c,0x0d,0x0e,0x0f,
.
.
| 0x30,0x31,0x32,0x33,0x34,0x35,0x36,0x37,0x38,0x39,0x00,0x00,0x00,0x00,0x00,0x00,
| };
| 
| char btoa(c)
| char c;
| {
| 	if (c < 0) return -1;
| 	return _toa[c];
| }
This is *THE* case where you need unsigned chars instead of plain chars.
The problem is that EBCDIC uses the full range from 0 to 255 for its chars.
For example, the EBCDIC code for '3' is 0xf3.  On a machine
that uses signed characters for plain chars, the number in c will be
interpreted as a negative number, fooling your test in btoa...
JC
-- 
___  __  ____________________________________________________________________
   |/  \   Jan Christiaan van Winkel      Tel: +31 80 566880  jc@atcmp.nl
   |       AT Computing   P.O. Box 1428   6501 BK Nijmegen    The Netherlands
__/ \__/ ____________________________________________________________________

adw@otter.hpl.hp.com (Dave Wells) (10/16/90)

Charles Noren at Martin Marietta I&CS, Denver CO:

>We are communicating between Sun 3 (with SunOS 4.0.3) and an
>IBM Mainframe (don't know the model, we're not IBM jocks)
>via TCP/IP.  Our question, is there any program in Netland
>that converts back and forth between EBCDIC and ASCII
>(preferably in C, but we will take any example)?

Jonathan I. Kamens at Massachusetts Institute of Technology:

|  The Unix program "dd" does this.  In particular, the "conv=ascii" option
|converts EBCDIC to ASCII, and the "conv=ebcdic" option goes the other way.
|  See the man page for more information.

It's particularly worth noting this (from that dd man page):

          ASCII and EBCDIC conversion tables are taken from the 256-
          character ACM standard, Nov, 1968.  The ibm conversion,
          while less widely accepted as a standard, corresponds better
          to certain IBM print train conventions.  There is no
                                                   ^^^^^^^^^^^
          universal solution.
          ^^^^^^^^^^^^^^^^^^^

If you're translating "ordinary text files", dd will probably do the trick.
If you're hoping to translate files or streams containing "unusual"
characters (e.g. control codes for a graphics terminal), the exact
translation table may well vary on a per-site basis.

Dave Wells

johncore@compnect.UUCP (John Core ) (10/18/90)

Why write code to convert ASCII to EBCDIC?  It comes with
Unix.  It's called dd.


Wizard Systems              |    UUCP:   uunet!wa3wbu!compnect!johncore
P.O. Box 6269               |INTERNET:   johncore@compnect.wa3wbu
Harrisburg, Pa. 17112-6269  |a public bbs since 1978. Data(717)657-4992 & 4997
John Core, SYSOP            |-------------------------------------------------
----------------------------| No matter where you go, there you are!
a woman is just a woman, but a good cigar is a smoke.   -R. Kipling

exspes@gdr.bath.ac.uk (P E Smee) (10/22/90)

In article <28020001@otter.hpl.hp.com> adw@otter.hpl.hp.com (Dave Wells) writes:
>Jonathan I. Kamens at Massachusetts Institute of Technology:
>
>|  The Unix program "dd" does this.  In particular, the "conv=ascii" option
>|converts EBCDIC to ASCII, and the "conv=ebcdic" option goes the other way.
>|  See the man page for more information.
>
>It's particularly worth noting this (from that dd man page):
>
>          ASCII and EBCDIC conversion tables are taken from the 256-
>          character ACM standard, Nov, 1968.  The ibm conversion,
>          while less widely accepted as a standard, corresponds better
>          to certain IBM print train conventions.  There is no
>                                                   ^^^^^^^^^^^
>          universal solution.
>          ^^^^^^^^^^^^^^^^^^^
>
>If you're translating "ordinary text files", dd will probably do the trick.
>If you're hoping to translate files or streams containing "unusual"
>characters (e.g. control codes for a graphics terminal), the exact
>translation table may well vary on a per-site basis.

Actually, it *can* be even worse than this, and you don't need to get
into very complicated characters.  Even 'ordinary text' can pose
problems.  Under VM/CMS (for example) there are at least 3 possible
EBCDIC mappings for the square brackets [ and ].  Which you need may
vary not only per-site, but even according to which package you used to
produce the file on a single machine.  Since commercial packages tend
to arrive 'object only', there's not even much you can do about it.

There are other similar problem characters.  []'s come instantly to
mind as a result of having spent some time trying to move a portable C
program into EBCDIC.  Generally, the problem is that such characters do
not exist in 'formal' EBCDIC; but do exist (with varying codings) on
different IBM printer belts.  As a pragmatic solution, package writers
have used the printer belt codes for them; and it appears that their
results vary depending on which belts (and printer models) their
development machine had.

-- 
Paul Smee, Computing Service, University of Bristol, Bristol BS8 1UD, UK
 P.Smee@bristol.ac.uk - ..!uunet!ukc!bsmail!p.smee - Tel +44 272 303132

meissner@osf.org (Michael Meissner) (10/25/90)

In article <831@compnect.UUCP> johncore@compnect.UUCP (John Core )
writes:

| Why write code to convert ASCII to EBCDIC?  It comes with
| Unix.  It's called dd.

However, unlike say ISO646 or ASCII, there is no one standard EBCDIC.
There are various EBCDIC's which share a lot of characters in common,
but have some different translations.  Also, you have the fun in most
EBCDIC's in that there are two representations for '[' and ']'.  One
that your printer will print, and one that your terminal will display
correctly.
--
Michael Meissner	email: meissner@osf.org		phone: 617-621-8861
Open Software Foundation, 11 Cambridge Center, Cambridge, MA, 02142

Do apple growers tell their kids money doesn't grow on bushes?

schafer@devils.rice.edu (Richard A. Schafer) (10/26/90)

In article <MEISSNER.90Oct24150859@osf.osf.org>, meissner@osf.org
(Michael Meissner) writes:
||> However, unlike say ISO646 or ASCII, there is no one standard EBCDIC.
To be fair, there is no *one* standard ASCII, either, if you consider
ASCII to include any of the several European versions of ASCII available
with ISO numbers which I don't remember off the top of my head.  That's
why ISO has been spending so much time over the past several years
working up new code point standards.

ok@goanna.cs.rmit.oz.au (Richard A. O'Keefe) (10/26/90)

In article <1990Oct25.140442@devils.rice.edu>, schafer@devils.rice.edu (Richard A. Schafer) writes:
> In article <MEISSNER.90Oct24150859@osf.osf.org>, meissner@osf.org
> (Michael Meissner) writes:
> ||> However, unlike say ISO646 or ASCII, there is no one standard
> EBCDIC.
> To be fair, there is no *one* standard ASCII, either

There is one and only one ASCII (well, there was an old version, but
there has been only one for many years).  ASCII stands for
	*AMERICAN* Standard Code for Information Interchange.
ASCII is one particular national variant of the ISO 646 standard.
The European versions of ISO 646 aren't versions of ASCII.

The new standard (ISO 8859) is a family of 8-bit codes.  Every member
of the family has the same graphic characters in the lower half
(32..126) as ASCII; this is a compatible extension of ISO 646.

If you want to draw a parallel between ISO 646 and EBCDIC, there
are *lots* of versions of EBCDIC.  There's a French one and a Spanish
one and a Hebrew one and ...  Undeniably commendable.  The thing
that people complain about is having several incompatible versions
within the same "locale" (to use an ANSI-C-ism).
-- 
Fear most of all to be in error.	-- Kierkegaard, quoting Socrates.

henry@zoo.toronto.edu (Henry Spencer) (10/26/90)

In article <1990Oct25.140442@devils.rice.edu> schafer@devils.rice.edu (Richard A. Schafer) writes:
>||> However, unlike say ISO646 or ASCII, there is no one standard
>EBCDIC.
>To be fair, there is no *one* standard ASCII, either, if you consider
>ASCII to include any of the several European versions of ASCII...

There are no, repeat *no*, European versions of ASCII.  ASCII is a single
precisely-specified character code with no versions or ambiguities.  It
is one of a family of codes derived from ISO646.  There are a number of
other 646-derived codes in use in Europe; they are not ASCII.

It is true that the existence of a variety of 7-bit codes has turned out
to be a major nuisance, which is why there has been considerable work on
unified codes like ISO Latin.
-- 
The type syntax for C is essentially   | Henry Spencer at U of Toronto Zoology
unparsable.             --Rob Pike     |  henry@zoo.toronto.edu   utzoo!henry