[net.misc] Standard Cyrillic/ASCII mapping?

sdyer@bbncca.ARPA (Steve Dyer) (02/17/84)

A friend of mine is beginning to look at computer-aided analysis
of versification in Russian poets, and one of the first issues
to arise is whether there is any standard way to represent the
Cyrillic alphabet in 7 or 8-bit bytes (would this be called
RuSKII?)  It would be easy to assume some arbitrary mapping, but
it seems preferable to hew to any standard, if it exists.

Thanks,
-- 
/Steve Dyer
{decvax,linus,ima}!bbncca!sdyer
sdyer@bbncca.ARPA

sdyer@bbncca.ARPA (02/17/84)

Relay-Version: version B 2.10.1 6/24/83; site akgua.UUCP
Posting-Version: version B 2.10 5/3/83; site bbncca.ARPA
Path: akgua!clyde!floyd!harpo!decvax!bbncca!sdyer
Message-ID: <589@bbncca.ARPA>
Date: Fri, 17-Feb-84 02:30:50 EST
Date-Received: Fri, 17-Feb-84 17:34:49 EST
Organization: Bolt, Beranek and Newman, Cambridge, Ma.

colonel@sunybcs.UUCP (George Sicherman) (02/20/84)

This line gets eaten by a w>@=nnnn---*

I don't know of a Russian alphabet mapping for ASCII, but IBM designed
one for EBCDIC.  If you don't mind forgoing lower-case, you can compose
theirs with one of the accepted ASCII-EBCDIC maps.

00    space space   10    &     K       20    -     -       30    0     0
01    A     A       11    J     L       21    /     /       31    1     1
02    B     B       12    K     M       22    S     F       32    2     2
03    C     V       13    L     N       23    T     KH      33    3     3
04    D     G       14    M     O       24    U     TS      34    4     4
05    E     D       15    N     P       25    V     CH      35    5     5
06    F     E       16    O     R       26    W     SH      36    6     6
07    G     ZH      17    P     S       27    X     SHCH    37    7     7
08    H     Z       18    Q     T       28    Y     Y       38    8     8
09    I     I       19    R     U       29    Z     M.Z.    39    9     9

0B    .     .       1B    $     LOZ.    2B    ,     ,       3B    #     YU
0C    <     I KR.   1C    *     *       2C    %     E OBO.  3C    @     YA

0E    +     +

Abbreviations:  I KR. is short I (I with a breve); E OBO. is backwards E;
M.Z. is Soft Sign; LOZ. is the lozenge symbol from the old IBM commercial
BCD character set (sort of a concave-sided square).  No Hard Sign or
pre-revolutionary letters.
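
For anyone who wants to try the table above on modern text, here is a
minimal sketch of it as a lookup in Python.  The names IBM_TO_CYRILLIC,
to_cyrillic and to_ibm are made up for illustration.  It assumes that "Y"
in the Russian column means yery and "M.Z." the soft sign; the lozenge
entry is left out, and digits and shared punctuation pass through
unchanged.

```python
# Latin (EBCDIC) graphic -> Cyrillic capital, transcribed from the table.
# Assumption: "Y" is read as yery and "M.Z." as the soft sign.
_LATIN = "ABCDEFGHI&JKLMNOPQRSTUVWXYZ<%#@"
_CYRILLIC = "АБВГДЕЖЗИКЛМНОПРСТУФХЦЧШЩЫЬЙЭЮЯ"
assert len(_LATIN) == len(_CYRILLIC)

IBM_TO_CYRILLIC = dict(zip(_LATIN, _CYRILLIC))
CYRILLIC_TO_IBM = {cyr: lat for lat, cyr in IBM_TO_CYRILLIC.items()}


def to_cyrillic(text):
    """Replace each Latin graphic by the Cyrillic letter it stands for."""
    return "".join(IBM_TO_CYRILLIC.get(ch, ch) for ch in text)


def to_ibm(text):
    """Fold to upper case (the set has no lower case), then map back."""
    return "".join(CYRILLIC_TO_IBM.get(ch, ch) for ch in text.upper())


print(to_ibm("Толстой"))
print(to_cyrillic("QMJPQM<"))
```

Letters outside the table (the Hard Sign, for instance) come through
to_ibm unchanged, so a caller can spot them and decide what to do.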

Alternative:  adapt Russian Morse Code, which uses a more natural
correspondence (with International Morse).
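
A rough sketch of the Morse idea, again in Python: each Cyrillic letter is
paired with the Latin letter that shares its Morse signal.  The pairing
below follows the usual Russian Morse table as far as I know it; six
letters (Ч Ш Ъ Э Ю Я) share signals only with accented or non-basic Latin
symbols, so this sketch leaves them unmapped.  The names MORSE_PAIRS,
cyrillic_to_morse_latin and morse_latin_to_cyrillic are hypothetical.

```python
# Latin letters A..Z and the Cyrillic letters sent with the same Morse
# signal.  Assumption: this follows the usual Russian Morse table.
_MORSE_LATIN = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
_MORSE_CYRILLIC = "АБЦДЕФГХИЙКЛМНОПЩРСТУЖВЬЫЗ"

MORSE_PAIRS = dict(zip(_MORSE_CYRILLIC, _MORSE_LATIN))
MORSE_PAIRS_REVERSED = {lat: cyr for cyr, lat in MORSE_PAIRS.items()}


def cyrillic_to_morse_latin(text):
    """Map Cyrillic to Latin by shared Morse signal; others pass through."""
    return "".join(MORSE_PAIRS.get(ch, ch) for ch in text.upper())


def morse_latin_to_cyrillic(text):
    """Inverse of the above for the 26 paired letters."""
    return "".join(MORSE_PAIRS_REVERSED.get(ch, ch) for ch in text.upper())


print(cyrillic_to_morse_latin("Москва"))
```

Unlike the IBM table, this pairing keeps most letters on a similar-sounding
Latin key, at the cost of having to pick some private convention for the
six leftover letters.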

		Col. G. L. Sicherman
		...seismo!rochester!rocksvax!sunybcs!colonel

russell@cmcl2.UUCP (02/22/84)

#R:bbncca:-58900:cmcl2:4300002:000:837
cmcl2!russell    Feb 21 17:49:00 1984

I have a very old chart from Honeywell that lists the following USSR standard
for the representation of character data in 8 bits (as a sidebar):

GOST 13052-67 defines the USSR set, shown in the lower row entry position,
of columns 12-15.  Actually, the standard defines the characters for columns
4-7 of a 7-bit set (SO=Russian register, SI=Latin register).  Columns 8-11
are identical to 0-3.
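
The relation between the 7-bit and 8-bit forms described in the sidebar
can be sketched in Python as below.  This works on code values only,
since the chart's actual letter assignments are not reproduced here.  It
assumes the standard ASCII control codes SO = 0x0E and SI = 0x0F, that a
stream starts in the Latin register, and that "column n" means the high
four bits of the code.  The function names are made up.

```python
SO = 0x0E  # shift out: switch to the Russian register
SI = 0x0F  # shift in: switch back to the Latin register


def seven_to_eight(data):
    """Fold a 7-bit SO/SI stream into 8-bit codes (columns 4-7 -> 12-15)."""
    out = bytearray()
    russian = False  # assumption: streams start in the Latin register
    for b in data:
        if b == SO:
            russian = True
        elif b == SI:
            russian = False
        elif russian and 0x40 <= b <= 0x7F:
            out.append(b | 0x80)  # move columns 4-7 up to columns 12-15
        else:
            out.append(b)  # columns 0-3 are shared by both registers
    return bytes(out)


def eight_to_seven(data):
    """Unfold 8-bit codes into a 7-bit stream, adding SO/SI as needed."""
    out = bytearray()
    russian = False
    for b in data:
        if b >= 0xC0:  # columns 12-15: Russian register
            if not russian:
                out.append(SO)
                russian = True
            out.append(b & 0x7F)
        elif b >= 0x80:  # columns 8-11 duplicate columns 0-3
            out.append(b & 0x7F)
        elif b >= 0x40:  # columns 4-7: Latin register
            if russian:
                out.append(SI)
                russian = False
            out.append(b)
        else:  # columns 0-3: valid in either register
            out.append(b)
    if russian:
        out.append(SI)  # leave the stream in the Latin register
    return bytes(out)


print(seven_to_eight(b"AB\x0eAB\x0fA"))
```

Here the two codes sent between SO and SI come out as 0xC1 and 0xC2, while
the codes outside the shifted stretch are unchanged.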

The chart has a little box that reads:

Reprints of this chart are available from the Honeywell Computer Journal
(P.O. Box 6000, Phoenix, AZ 85005) at $1 each postpaid.

I imagine that the price (if available at all) has gone up.  The chart has no
date on it, but the latest ANSI standard mentioned is dated 1970.  I would 
guess that you could write a letter to the Russian Trade Mission in NY or
Washington and get the appropriate documents.

-- 

	Bill Russell		UUCP:		...!floyd!cmcl2!russell
	(212) 460-7292		InterNet:	Russell@NYU.ARPA

ljdickey@watmath.UUCP (Lee Dickey) (03/19/84)

ECMA (European Computer Manufacturers Association) acts as the
registrar for the International Standards Organization for the
purpose of keeping the registry of character sets.
There is a Cyrillic set and a Cyrillic Extension.

-- 
  Lee Dickey, University of Waterloo.  (ljdickey@watmath.UUCP)
                      ...!allegra!watmath!ljdickey
                ...!ucbvax!decvax!watmath!ljdickey