[net.nlang.greek] LCG: Latin-Coded Greek

kateveni@ariadne.UUCP (11/09/85)

Since there has recently been a pile of messages on the net about
"Greek Transliteration Standard & Programs", I too am sending today
the second (newer) version of what I had proposed when our "net"
first started, under the name "grk" at the time.  It is a
"phonetic" standard.  The main changes from the first version are
the representation of accents, the ability to change target codes
easily, and the existence of an inverse filter for some target codes.
The "nroff'ed" manual follows in this message; in the next message
I am sending a shell script that creates the ten files of the
source code.

Regards,
Manolis Katevenis
Institute of Computer Science, Research Center of Crete
Heraklion, Crete.



LCG(L)              UNIX Programmer's Manual               LCG(L)



NAME
     LCG  -  Latin-Coded Greek notation and filters
     *lcg*  -  filters for conversion to and from LCG format

SYNOPSIS
     [ lcg2qtroff, lcg2vtroff, lcg2itroff, lcg2pc, lcg2pchex ]  [ files ] ...
     [ pc2lcg, pchex2lcg ]  [ files ] ...
     [ bin2hex, hex2bin ]

DESCRIPTION
     LCG (``Latin-Coded Greek'') is a notation for writing Greek
     text using Latin characters, in a ``phonetic'' fashion.  The
     definition of LCG is biased towards it being used as troff
     input.

     Lcg2qtroff, lcg2vtroff and lcg2itroff are pre-troff filters
     that take LCG text and convert it into the corresponding
     escape-character sequences for the Greek letters on the
     ``special'' font of troff.

     Thus, for example, the input:
         English
         .G
         ElliinikA
     generates the output:
         English
         E\(*l\(*l\(*y\(*n\(*i\(*k
            \v'-0.1m'\h'0.32m'\z\'\h'-0.32m'\v'0.1m'\(*a
     (where the last line generates an accented alpha and is, in
     reality, a continuation of the previous line, with no
     new-line in between).

     LCG follows the ``monotoniko'' (single-accent) system.

     When invoked with no arguments, the lcg2* filters read from
     standard input.  When invoked with arguments, they consider
     them to be file names, and they read those files as input,
     in the sequence in which they are given.  All programs
     mentioned here send their output to the standard output.
     Thus, typical uses may be as follows:
         lcg2vtroff textfile1 textfile2 | vtroff -ms
         lcg2vtroff textfile1 f2 f3 xyz | tbl | eqn | vtroff -ms

     Lcg2qtroff has been optimized for the QMS "LASERGRAFIX"
     printer (using qtroff).  Lcg2vtroff is the same filter,
     adjusted for the Varian Electrostatic Plotter (using
     vtroff); the accent marks must be positioned differently
     because of the different character heights and widths.
     Lcg2itroff is again the same filter, adjusted for the
     Imagen Laser Printer (using itroff); that printer has no
     terminal-sigma character.



Printed 11/9/85            August 1984                          1

     Lcg2pc is a filter that converts LCG text into ``extended-
     ASCII'' text for the IBM-personal-computers (pc) that are
     being sold in Greece (the ones that are available in the
     Cretan Research Center).  Pc2lcg is the inverse filter.

     The source of these filters is organized in such a way that
     it is easy to define new codes and to compile the
     corresponding filters: use the files code.*.h in the
     source-directory, and in particular the file code.guide.h .
     The inverse filters only work when the code for each Greek
     or Latin letter is just a single byte (may be a full-8-bit
     byte).

     Two additional filters are provided for the communication
     between a VAX-UNIX and an IBM-pc.  Because that communica-
     tion uses 7-bit bytes, the filters bin2hex and hex2bin can
     be used to convert between a full-8-bit-byte representation
     (bin) and a hexadecimal representation (hex) where each ori-
     ginal byte is represented as a two-digit (two-byte) hexade-
     cimal number.  The filters lcg2pchex and pchex2lcg are sim-
     ple shell-scripts that specify pipe connections between
     lcg2pc and bin2hex on one hand, and hex2bin and pc2lcg on
     the other hand.
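
     The bin/hex conversion described above can be sketched as
     follows.  This is only an illustration in Python, not the
     source of bin2hex and hex2bin: the function bodies, the use
     of upper-case hex digits, and the tolerance for white space
     in the hex stream are all assumptions.  The manual only
     states that each byte becomes a two-digit hexadecimal number.

```python
def bin2hex(data: bytes) -> str:
    """Encode each 8-bit byte as two hexadecimal digits (7-bit safe)."""
    return "".join(f"{byte:02X}" for byte in data)


def hex2bin(text: str) -> bytes:
    """Decode pairs of hexadecimal digits back into 8-bit bytes."""
    # Assumption: white space (e.g. line breaks added in transit) is ignored.
    digits = "".join(text.split())
    if len(digits) % 2 != 0:
        raise ValueError("odd number of hexadecimal digits")
    return bytes(int(digits[i:i + 2], 16) for i in range(0, len(digits), 2))
```

     Because every output character is a hex digit, the encoded
     stream survives a 7-bit link, at the cost of doubling its
     size.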


GREEK/LATIN (CONVERT/NO-CONVERT) MODES
     During its operation, the LCG scanner can be in one of two
     possible modes:
         L         Latin-mode      copy input to output
         G         Greek-mode      convert input to output
     When in Latin mode, it copies its input -- unchanged -- to
     the standard output.  When in Greek mode, it treats its
     input as Greek text written with Latin characters, parses it
     according to the lexical rules given below, and sends the
     corresponding troff escape-sequences to the standard output.
     The only exceptions are:
     (1) The lcg commands for mode/font change (see below).
     (2) Other lines that begin with a dot (period, ``.'') as
     their first character (troff commands) are copied unchanged
     to the standard output, regardless of the current mode.

     The LCG scanner starts executing in the Latin mode.  Some
     specific character sequences in the input stream are
     recognized as commands to the lcg scanner, for it to change mode.
     When lcg2* read their input from multiple files, the mode
     that is in effect at the end of a file is the mode in which
     the next file starts being read.  The commands to change
     mode are shown below, together with their effect as well as
     the output which they generate.

         INPUT      .ft G          .G        \fG
         EFFECT     change to Greek-mode
         OUTPUT     none

         INPUT      .ft L          .L        \fL
         EFFECT     change to Latin-mode
         OUTPUT     none

         INPUT      .ft R          .R        \fR
                    .ft B          .B        \fB
                    .ft I          .I        \fI
         EFFECT     change to Latin-mode
         OUTPUT     echo input to output

         INPUT      .ft P          .ft       \fP
         EFFECT and OUTPUT:
          Restore the previous mode/font: If the current mode is
          Greek, and if the last mode (until the last mode/font
          change) was Latin, then change to Latin mode and give
          no output.  If the current mode is Latin, then echo the
          input to the output (i.e. change to previous R/B/I
          font), and, in addition, if the last mode (until the
          last mode/font change) was Greek then change to Greek
          mode.

     These commands are patterned after the font-change commands
     of troff.  The ones that begin with a period must appear on
     a line by themselves, while the ones that begin with a
     back-slash can appear ``in-line'', just like in troff.

     When in Greek mode, the LCG scanner does not recognize any
     ``in-line'' troff commands other than the mode/font-change
     ones listed above.  If you need to use such commands, you
     should ``insulate'' them.  Example:
         kAti \fL\s+2\fG spoudaIo \fL\s-2\fG
     See the section ``BUGS'', for some more limitations of the
     LCG scanner.
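
     The mode/font-change rules above can be sketched as a small
     state machine.  This is an illustration in Python, not the
     scanner's actual code: the names font_of and ModeTracker are
     hypothetical, and two points the manual leaves open are
     filled in as assumptions (a ``restore'' issued in Greek mode
     when the last mode was also Greek does nothing, and every
     change records the mode it replaces as the ``last mode'').

```python
GREEK, LATIN = "G", "L"


def font_of(cmd: str) -> str:
    """Return the font letter named by a mode/font-change command."""
    cmd = cmd.strip()
    if cmd == ".ft":                      # bare .ft means "previous font"
        return "P"
    if cmd.startswith(".ft "):            # request form: .ft X
        letter = cmd[4:].strip()
    elif cmd.startswith("\\f"):           # in-line form: \fX
        letter = cmd[2:]
    elif cmd.startswith("."):             # macro form: .X (the table lists no .P)
        letter = cmd[1:] if cmd[1:] != "P" else ""
    else:
        letter = ""
    if len(letter) != 1 or letter not in "GLRBIP":
        raise ValueError(f"not a mode/font-change command: {cmd!r}")
    return letter


class ModeTracker:
    """Tracks the current and the last mode, starting in Latin mode."""

    def __init__(self) -> None:
        self.mode = LATIN
        self.previous = LATIN

    def _switch(self, new_mode: str) -> None:
        self.previous, self.mode = self.mode, new_mode

    def command(self, cmd: str) -> str:
        """Apply one command and return the text echoed to the output."""
        font = font_of(cmd)
        if font == "G":                   # change to Greek mode, no output
            self._switch(GREEK)
            return ""
        if font == "L":                   # change to Latin mode, no output
            self._switch(LATIN)
            return ""
        if font in "RBI":                 # real troff font: Latin mode, echo
            self._switch(LATIN)
            return cmd
        # font == "P": restore the previous mode/font
        if self.mode == GREEK:
            if self.previous == LATIN:
                self._switch(LATIN)
            return ""                     # assumption: Greek/Greek is a no-op
        self._switch(GREEK if self.previous == GREEK else LATIN)
        return cmd                        # troff restores its own R/B/I font
```

     With this sketch, the sequence .G, \fB, \fP ends in Greek
     mode and echoes only \fB and \fP, which is the behaviour the
     sample text in the EXAMPLE section relies on.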


LEXICAL RULES
     When in Greek mode, the LCG scanner parses its input into
     groups of 1, 2, 3, or 4 characters, according to the list of
     recognized patterns that is given below.  The longest
     pattern that matches the input at the current position is
     chosen and converted into the corresponding output pattern.
     Thus, for example, even though a ``t'' produces a ``tau''
     and an ``h'' produces an ``eta'' when by themselves, a
     ``th'' produces a ``theta''.  LCG uses some context
     sensitivity in the cases of sigmas and accents -- see the
     table below.
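
     The longest-match rule can be sketched as follows.  This is
     an illustration in Python over a small subset of the
     patterns, with the letter names from the table standing in
     for the real output codes.  The names PATTERNS and tokenize
     are hypothetical, the context rules for sigma and accents
     are left out, and copying an unrecognized character through
     unchanged is an assumption.

```python
# A small subset of the recognized patterns, mapped to letter names.
PATTERNS = {
    "t": "tau",
    "h": "iita",
    "th": "thiita",
    "i": "iwta",
    "ii": "iita",
    "p": "pi",
    "s": "sigma",
    "ps": "psi",
    ":'y:": "ypsilon tonos dialytika",
}
MAX_LEN = max(len(pattern) for pattern in PATTERNS)   # 4 characters


def tokenize(text: str) -> list[str]:
    """Split text greedily, always taking the longest matching pattern."""
    result = []
    pos = 0
    while pos < len(text):
        # Try the longest candidate first, then shorter ones.
        for length in range(min(MAX_LEN, len(text) - pos), 0, -1):
            piece = text[pos:pos + length]
            if piece in PATTERNS:
                result.append(PATTERNS[piece])
                pos += length
                break
        else:
            # Assumption: characters outside the table are copied as-is.
            result.append(text[pos])
            pos += 1
    return result
```

     For instance, tokenize("tth") takes ``t'' and then ``th'',
     because at the first position no two-character pattern
     ``tt'' exists, while at the second position ``th'' is longer
     than ``t''.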

     The table with the recognized input patterns (and the
     alternatives that some of them have) and the corresponding
     interpretation follows:

       INPUT        (OR)      MEANING

       lower-case letters:

         a                    alfa (atono -- no accent)
         v          b         biita
         g                    gama
         d                    delta
         e                    epsilon (atono)
         z                    ziita
         ii         h         iita (atono)
         th                   thiita
         i                    iwta (atono)
         k                    kapa
         l                    lamda
         m                    mi
         n                    ni
         x                    xi (ksi, as in: xydi)
         o                    omikron (atono)
         p                    pi
         r                    rw
         s     [ followed by a,...,z,A,E,H,I,O,Y,U,W or ' -- but not '' ]
                              sigma
         s     [ followed by anything else, including '' ]
                              terminal-sigma
         t                    tau
         y          u         ypsilon (atono)
         f                    fi
         ch                   chi (as in: chioni)
         ps                   psi (as in: psari)
         w                    wmega (atono)

       upper-case letters (except for accents -- see below):

         A                    A (ATONO)
         B          V         BIITA
         G                    GAMA
         D                    DELTA
         E                    E (ATONO)
         Z                    Z
         II    Ii   H         H (ATONO)
         TH         Th        THIITA
         I                    IWTA (ATONO)
         K                    K
         L                    LAMDA
         M                    M
         N                    N
         X                    XI (KSI, AS IN: XYDI)
         O                    O (ATONO)
         P                    PI
         R                    RW
         S                    SIGMA
         T                    T
         Y          U         YPSILON (ATONO)
         F                    FI
         CH         Ch        CHI (AS IN: CHIONI)
         PS         Ps        PSI (AS IN: PSARI)
         W                    WMEGA (ATONO)

       When immediately preceded by a lower-case letter:

         A                    alfa tonos (accent)
         E                    epsilon tonos
         II    Ii   H         iita tonos
         I                    iwta tonos
         O                    omikron tonos
         Y          U         ypsilon tonos
         W                    wmega tonos

       Other accents:

         'a                   alfa tonos (accent)
         'e                   epsilon tonos
         'ii        'h        iita tonos
         'i                   iwta tonos
         'o                   omikron tonos
         'y         'u        ypsilon tonos
         'w                   wmega tonos

         'A                   ALFA TONOS
         'E                   EPSILON TONOS
         'II   'Ii  'H        IITA TONOS
         'I                   IWTA TONOS
         'O                   OMIKRON TONOS
         'Y         'U        YPSILON TONOS
         'W                   WMEGA TONOS

       Dialytika:

         :i:                  iwta dialytika
         :y:        :u:       ypsilon dialytika
         :'i:                 iwta tonos dialytika
         :'y:       :'u:      ypsilon tonos dialytika
         :I:                  IWTA DIALYTIKA
         :Y:        :U:       YPSILON DIALYTIKA
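
     The two context-sensitive rules in the table (the choice of
     sigma, and an upper-case vowel standing for an accented
     lower-case one) can be sketched as follows.  This is an
     illustration in Python, not the scanner's code; the function
     names and the returned labels are hypothetical, and the
     sketch looks only at single characters, so the ``ps''
     pattern is assumed to have been matched beforehand.

```python
import string

# Characters after which a lower-case "s" is an ordinary (medial) sigma.
MEDIAL_FOLLOWERS = set(string.ascii_lowercase) | set("AEHIOYUW")
# Upper-case letters that can carry an accent.
ACCENTABLE = set("AEHIOYUW")


def sigma_kind(text: str, pos: int) -> str:
    """Classify the lower-case 's' at text[pos] by what follows it."""
    following = text[pos + 1:pos + 2]          # "" at end of input
    if following in MEDIAL_FOLLOWERS:
        return "sigma"
    # A single quote counts as a follower, but two quotes ('') do not.
    if following == "'" and text[pos + 2:pos + 3] != "'":
        return "sigma"
    return "terminal-sigma"


def vowel_reading(text: str, pos: int) -> str:
    """Read the upper-case vowel at text[pos] as capital or as accented."""
    if text[pos] not in ACCENTABLE:
        raise ValueError(f"not an accentable upper-case letter: {text[pos]!r}")
    if pos > 0 and text[pos - 1] in string.ascii_lowercase:
        return "accented lower-case"
    return "upper-case"
```

     Note that sigma_kind("Fys. IV", 2) yields terminal-sigma,
     since a period is not in the list of followers; this is the
     truncated-word case described in the BUGS section.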


EXAMPLE
       .LP
       This is an example of \fBlcg\fR text.
       .G
       .LP
       AutO eInai 'ena parAdeigma keimEnou \fBlcg\fP.
       .sp 3
       .ce 3
       SKOPOS TOY INSTITOYTOY PLIIROFORIKIIS
       TOY EREYNIITIKOY KENTROY KRIITIIS
       (apO to ProedrikO DiAtagma 'IdrysIIs tou)
       .PP
       SkopOs tou EreuniitikoU K'entrou KrIItiis eInai
       ('arthro 2)
       ``ii diexagwgII basikIIs, efarmosmEniis,
       kai technologikIIs 'ereunas,
       kai ii anAptyxii efarmogWn
       stous exIIs tomeIs technologiWn aichmIIs:....''
       .PP
       GiA to InstitoUto PliiroforikIIs
       ('arthro 3):
       ``... skopOs tou InstitoUtou autoU
       eInai ii 'ereuna, ii melEtii, kai ii ylopoIhsii
       systiimAtwn pliiroforikIIs
       pros 'ofelos tiis EthnikIIs OikonomIas
       kai tiis DiimOsias DioIkiisiis.''
       .L
       .sp 2
       .ce
       \l'6i'
       .sp 2
       .TS
       center,box;
       c s
       l|l.
       .G
       TechnikII OrologIa:
       _
       mikroepexergastIIs     \fLmicroprocessor\fG
       olokliirwmEno kYklwma  \fLintegrated circuit\fG
       .TE
       .sp 3
       .LP
       EdW, s' autO to parAdeigma,
       'echoume 'ena sIgma m' apOstrofo,
       enW edW: ``autOs'' 'echoume 'ena sIgma
       amEsws prin apO eisagwgikA pou kleInoun.
       O sarwtIIs (\fLscanner\fG) giA \fBlcg\fP katalabaInei mOnos tou,
       schedOn pAnta,
       pOte to sIgma eInai ``mesaIo'' kai pOte eInai ``telikO''.
       .LP
       K'ati 'allo pou thElei eidikII prosochII:
       oi lExeis pistopoi\fGiitikO, no\fGiimosYnii thEloun mEsa tous
       'ena \fL\f\fLG\fG, an den thEloume na tis grApsoume:
       "pistopoihtikO", "nohmosYnii".
       .sp
       T'elos tou paradeIgmatos.
       .L
       .br
       End of the example.


SEE ALSO
     lcg, troff, qtroff, vtroff, itroff, tbl, eqn


FILES
      /usr/src/local/lcg/*              sources
      /usr/src/local/lcg/code.*.h       definitions of codes
      /usr/src/local/lcg/code.guide.h   guide for new codes
      /usr/local/                       objects


AUTHOR
     Manolis G.H. Katevenis, Institute of Computer Science,
     Research Center of Crete, August 1984.


BUGS
     When in Greek mode, it does not recognize in-line troff
     commands (troff commands that begin with a back-slash): it
     will convert them to Greek, i.e. it will destroy them.
     Exception: the mode/font-change commands.

     It does not follow input-file inclusions made with the
     command:
                             .so filename

     Also, it recognizes neither text intended for processing by
     EQN nor the table-formatting instructions to TBL.  Again, it
     will convert them to Greek, thus destroying them.

     It does not recognize the arguments of troff commands, like,
     for example:
         .ds LF "InstitoYto PliiroforikIIs KrIItiis"
     and thus, it will not transform them into Greek.

     The commands which ``restore the previous mode/font'' try
     to do what you would expect them to do, and also to leave
     Latin text that uses them (and was written ignoring lcg) as
     unmodified as possible.  However, it is not clear that they
     succeed in doing so.  Also, they are not completely tested.

     It chooses the wrong kind of sigma (terminal where
     "mesaio", i.e. medial, is wanted) in the case of words that
     are truncated, with a period used to indicate the
     truncation.  Example: "To mAthiima Fys. IV ascholeItai
     me..." (instead of "FysikII IV").

     Send other bugs to:
            ariadne!kateveni