[net.nlang] troff special chars - naming them

aeb@mcvax.UUCP (Andries Brouwer) (07/26/85)

Last time I just mentioned a few accents that occurred to me while
writing - let me now give a more detailed overview of what accents
exist.

1. Accents on top

- Acute accent (') occurs on top of almost anything; many languages
  have 'a 'e 'i 'o 'u ; Icelandic also 'y ; Slovak also 'y 'r 'l ;
  Polish also 'c 'n 's 'z ; Latvian has a character that is sometimes
  printed as 'g (see below); etc.
  Note that the ' on 'a does not have the same slope as the ' on 'i .

- Grave accent (`) occurs in many languages in `a `e `i `o `u ;
  Slovene `r

- Circumflex (^) occurs in many languages in ^a ^e ^i ^o ^u ;
  Esperanto has ^c ^g ^h ^j ^s ; accented Latvian has ^l .

- Trema/Diaeresis/Umlaut (::/") occurs as umlaut in many languages in
  "a "o "u (e.g. German, Slovak, Finnish, Swedish, Turkish, Hungarian);
  as trema in ::a ::e ::i ::o ::u .

- Hacek (h\'a\vcek) (v) occurs in many Slavic languages; Czech has
  ve vc vn vs vr vz ; Slovak also vD ; Esperanto vu .
  In transcriptions one meets other letters with hacek, e.g. Armenian vj .
- When the letter that should get the hacek is tall, then it gets a
  comma at the upper right instead: Czech has ,d ,t ; Slovak also ,l .

- Dot above (:) occurs in various places; the most obvious ones are
  :z in Polish and :e in Lithuanian, but I have also found it, e.g., as :n
  in the African language Bamoum.

- Macron (overline) (-) occurs as -a -e -i in Latvian, as -u in Lithuanian
  and is otherwise generally used to denote the length of vowels.

- Corona (circle above) (o) is found in Scandinavian oa and Czech ou .

- Tilde (~) is found in Spanish ~n , Portuguese ~a ~o and otherwise e.g.
  in accented Baltic languages: ~a ~e ~i ~o ~y ~m ~n ~l ~r ~.e .

- Breve (half circle above) (U) is found in Rumanian Ua , Turkish Ug ,
  Vietnamese Ua and is otherwise generally used to denote short vowels.

- Double acute ('') is found in Hungarian ''o and ''u .

- High tone mark (question mark without dot) (?) is found in Vietnamese
  ?a ?o ?u .

- In Latvian the palatalized sounds have a comma below, as we shall see,
  but in ,g there is no room for the , to go below, and one finds it on
  top instead. I have met three variations: 'g (acute accent), ,g (high
  centered comma) and I,g (high centered inverted comma).
  The high centered inverted comma also turns up elsewhere: I have
  seen I,k and I,t in transliterated Armenian and I,p in Sorbian.

- In old Croatian texts one finds the double grave accent (``) as in
  ``a ``e ``i ``r .


  
2. Accents below

- Cedilla (,) or left hook occurs in French ,c ; in Turkish ,s ;
  in Rumanian ,s ,t ; in Latvian ,k ,l ,n ,r (and ,K ,L ,N ,R ,G - for ,g
  see above).
  These hooks do not always resemble a comma.

- Rude (Polish ogonek) (L) or right hook occurs in Polish La Le ; in Thai
  and old Norse Lo ; in Lithuanian La Li Lu ; in old Latvian Le Lk .
  These hooks sometimes start right at the center, sometimes near the
  center, and sometimes at the lower right-hand corner.

- Dot below (.) occurs in Vietnamese .a .e .o ; in transliterations from
  Arabic or Sanskrit one meets .d .t .s .r .h etc.

- Corona below (0) occurs in transliterations, often to indicate that a
  sonorant has syllabic value: 0m 0n 0l 0r 0s .

- Breve below (u) occurs in the transliteration of Sanskrit and Hittite: uh .

- Double dot below (..) seems to occur in transliterated Urdu ..t .

- Vertical bar below (|) seems to occur in Yoruba |o .

- Circumflex below (A) seems to occur in Bamileki and Venda Ae .


3. Accents on more than one letter simultaneously

- An arc on top may join two letters, as in the transliteration of
  the Russian "reflected R" as IU{ia} .

- Tagalog has a tilde on the ng digraph: ~{ng} .

- Underline (_) is often used to indicate that two letters transliterate
  one sound, e.g. in various Indian languages _{kh} .

- Similarly, the double underline (=) is sometimes used when a combination
  of two letters must represent two distinct sounds, e.g. Urdu ={gh} .
  (See also the ligature above.)


Note that I do not propose a naming scheme for accented symbols here - the
chosen denotations are purely ad hoc. Simple schemes as discussed earlier
almost always work, but fail when one letter carries several diacritical
marks. In Vietnamese one finds letters with acute and circumflex
side by side (so that it looks like a rotated 'less than or equals' sign):
{'^}a {'^}e {'^}o and towers like '^o ^a. ?Ua ~^e (read from top to bottom).
In Lithuanian one meets ~.e ~u, {.'}e '-u etc.
Clearly, when symbols can carry three or more accents in various relative
positions, some nontrivial grammar is needed to describe them.
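
To make the point about stacked accents concrete, here is a minimal sketch, not a naming proposal (the post explicitly makes none). It assumes a symbol is represented as a base letter plus ordered "slots" of marks above and below, where a slot holding several side-by-side marks is a tuple. The names `Symbol`, `slot_text` and `denote` are hypothetical. `denote` writes a symbol top to bottom, as in the Vietnamese towers above, so marks below always come after the base (^a.), unlike the prefix style (,c) used in the catalog.

```python
from dataclasses import dataclass, field
from typing import List, Tuple, Union

# A slot is one position in the vertical stack: either a single mark, or
# several marks printed side by side (a tuple), written in braces as in {'^}a.
# Marks are plain strings in the post's ad hoc notation (', ^, ~, ., ...).
Slot = Union[str, Tuple[str, ...]]


@dataclass
class Symbol:
    """Hypothetical representation of an accented symbol."""
    base: str
    above: List[Slot] = field(default_factory=list)  # listed top to bottom
    below: List[Slot] = field(default_factory=list)  # listed top to bottom


def slot_text(slot: Slot) -> str:
    """Write one slot: a single mark as-is, side-by-side marks in braces."""
    if isinstance(slot, tuple):
        return "{" + "".join(slot) + "}"
    return slot


def denote(sym: Symbol) -> str:
    """Denote a symbol read from top to bottom: the marks above,
    then the base letter, then the marks below."""
    top = "".join(slot_text(s) for s in sym.above)
    bottom = "".join(slot_text(s) for s in sym.below)
    return top + sym.base + bottom


if __name__ == "__main__":
    # Vietnamese and Lithuanian cases mentioned in the post.
    for sym in (Symbol("o", above=["'", "^"]),
                Symbol("a", above=["^"], below=["."]),
                Symbol("a", above=[("'", "^")]),
                Symbol("e", above=["~", "."])):
        print(denote(sym))
```

For instance, `denote(Symbol("a", above=["^"], below=["."]))` gives `^a.`, the circumflex-over-a-over-dot tower. The explicit above/below lists are what a flat prefix scheme lacks. Note that the sketch inherits the ad hoc notation's own ambiguity: "." stands for dot below in .a but for dot above in ~.e.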




4. Special symbols

Various ligatures are conventionally treated as a single symbol, e.g.
Dutch ij , German ss (or sz), French oe and Scandinavian (and Latin) ae .

Turkish has dotless i (.i).

Icelandic has the thorn, written here as (bp) or (th).

Some symbols with a crossbar are
Polish /l and /L ; Scandinavian /o and /O ; Vietnamese and Yugoslavian
and Icelandic -d and -D ; Icelandic +d (eth).



Well, this is what I have found so far. Where I wrote "seems to occur",
the information is taken from an old draft version of the ISO standard
ISO 5426 (dated 1975-07-10).
I would be grateful if people would mail me their additions and corrections.

irenas@tekig4.UUCP (Irena Sifrar) (08/01/85)

Andries Brouwer writes:
>1. Accents on top
>
>- Grave accent (`) occurs in many languages in `a `e `i `o `u ;
>  Slovene `r
>
I have never seen `r in Slovene.  There are no accents on Slovene letters
except when you want to mark the stress (mostly in dictionaries).
In a way "r" can be one of the stressed letters, as in "mrtev",
but the word is actually pronounced [mer'tev], so the stress really
falls on the implicit e (which sounds like "a" in English, not like "ei").
I'd really like to see some examples of `r, if there are any.

Actually, Slovene does have three accented letters that just have
to be there: c, s and z with a hacek.  Even when c, s or z is capitalized,
the hacek stays a hacek and does not turn into a comma (see below).

>- When the letter that should get the hacek is tall, then it gets a
>  comma at the upper right instead: Czech has ,d ,t ; Slovak also ,l .
>

>4. Special symbols
>
>Some symbols with a crossbar are
>Polish /l and /L ; Scandinavian /o and /O ; Vietnamese and Yugoslavian
>and Icelandic -d and -D ; Icelandic +d (eth).
>
There is no such language as Yugoslavian.  There are Macedonian,
Serbo-Croatian (with slight differences between the two), and Slovene.
Serbo-Croatian is the most common; even Macedonians and Slovenes
can speak it.  The language of the government is usually Serbo-Croatian,
though in the assemblies people may speak any of the three languages
mentioned above.
			Irena Sifrar