[comp.std.internat] Computers and human languages

karl@haddock.ISC.COM (Karl Heuer) (01/01/70)

In article <8708171253.AA21033@ephemeral.ai.toronto.edu> lamy@ai.toronto.edu (Jean-Francois Lamy) writes:
>Just by curiosity, a quick scan of my brain seems to indicate that English
>would be the only European language not to use diacritical marks, digraphs,
>or extra letters. (? - I mean something like the Dutch "ij").

Well, it used to be common in English to put a diaeresis over the third letter
of "coordinate", for example; but that convention seems to be vanishing.  I
guess the closest things in modern American English are the apostrophe and the
hyphen -- and we might be better off without those, too.

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint

biep@cs.vu.nl (J. A. "Biep" Durieux) (01/01/70)

In article <8708171253.AA21033@ephemeral.ai.toronto.edu> lamy@ai.toronto.edu (Jean-Francois Lamy) writes:
>Just by curiosity, a quick scan of my brain seems to indicate that English
>would be the only European language not to use diacritical marks, digraphs,
>or extra letters. (? - I mean something like the Dutch "ij").

I do not completely understand what you mean by "extra letters", but the "ij"
is just the long "i" (not the way any of you foreigners pronounce it :-)).
The long "a" is written "aa", the long "e" "ee", "o" -> "oo", "u" -> "uu",
and "i" -> "ii". Now "ii" looks ugly, so the second "i" is drawn so as to curl
under the first, and you have "ij". (Dutch typewriters have one key to print
it - is that what you mean by "extra letters"?)

There having been a vowel shift in several parts of the Netherlands (some
people still pronounce "ij" as in "see" (but short)), most people pronounce
it what I think to be a unique way (I think there is no other language
with this sound): something between the sounds in "red" "wine". The sound
is clearly sliding (becoming a "y"-consonant, like "I"). Virtually the
same sound (in most of the Netherlands) as "ei".

But then, we have a lot of funny sounds in our language. Does anybody know a
language where the sound of "ui" exists? (Somewhat higher than the shifted
form of the vowel in "luck", and not to be confused with the also present
sound "eu", which is like the sound in the German "schoen".)
Kom kijken, er staat een kuiken in de keuken te koken!

P.S.: About extra letters: is the "$"-sign really the writing in one space
of "U" and "S"? So: "U.S. dollar" --> "$ dollar"
-- 
						Biep.  (biep@cs.vu.nl via mcvax)
			Hot girls like cool boys

andersa@kuling.UUCP (Anders Andersson) (01/01/70)

In article <8708171253.AA21033@ephemeral.ai.toronto.edu> lamy@ai.toronto.edu (Jean-Francois Lamy) writes:
>Just by curiosity, a quick scan of my brain seems to indicate that English
>would be the only European language not to use diacritical marks, digraphs,
>or extra letters. (? - I mean something like the Dutch "ij").

As somebody has already mentioned, what's to be considered "diacritical" or
"extra letters" depends on what language you compare with. I consider the
Swedish circle above "a" to be no more "diacritical" than the common dot
above a lowercase "i" or "j". At least the Turkish provide a uniform set of
dotted and undotted "i" in both upper and lower case...

What about the origin of "w", was it known by Caesar or is it a more
modern invention? It seems pretty much like a double "v" ligature to me.
-- 
Anders Andersson, Dept. of Computer Systems, Uppsala University, Sweden
Phone: +46 18 183170
UUCP: andersa@kuling.UUCP (...!{seismo,mcvax}!enea!kuling!andersa)

rob@pbhye.UUCP (Rob Bernardo) (01/01/70)

In article <480@kuling.UUCP> andersa@kuling.UUCP (Anders Andersson) writes:
Re: Russian
+ There are the hard and soft
+signs which have no phonetic value of their own, but they look like ordinary
+letters. I think it's funny that they are relevant in sorting, just as if the
+apostrophe should have its own place in the English and French alphabets...

Actually, they are more analogous to the "h" in English when it combines
with "s", "t", etc. to denote a single sound (except that English "h" can
represent a sound by itself while the Russian hard and soft signs cannot).
(There are analogues in other languages, e.g. "h" in Portuguese causing
a consonant to be palatal, silent "u" in Spanish and "h" in Italian which
buffer "c" and "g" from a following "i" and "e".)

But while we're talking about alphabetic sorting, I should like to point
out that in Spanish "ch" is sorted after "c" and before "d". Similarly,
"ll" and "rr" immediately follow their unary counterparts. They are treated
as if single letters when reciting the alphabet, as is the non-ascii
single letter "n~".
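
The sorting rule described above can be sketched in a few lines. This is only an illustration of the idea, assuming the alphabet order given in this post ("ch" after "c", "ll" after "l", "rr" after "r", n-tilde after "n"); the names `ALPHABET`, `RANK` and `spanish_key` are made up for the example.

```python
# Alphabet order as described in the post above; digraphs count as letters.
ALPHABET = ["a", "b", "c", "ch", "d", "e", "f", "g", "h", "i", "j", "k",
            "l", "ll", "m", "n", "ñ", "o", "p", "q", "r", "rr", "s", "t",
            "u", "v", "w", "x", "y", "z"]
RANK = {letter: i for i, letter in enumerate(ALPHABET)}
DIGRAPHS = ("ch", "ll", "rr")


def spanish_key(word):
    """Return a sort key: one rank per letter, digraphs taken as one letter."""
    word = word.lower()
    key = []
    i = 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIGRAPHS:
            key.append(RANK[pair])
            i += 2
        else:
            # Anything not in the table (hyphen, accented vowel) sorts last.
            key.append(RANK.get(word[i], len(ALPHABET)))
            i += 1
    return key


words = ["dama", "chico", "cuna", "llama", "luz", "carro", "caro"]
print(sorted(words, key=spanish_key))
# ['caro', 'carro', 'cuna', 'chico', 'dama', 'luz', 'llama']
```

A plain byte-order sort would instead put "chico" before "cuna" and "llama" before "luz".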
+
+Anyway, it's true that accents don't play such a big role in Russian as they
+do in French or Czech. English is probably the least complicated in this
+matter, but I don't consider the English alphabet absolutely "pure" in some
+accentophobic sense. Every alphabet has its history of anomalies.
+
+Disclaimer: I'm NOT an educated linguist, just an amateur. Although I hold
+the above for true, any linguist could probably provide more detail.
+-- 
+Anders Andersson, Dept. of Computer Systems, Uppsala University, Sweden
+Phone: +46 18 183170
+UUCP: andersa@kuling.UUCP (...!{seismo,mcvax}!enea!kuling!andersa)


-- 
I'm not a bug, I'm a feature.
Rob Bernardo, San Ramon, CA	(415) 823-2417	{pyramid|ihnp4|dual}!ptsfa!rob

apoorva@mind.UUCP (Apoorva Muralidhara) (01/01/70)


Mark Towfigh writes:


*Unlike Arabic, Modern Persian is written with no accents.  Accents are
*only used in children's books and occasionally in print with a foreign
*word.  This would perhaps be analogous to similar occurrences in English.

While I don't know Persian, I do know some Arabic.  Please explain
what you mean by "accents" when referring to Arabic.  From the
sentences quoted, it seems that you are referring to the signs in
vocalized text:  the vowels fathah, dammah, kasrah, their nunated
forms, and the no-vowel sign sukun.  They are, as far as I know, only
used in the Quran (because errors in vocalization might lead to
blasphemy), children's books, and books which teach Arabic to
non-native-speakers.  So I'm not sure what you mean by "unlike Arabic."

*Since the Persian alphabet has only 32 letters, moreover, that would
*leave at least 20 spaces for the 6-odd accents which are so
*infrequently used.

Arabic has 28 letters, or 29 if you count hamzah.  Well, okay, it has
31 if you count ta'-marbutah and alif maqsurah (I hope I haven't
forgotten anything), and 7 signs:  the three vowels, their nunated
versions, and sukun.  And Arabic doesn't have capital letters either.

Just in case Arabic ever becomes that popular . . .

				--Apoorva

[Now if you want an even neater-looking script, how about Kannada?
 A lot more letters though . . .]

andersa@kuling.UUCP (01/01/70)

In article <2739@husc6.UUCP> corelib@husc4.HARVARD.EDU (core library) writes:
>In fact, a Persian alphabet would be easier to implement than an
>English one (in terms of size) because there are 52 letters in the
>English language:  capitals and lower case.  These have to be specified
>by the writer, while in Persian, the shape (or, loosely, case) of each
>letter is absolutely determined by whether it has a connecting letter
>or space before or after it.  

There are three forms of each letter, designed for the beginning of a word,
for the end of a word, and for within a word, right? I think the same is
true for Arabic. This seems similar to Latin hand-writing, in that there
are various ways to connect adjacent letters. I would like to know: Is it
ever acceptable to write a Persian word using only one of the three forms
(LIKE IT MAY BE ACCEPTABLE TO WRITE ENGLISH WORDS IN UPPERCASE ONLY), or
are the form rules mandatory (in the sense any aspect of a language can be
"mandatory")?

If the codes for different forms are the same, then the typography will
have to depend on context, so there have to be three font bitmaps, or types
on a printing wheel/chain, and a neat little algorithm instead of a 1-1
mapping for translating character codes into font table indices. I'm not
arguing against the solution, just pointing out some extra problems.
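
A rough sketch of such a "neat little algorithm" follows. It is not taken from any real system: the letter codes, the `joins_next` set and the `first_index` table are hypothetical. It also assumes four positional forms rather than three (a standalone form is added for a letter with no joined neighbour), and it allows for letters that never connect to the letter after them, as happens in Arabic-script writing.

```python
ISOLATED, INITIAL, MEDIAL, FINAL = range(4)


def choose_forms(word, joins_next):
    """Pick a positional form for each letter code in a single word.

    `joins_next` is the set of letter codes that connect to the following
    letter; a letter outside the set breaks the join after itself.
    """
    forms = []
    for i, code in enumerate(word):
        joined_before = i > 0 and word[i - 1] in joins_next
        joined_after = i + 1 < len(word) and code in joins_next
        if joined_before and joined_after:
            forms.append(MEDIAL)
        elif joined_before:
            forms.append(FINAL)
        elif joined_after:
            forms.append(INITIAL)
        else:
            forms.append(ISOLATED)
    return forms


def glyph_indices(word, joins_next, first_index):
    """Map each (letter code, form) pair to a font table slot.

    The font table is assumed to hold four consecutive glyphs per letter.
    """
    forms = choose_forms(word, joins_next)
    return [first_index[code] + form for code, form in zip(word, forms)]


# Hypothetical three-letter alphabet: "a" never joins to the next letter.
joins_next = {"b", "n"}
first_index = {"a": 0, "b": 4, "n": 8}
print(choose_forms("bnb", joins_next))   # [1, 2, 3]
print(choose_forms("bab", joins_next))   # [1, 3, 0]
print(glyph_indices("bnb", joins_next, first_index))  # [5, 10, 7]
```

The stored text keeps one code per letter; only the rendering step looks at the neighbours.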

This leads me to another question: Which form is used in a Persian word
consisting of only one letter?
-- 
Anders Andersson, Dept. of Computer Systems, Uppsala University, Sweden
Phone: +46 18 183170
UUCP: andersa@kuling.UUCP (...!{seismo,mcvax}!enea!kuling!andersa)

franka@mmintl.UUCP (Frank Adams) (01/01/70)

>>I was told once (by a respected linguist, as I recall) that English and
>>Russian are the ONLY two languages written with unaccented alphabets.

The Russian alphabet does have two accented letters, although the accents
are often omitted.  How common this omission is, or whether it is getting
more common, I don't really know.
-- 

Frank Adams                           ihnp4!philabs!pwa-b!mmintl!franka
Ashton-Tate          52 Oakland Ave North         E. Hartford, CT 06108

eric@snark.UUCP (01/01/70)

So would *someone* who knows quit tantalizing us and *translate* 'gezellig'?

I'm curious about both the 'literal' (presumably etymology-based) translation
and whatever paragraph of circumlocutions is necessary to express the concept.

Inspection of various possible cognates (notably German ge + selig) suggests
a guess-translation of "wisdom-struck" to this amateur linguist; some sort of
metaphorical inversion (as with English "silly") seems not unlikely, yielding
something like "foolish" or the idiomatic "loopy". Now, how far off base am I?

I'll contribute an example from the days when I spoke reasonable Italian.
Er, make that "Tuscan"; I lived in Rome for two years but found out the
hard way when visiting Naples and Sicily that different 'dialects' of Italian
can be mutually utterly incomprehensible. "A language is a dialect with an
army" --and the Tuscan dialect of Rome and environs is the "national" language
of Italy.

When translating the Italian term 'simpatico', you can choose the etymological
cognate "sympathetic" in English, or you can translate the actual concept --
but not both.

In English, "X is sympathetic to Y" != "Y is sympathetic to X"; the term
implies an unequal relationship in which one party accepts "sympathy" and
potential help from the other.

In Italian, "simpatico" is a fellow-feeling between equals. You say "those
two are simpatico", or "we are simpatico". "Those two are compatible" comes
close, but doesn't convey the mild but definite emotional warmth of the
Italian.

There is no really good translation in English, which is why some English
speakers have naturalized 'simpatico' in its Italian form.

-- 
      Eric S. Raymond
      UUCP:  {{seismo,ihnp4,rutgers}!cbmvax,sdcrdcf!burdvax,vu-vlsi}!snark!eric
      Post:  22 South Warren Avenue, Malvern, PA 19355    Phone: (215)-296-5718

eal@tut.UUCP (Lehtimäki Erkki) (01/01/70)

In article <7187@reed.UUCP> eeyore@reed.UUCP (joshua samuel honig guenter ii) writes:
>question 2. does anybody know if finnish is written in cyrillic in the karelian
>s.f.s.r.?

They don't actually speak Finnish in the Karelian s.f.s.r.; I think they
speak some of the Karelian dialects, East-Karelian or Ruski-Karelian, but
I am not sure. If you want, I can check. Please use e-mail, though; I seldom
read this group.

-- 
Erkki A. Lehtimäki        eal@tut.uucp

gordan@maccs.UUCP (Gordan Palameta) (08/17/87)

In article <479@sugar.UUCP> peter@sugar.UUCP (Peter da Silva) writes:
>In Japan programming languages are the least of the problems their written
>language causes them. An incredible amount of data is never stored anywhere
>but on the original form, photocopies of said form, or faxed copies of said
>form. Even with the best tools available it's just too hard to keypunch.

Everything you mention above was true of English, less than ten years ago.
It's easy to forget just how recent a development the personal computer is,
without which "office automation" would be far less practical.

>This, of course, makes it even more amazing that they have been so successful
>in the world community. It seems likely to me, though, that at some point
>they're going to have to break down and drop Kanji for professional use.

Not very likely.  The whole point of using computers is that they should be
adapted to serve us, not the other way around.

French and German did not adapt to computers by dropping cedillas, umlauts,
and accents; instead we now have ISO Latin.  Arabic has not adapted to
computers by simplifying the calligraphic qualities of its script; instead
sophisticated software is being used to properly display Arabic without
sacrificing its aesthetic qualities (see the latest issue of Communications
of the ACM).  And Japanese will one day be fully accommodated by computers;
enormous progress has already been made towards this end.  Japanese without
kanji would not be Japanese.

To look at issues such as this from a different perspective, consider English.
English uses a highly non-phonetic script; the illiteracy rate in the U.S. is
at alarming levels.  This might be a non-sequitur, but undoubtedly a phonetic
script for English would make life a lot simpler.  Will it ever happen?
Not a chance. 

I'm not sure if this discussion belongs in comp.std.internat, so I
cross-posted to sci.lang and removed comp.lang.c.  Follow-up wherever you
see fit.

-- 
UUCP:  ... !mnetor!lsuc!maccs!gordan              BITNET: GP@TANDEM
"Sumasshedshii vsekh stran, soyedinyaites'"        Gordan Palameta

lamy@utegc.UUCP (08/17/87)

In article <717@maccs.UUCP> gordan@maccs.UUCP (Gordan Palameta) writes:
>French and German did not adapt to computers by dropping cedillas, umlauts,
>and accents; instead we now have ISO Latin.  Arabic has not adapted to

I beg to differ.  I have a cedilla in my name, and I can tell you that I have
not seen it appear very often in computer output in the last 25 years.  And
that (was) in a part of the world that tries to be officially French.

I used to use a CDC Cyber with a "shell" custom built to handle French.  The
system evolved over about 10 years, but that was only possible because the
machine was so crippled in the first place (6 bit chars, no text processing
utilities) that everything  was made from scratch (even the Pascal compiler
accepted accented identifiers).

I find it somewhat ironic that the recent version of "ngrep" moves toward
internationalization by trying to accommodate Japanese.  I am quite confident
that we will see a Japanese version of Unix before a French or a German one.

Just by curiosity, a quick scan of my brain seems to indicate that English
would be the only European language not to use diacritical marks, digraphs,
or extra letters. (? - I mean something like the Dutch "ij").

Jean-Francois Lamy                      lamy@ai.toronto.edu (CSnet,UUCP,Bitnet)
AI Group, Dept of Computer Science      lamy@ai.toronto.cdn (EAN X.400)
University of Toronto, Canada M5S 1A4   {seismo,watmath}!ai.toronto.edu!lamy

roy@phri.UUCP (Roy Smith) (08/17/87)

In article <717@maccs.UUCP> gordan@maccs.UUCP (Gordan Palameta) writes:
> English uses a highly non-phonetic script; the illiteracy rate in the
> U.S. is at alarming levels.  This might be a non-sequitur, but
> undoubtedly a phonetic script for English would make life a lot simpler.
> Will it ever happen?  Not a chance.

	Well, maybe a small chance.  When I was in, I think, the first
grade (in New York City, about 1965) they tried out an experimental reading
and writing system on us.  We were taught a phonetic alphabet.  All I
really remember about it is that letters like "c", which admit of two
pronunciations, were banned -- you wrote "kalsify" instead of
"calcify" -- and that they added a schwa to the alphabet.  Schwa, written
like an upside-down "e", was some sort of vowel.  Dictionaries use it a lot
to show pronunciation.

	My mother is convinced that my poor spelling skills are a direct
effect of this phonetic writing experiment.  She's probably right.
-- 
Roy Smith, {allegra,cmcl2,philabs}!phri!roy
System Administrator, Public Health Research Institute
455 First Avenue, New York, NY 10016

Isaac_K_Rabinovitch@cup.portal.com (08/18/87)

Before we give the Japanese too much credit for becoming an advanced
technical society in spite of the limitations of Kanji, we should remember
one way in which we have fallen behind:  language.

In Japan, as in most countries, scientists, doctors, and engineers are
required to learn the language that is widely used for their discipline.
As in Europe, most Japanese physicians speak German, most computer scientists
speak English, etc.  The U.S. has lucked out, despite appallingly poor language
instruction, simply because English happens to be a standard technical
language.  We don't seem to have made best use of this advantage.

jal@oliveb.UUCP (Tony Landells) (08/19/87)

In article <8708171253.AA21033@ephemeral.ai.toronto.edu>, lamy@ai.toronto.edu (Jean-Francois Lamy) writes:
> 
> I find it somewhat ironic that the recent version of "ngrep" moves toward
> internationalization by trying to accommodate Japanese.  I am quite confident
> that we will see a Japanese version of Unix before a French or a German one.
> 
I'm afraid I must beg to differ - AT&T currently have an internationalized
UNIX in beta-test (or, at least, the applications environment is in beta-
test - I assume it would make a completely nationalized system possible) with
French and German as the two possibilities currently available in Europe.  Of
course, you do need hardware that supports it (e.g. hardware conforming to the
European ISO 8859/1 8-bit graphic character set).

I'm not sure how well it works yet, as it only arrived last week, and we're
currently in the process of installing it, but it does exist...

Tony Landells.
-- 
I don't have a .signature, but then I never did get the hang of .writing ...

alan@pdn.UUCP (Alan Lovejoy) (08/19/87)

In article <717@maccs.UUCP> gordan@maccs.UUCP (Gordan Palameta) writes:
>"Sumasshedshii vsekh stran, soyedinyaites'"        Gordan Palameta

Shouldn't that be: Sumashedshchiji fsjekh stran, soyedjinjaitjesj 
((Let) loonies (out-of-mind-goners) (of) all countries unite (make
themselves (as) one))?  First of all, they already have.  Secondly,
ASCII is admittedly far underpowered to even be considered as a
standard for all languages world-wide, even if they all could socio-
politically agree to use the "same" alphabet.  Thirdly, the
Soviets already have RuSCII (Russian Standard Code For Information
Interchange).  And lastly, it's a lot easier to write "English" in
Cyrillic than it is to write "Russian" in Anglographia (excuse me for
coining a new term).

--Alan "Pjervyj bljin komom" Lovejoy

guy%gorodish@Sun.COM (Guy Harris) (08/19/87)

> > I find it somewhat ironic that the recent version of "ngrep" moves toward
> > internationalization by trying to accommodate Japanese.  I am quite confident
> > that we will see a Japanese version of Unix before a French or a German one.
> > 
> I'm afraid I must beg to differ - AT&T currently have an internationalized
> UNIX in beta-test (or, at least, the applications environment is in beta-
> test - I assume it would make a completely nationalized system possible) with
> French and German as the two possibilities currently available in Europe.

However, I believe they released the Japanese Applications Environment before
any European environments, so his contention still stands.
	Guy Harris
	{ihnp4, decvax, seismo, decwrl, ...}!sun!guy
	guy@sun.com

srg@quick.UUCP (Spencer Garrett) (08/20/87)

I was told once (by a respected linguist, as I recall) that English and
Russian are the ONLY two languages written with unaccented alphabets.  I
know you have to add the qualifier "modern" to make that true, and maybe
"major" as well, although I don't know of any exceptions right off.  I don't
know whether he didn't count Katakana and Hiragana as alphabets or whether
one cannot (or normally would not) write Japanese entirely in one or both
of these scripts.  He seemed to think that an unaccented alphabet was a
substantial advantage in an information age, and I would tend to agree.
(It should be noted in passing that this is not mere cultural imperialism.
This particular professor is extremely fond of Arabic script and has spent
a great deal of time teaching TEX to handle it.)

andersa@kuling.UUCP (Anders Andersson) (08/23/87)

In article <111@quick.UUCP> srg@quick.UUCP (Spencer Garrett) writes:
>I was told once (by a respected linguist, as I recall) that English and
>Russian are the ONLY two languages written with unaccented alphabets.  I

That depends on the definition of "accented". A Russian "J" is simply an "I"
(a reversed "N") with a kind of U-arc on top of it, and they are sorted the
same. "E" has an umlaut-like accent which changes the vowel from "ye" to
"yo", but this seldom shows up in print (except for dictionaries, which also
use a common acute accent to show pronunciation). There are the hard and soft
signs which have no phonetic value of their own, but they look like ordinary
letters. I think it's funny that they are relevant in sorting, just as if the
apostrophe should have its own place in the English and French alphabets...

Anyway, it's true that accents don't play such a big role in Russian as they
do in French or Czech. English is probably the least complicated in this
matter, but I don't consider the English alphabet absolutely "pure" in some
accentophobic sense. Every alphabet has its history of anomalies.

Disclaimer: I'm NOT an educated linguist, just an amateur. Although I hold
the above for true, any linguist could probably provide more detail.
-- 
Anders Andersson, Dept. of Computer Systems, Uppsala University, Sweden
Phone: +46 18 183170
UUCP: andersa@kuling.UUCP (...!{seismo,mcvax}!enea!kuling!andersa)

neubauer@bsu-cs.UUCP (08/23/87)

In article <111@quick.UUCP>, srg@quick.UUCP (Spencer Garrett) writes:
> I was told once (by a respected linguist, as I recall) that English and
> Russian are the ONLY two languages written with unaccented alphabets.  I
> know you have to add the qualifier "modern" to make that true, and maybe
> "major" as well, although I don't know of any exceptions right off.  I don't
   ^^^^^ that might do it
> know whether he didn't count Katakana and Hiragana as alphabets or whether
	They are not alphabets, they are syllabaries, i.e., each symbol
	represents a whole syllable.
> one cannot (or normally would not) write Japanese entirely in one or both
> of these scripts.  He seemed to think that an unaccented alphabet was a
> substantial advantage in an information age, and I would tend to agree.
	So would I.

In article <2842@ulysses.homer.nj.att.com>, jss@hector..UUCP (Jerry Schwarz) writes:
> I quote from a draft of the Rationale of the proposed 
> ANSI C standard, section 4.4:
> 	The English language uses 26 letters derived from the
> 	Latin alphabet. The set of letters suffices for English, 
> 	Swahili, and Hawaiian; all other living languages use
> 	either the Latin alphabet plus other characters, or other 
> 	non Latin alphabets or syllabaries.
> They cite no reference for this piece of trivia.

Just as well, since it is not true.  Another counterexample (from off the
top of my head):  Hmong.  If necessary, we could undoubtedly come up with
more, but there is really no point.  We don't really need to worry about it
for C programs, since the characters needed for that are already known.
What we do need to worry about is how to set up computer facilities, e.g.,
keyboards, and how to represent the modified letters in languages that DO
have diacritics.  

It has already been established that simply using the high bit of an 8-bit
byte for +/- modified will not do, both because of multiple diacritics for a
single letter in a given language, and also because of multi-lingual text.
It is certainly far less elegant to simply assign a byte from the upper 1/2
of the byte range (i.e. with high bit set) to each known modified letter.
If we stick to the Latin alphabet, though, there are probably enough
unassigned bytes to do it.  That will leave very odd sets of bit patterns to
represent the letters of a given language, but the alternative would appear
to be to scrap ASCII altogether if we intend to make some kind of rational
scheme of it.
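
The difference between the two schemes can be shown with a small example. It is a sketch only: ISO 8859-1 (the "ISO Latin" set mentioned elsewhere in this thread) stands in for "a byte from the upper half per modified letter", and the helper names `flag_scheme` and `table_scheme` are invented for the illustration.

```python
def flag_scheme(letter):
    """One-bit scheme: set the high bit to mean 'this letter is modified'."""
    return ord(letter) | 0x80


def table_scheme(letter):
    """Table scheme: every modified letter gets its own byte (ISO 8859-1)."""
    return letter.encode("latin-1")[0]


variants = ["é", "è", "ê", "ë"]

# The flag scheme has exactly one code for "modified e", so the four
# French variants cannot be told apart.
print(hex(flag_scheme("e")))   # 0xe5

# The table scheme spends four bytes from the upper half, one per variant.
table_codes = {v: table_scheme(v) for v in variants}
print({v: hex(code) for v, code in table_codes.items()})
```

With the table scheme all four codes are distinct and all have the high bit set, but nothing in the bit pattern relates them to plain "e".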

-- 
Paul Neubauer 	UUCP:  {ihnp4,seismo}!{iuvax,pur-ee}!bsu-cs!neubauer

henry@utzoo.UUCP (Henry Spencer) (08/23/87)

> P.S.: About extra letters: is the "$"-sign really the writing in one space
> of "U" and "S"? So: "U.S. dollar" --> "$ dollar"

Close.  What I have been told is that the dollar sign is a scrunched form
of PS, with the loop of the P getting lost in the shuffle.  Why PS?  Because
the US took a long time to get its act together on a national currency, and
the Mexican peso saw considerable use meanwhile.
-- 
Apollo was the doorway to the stars. |  Henry Spencer @ U of Toronto Zoology
Next time, we should open it.        | {allegra,ihnp4,decvax,utai}!utzoo!henry

corelib@husc4.HARVARD.EDU (core library) (08/24/87)

In article <111@quick.UUCP> srg@quick.UUCP (Spencer Garrett) writes:
>
>I was told once (by a respected linguist, as I recall) that English and
>Russian are the ONLY two languages written with unaccented alphabets.  I
>know you have to add the qualifier "modern" to make that true, and maybe
>"major" as well, although I don't know of any exceptions right off.
> 				...
>This particular professor is extremely fond of Arabic script and has spent
>a great deal of time teaching TEX to handle it.)

Unlike Arabic, Modern Persian is written with no accents.  Accents are
only used in children's books and occasionally in print with a foreign
word.  This would perhaps be analogous to similar occurrences in English.

In fact, a Persian alphabet would be easier to implement than an
English one (in terms of size) because there are 52 letters in the
English language:  capitals and lower case.  These have to be specified
by the writer, while in Persian, the shape (or, loosely, case) of each
letter is absolutely determined by whether it has a connecting letter
or space before or after it.  

Since the Persian alphabet has only 32 letters, moreover, that would
leave at least 20 spaces for the 6-odd accents which are so
infrequently used.

Just in case Persian ever becomes that popular...
=======================================================================
Mark Towfigh             "Did you take that?"   "Soitanny not!"
                         "Oh, a wise guy, huh?" "Aaa-aaa-aaaaa-o-o-o-o"
UUCP:     harvard!husc4!corelib

kent@xanth.UUCP (Kent Paul Dolan) (08/24/87)

In article <1043@bsu-cs.UUCP> neubauer@bsu-cs.UUCP (Paul Neubauer) writes:
>We don't really need to worry about it [diacriticals in other languages]
>for C programs, since the characters needed for that are already known.

>Paul Neubauer 	UUCP:  {ihnp4,seismo}!{iuvax,pur-ee}!bsu-cs!neubauer

Paul, your posting was fine except for this one lapse.  For some
number of years we have been getting away with assuming that English
is the universal programming language.  Since it is not proven that
programming has the same criticality as, say, piloting of ships and
airplanes, where such a rule makes sense, I expect programming in the
native tongue to become more widespread as programming becomes a more
worldwide activity.  In particular, for C, the use of "meaningful
identifiers" must imply "meaningful in the tongue of the reader".  We
can probably get away with English keywords; keywords have such
conventional meanings they are almost divorced from their common
English or other language meaning anyway, but I would expect to see
the alphabet in which C identifiers are written to vary to match the
needs of the language group using C.  This of course implies a lot of
new problems in code portability.  It is time to face these problems,
either by changing the de facto English predominance in coding to be
de jure, or else by providing for extended alphabet standards and
compliant compilers.

Gives a whole new meaning to the idea of program translators, too!

Kent, the man from xanth.

joe@haddock.ISC.COM (Joe Chapman) (08/24/87)

>I was told once (by a respected linguist, as I recall) that English and
>Russian are the ONLY two languages written with unaccented alphabets.

Well, the Russian letter pronounced "yo" is an e with a diaeresis, and
"i kratkoye" (short i) looks like a backwards "N" with a breve.  You might
argue that
these aren't really accented letters, by claiming:

	1. They aren't next to the unaccented version in the
	standard collating sequence; in this case you have to
	accept Finnish (with a-diaeresis and o-diaeresis at the
	end of the alphabet) as being in the non-accented
	category.

	2. Absolutely no phonetic or grammatical correlation
	exists between the unaccented and accented versions of
	the characters; in this case you would be wrong.

	3. The characters have developed via evolution from
	utterly distinct earlier forms; from what I recall
	of Old Church Slavonic I don't believe this, but I'm
	open to such an argument.

Joe Chapman
harvard!ima!joe

pete@aiva.UUCP (08/25/87)

>I was told once (by a respected linguist, as I recall) that English and
>Russian are the ONLY two languages written with unaccented alphabets.  I
>know you have to add the qualifier "modern" to make that true, and maybe
>"major" as well, although I don't know of any exceptions right off.  I don't
>know whether he didn't count Katakana and Hiragana as alphabets or whether
>one cannot (or normally would not) write Japanese entirely in one or both
>of these scripts.  

One could consider katakana and hiragana as alphabets (functionally
equivalent to, in some sense), and one could write Japanese entirely 
in either one, probably introducing a lot of ambiguity. The point is
that they have accents - the two little strokes (called nigori) at 
the top right of certain characters indicate the voiced version of
that character, and without this indication, the Japanese would be
unreadable garbage.
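
For readers who want to see the effect, here is a small sketch. It relies on Unicode, which is an assumption of this example and not something the post refers to: Unicode normalization splits a voiced kana into its base character plus a combining voicing mark, so the marks can be dropped mechanically. The function name `strip_voicing` is made up.

```python
import unicodedata

# Combining voiced and semi-voiced sound marks (nigori and its circle variant).
VOICING_MARKS = {"\u3099", "\u309A"}


def strip_voicing(text):
    """Remove the voicing marks from kana, leaving the unvoiced characters."""
    decomposed = unicodedata.normalize("NFD", text)
    kept = "".join(ch for ch in decomposed if ch not in VOICING_MARKS)
    return unicodedata.normalize("NFC", kept)


print(strip_voicing("がっこう"))  # かっこう
```

The input reads "gakkou"; the output reads "kakkou", a different word.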

Pete Whitelock
AI Edinburgh

karl@haddock.ISC.COM (Karl Heuer) (08/25/87)

In article <2242@xanth.UUCP> kent@xanth.UUCP (Kent Paul Dolan) writes:
>I would expect to see the alphabet in which C identifiers are written to vary
>to match the needs of the language group using C.

Probably.  Such programs cannot be strictly conforming, but if they are not
expected to be ported outside the locale of origin (this includes not only
foreign countries with nonstandard letters, but also VMS sites in the USA),
that's tolerable.  (And it's not difficult to write a transliterating program,
anyway.)  A compiler can admit such identifiers and still be conforming, so
the implementor has no good reason not to.
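
As a rough idea of what such a transliterating program might look like, here is a minimal sketch. The mapping table `TRANSLIT` and the function name are invented for the example. The sketch rewrites every identifier-like token, so it would also touch words inside string literals and comments, and it does not check for name collisions after transliteration.

```python
import re

# Hypothetical lower-case mapping; a real tool would need a fuller table.
TRANSLIT = {"ä": "ae", "ö": "oe", "ü": "ue", "å": "aa",
            "é": "e", "ç": "c", "ß": "ss"}

# A letter or underscore followed by any number of word characters.
IDENT = re.compile(r"[^\W\d]\w*")


def transliterate_identifiers(source):
    """Replace non-ASCII letters in identifier-like tokens with ASCII ones."""
    def fix(match):
        return "".join(TRANSLIT.get(ch, ch) for ch in match.group(0))
    return IDENT.sub(fix, source)


print(transliterate_identifiers("int größe = zähler + 1;"))
# int groesse = zaehler + 1;
```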

(Maybe someone needs to design a language along the lines of APL, with no
natural-language bias at all.)

Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint

bayes@hpfcrj.UUCP (08/26/87)

Back to Kanji:

As one who lived in Japan a number of years, and who has implemented
a Kanji utility on a computer, I have a couple of points to make with
some confidence:

o Japanese will not just "drop Kanji". When reading (and in an odd way, when
  speaking) Japanese, one thinks in Kanji, in some sense. Hiragana and
  Katakana can be used to represent the language, but they obscure, rather
  than illuminate the meaning. The fact that kana have been accepted in the
  past does not mean that kana are adequate or acceptable in a modern
  computing age.

  You might say:  "Well let them change from Kanji to
  Romaji/kana/whatever".  To which I might reply:  "The USA hasn't even
  gone to metric yet...".  Think what it means to change your whole way
  of representing your language and world.  Learning that 21 degrees C is
  room temperature, or 2 km is a 20-minute walk, is EASY by comparison.
  Yet we're reluctant to make that concession to "modernity".

o There exists a standard for Japanese character representation digitally:
  JIS C 6226 (may have been updated). Any implementations had best superset
  that, as there's a lot of S/W and H/W around that now understands it.
  Shades of EBCDIC/360 compatibility.

o ALL the language must ultimately be representable, in some sense
  orthogonally. Saving a bit or 2 here and there, or forcing context-dependent
  decodings at the cost of performance and portability are false
  economies. So we need 32 bits. Big deal. Going to 21-1/2 bits or whatever
  has been suggested, or encoding only 90% of the language will just leave
  us with inadequate standards somewhere down the road (is this deja vu,
  or does my machine just not know how to accent the 'e' in deja :-)?
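
A small sketch of the trade-off in the last point, in Python.  UTF-32 and
ISO-2022-JP are used here only as present-day stand-ins for "a fixed 32 bits
per character" and "a context-dependent encoding"; neither is named in the
post, and the function names are made up for illustration.

```python
def encode_fixed(text):
    """Encode with exactly 4 bytes per character (UTF-32, big-endian, no BOM)."""
    return text.encode("utf-32-be")


def nth_char_fixed(data, n):
    """Fetch character n directly: its bytes always sit at offset 4*n."""
    return data[4 * n:4 * n + 4].decode("utf-32-be")


def nth_char_stateful(data, n):
    """Fetch character n from ISO-2022-JP data.

    The meaning of a byte depends on the escape sequences before it,
    so the sketch simply decodes from the start of the data.
    """
    return data.decode("iso2022_jp")[n]
```

With the fixed-width form, finding a character is a single multiplication;
with the shift-state form, every lookup pays for a scan from the beginning.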

I cannot speak competently on Chinese, etc.

Scott Bayes
hpfcla!bayes

halo@cognos.uucp (Hal O'Connell) (08/31/87)

In article <8461@utzoo.UUCP> henry@utzoo.UUCP (Henry Spencer) writes:
>> P.S.: About extra letters: is the "$"-sign really the writing in one space
>> of "U" and "S"? So: "U.S. dollar" --> "$ dollar"
>
>Close.  What I have been told is that the dollar sign is a scrunched form
>of PS, with the loop of the P getting lost in the shuffle.  Why PS?  Because
>the US took a long time to get its act together on a national currency, and
>the Mexican peso saw considerable use meanwhile.

My understanding is somewhat different, and takes into account the 
*difference* between the US dollar sign (normally an "S" with two 
vertical lines) and other dollar signs (the "S" with a single vertical
line).

The US dollar sign came from superimposing the S on a U, being an
abbreviation of "US" and erasing the base of the U. Very patriotic.

The other dollar sign comes from the historical common usage
of Spanish pieces of eight in international matters. The
superimposition of an 8 on a P was standard notation for this
currency. The erasing of a few lines gives the $ we all know.
I suspect that the symbolism of the peso evolved in the same fashion,
given its hispanic origins.

greg@gryphon.UUCP (09/02/87)

In article <481@kuling.UUCP> andersa@kuling.UUCP (Anders Andersson) writes:
>If the codes for different forms are the same, then the typography will
>have to depend on context, so there has to be three font bitmaps, or types
>on a printing wheel/chain, and a neat little algorithm instead of a 1-1
>mapping for translating character codes into font table indices. I'm not
>arguing against the solution, just pointing out some extra problems.

There are generally four forms, as we shall see shortly.  The algorithm
is not neat at all.  If you are dealing with a CRT display, displaying
English and Arabic, and implementing direct cursor positioning, the
placement of a single character will require one of eighteen distinct
display rearrangement algorithms.
>
>This leads me to another question: Which form is used in a Persian word
>consisting of only one letter?

The following is from my experience in implementing Arabic.  Persian is
similar.  

There are four forms to each letter.  The text is written from right-to-left
so the beginning form is on the right-most end.  The forms (cases) are
beginning (connected to the character on the left), middle (connected on
both sides), ending (connected to the character on the right) and alone
(not connected to anything).  There are 8 characters, as I recall, that
never connect to the left even when they appear in the middle of a word.
These letters need only two cases since the alone and ending case can
double for the beginning and middle cases.  
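
The form selection described above can be sketched as follows, in Python.
This is an illustration under assumptions, not the original implementation:
letters are represented by romanized names rather than character codes, and
the set of non-connecting letters is an illustrative subset, not the list of
8 recalled in the post.

```python
# Letters that never connect to the following letter (the one on their left).
# Illustrative subset only; the actual code set had its own list.
NON_CONNECTING = {"alif", "dal", "dhal", "ra", "zay", "waw"}


def contextual_forms(word):
    """Return the form name for each letter of `word`, given in reading order."""
    forms = []
    for i, letter in enumerate(word):
        # Joined to the previous letter only if that letter connects leftward.
        joins_prev = i > 0 and word[i - 1] not in NON_CONNECTING
        # Joined to the next letter only if this letter connects leftward.
        joins_next = i < len(word) - 1 and letter not in NON_CONNECTING
        if joins_prev and joins_next:
            forms.append("middle")
        elif joins_next:
            forms.append("beginning")
        elif joins_prev:
            forms.append("ending")
        else:
            forms.append("alone")
    return forms
```

Under these rules a one-letter word always gets the "alone" form, and a
non-connecting letter only ever comes out as "ending" or "alone".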

In our code set, the lower 128 cells were standard ASCII with upper and
lower case Latin characters.  The upper 128 codes were Arabic characters.
The letters occupied sticks 15 and 16.  Sticks 13 and 14 were diacritical
marks and the extender character.  The extender character was a letter
extender.  In Arabic writing, margin justification is accomplished by
extending the inter-letter connections as opposed to adding whitespace
between letters or words so a letter extender was required.

Graphics like ! @ # $ % ^ etc., were represented in both the upper and lower
code tables.  When processing a stream of codes it was necessary to know
the language attribute of the special graphics.  For example (loosely),
">" in English meant "greater-than".  In Arabic it means "less than".
Actually, it has no meaning in Arabic but I was implementing a programming
language.

There is one ligature, lom-alif (forgive spelling ... I'm not literate in
Arabic, I only implemented 4 terminals, 6 printers, an operating system,
a programming language and a word processor and I still can't read, write
or speak the language) that is used when an alif immediately follows
a lom.  The lom-alif ligature uses one display cell rather than the two
cells that would be used by lom and alif displayed separately.

We handled diacritics by assigning a separate code to each mark. 
The effect was that the code stream was composed of variable length
display elements.  For example the code stream might be:
   letter
   letter diacritic
   letter diacritic extender [extender ...]
   lom alif
   lom alif diacritic
and various combinations of the above.  A more complete implementation would
also have allowed multiple diacritics following a letter.  

letter used 1 display cell.  lom followed by alif was counted as two letters
but used one display cell (lom-alif ligature).  Diacritics were displayed
in the same cell as the letter with which they were associated and thus
required no display space.  Extenders required 1 display cell but
did not count as a letter.  The ending form of some letters used 2
display cells.
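
The counting rules above can be sketched as follows, in Python.  The token
names are placeholders rather than the real character codes, the two-cell
ending forms are left out, and treating a diacritic between lom and alif as
not breaking the ligature is an assumption.

```python
DIACRITIC = "<diacritic>"   # placeholder token for any diacritic code
EXTENDER = "<extender>"     # placeholder token for the letter extender


def measure(stream):
    """Return (letters, cells) for a list of tokens in reading order."""
    letters = 0
    cells = 0
    prev_letter = None  # most recent letter, cleared by an extender
    for tok in stream:
        if tok == DIACRITIC:
            continue            # drawn in the same cell as its letter
        if tok == EXTENDER:
            cells += 1          # takes a cell but is not a letter
            prev_letter = None
            continue
        letters += 1
        if tok == "alif" and prev_letter == "lom":
            prev_letter = None  # ligature: alif shares the cell lom already took
        else:
            cells += 1
            prev_letter = tok
    return letters, cells
```

The two counts drift apart as soon as a ligature or an extender appears,
which is why a letter index and a physical column cannot be used
interchangeably.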

To sort, letters were effectively expanded to 16 bits.  The letter code
was the upper 8 bits and the diacritic (0x0 if none) the lower 8 bits.
Search strings were similarly expanded.  Optionally a null diacritic
in a search string would match any diacritic in a target string.
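
A sketch of the 16-bit expansion and the optional wildcard match, in Python.
The code ranges below (diacritics at 0xC0-0xDF, everything else treated as a
letter) are assumptions chosen for the example, not the actual code set, and
the function names are made up.

```python
def is_diacritic(code):
    """Assumed range for diacritic codes; the real code set may differ."""
    return 0xC0 <= code <= 0xDF


def expand(codes):
    """Pack each letter with its optional following diacritic into 16 bits."""
    keys = []
    for code in codes:
        if not is_diacritic(code):
            keys.append(code << 8)      # letter in the upper byte, 0x00 below
        elif keys and keys[-1] & 0xFF == 0:
            keys[-1] |= code            # attach diacritic to the previous letter
        # Any other diacritic (leading, or a second one) is dropped in this sketch.
    return keys


def key_matches(pattern_key, target_key, wildcard_null=True):
    """Compare two expanded keys; a null pattern diacritic may match any."""
    if pattern_key >> 8 != target_key >> 8:
        return False
    if wildcard_null and pattern_key & 0xFF == 0:
        return True
    return pattern_key & 0xFF == target_key & 0xFF


def find(pattern, target, wildcard_null=True):
    """Return the index of the first match of `pattern` in `target`, or -1."""
    for start in range(len(target) - len(pattern) + 1):
        window = target[start:start + len(pattern)]
        if all(key_matches(p, t, wildcard_null) for p, t in zip(pattern, window)):
            return start
    return -1
```

Sorting the expanded keys as plain integers orders text by letter first and
by diacritic second, since the letter sits in the upper byte.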

I used 143 character generated graphics to represent all of the
Arabic letters, numerals and graphics unique to Arabic.  In addition there
was a standard English character generator.

There were a couple interesting problems that were never quite fully
resolved.  Since displayed letters were variable length (remember extenders),
the concept of column X was ambiguous; did we mean physical column X or
letter X?  

There is considerable disagreement in the Arabic speaking world as to
the format of an appropriate code set.  One code set has 5 or 6 cells
devoted to the lom-alif ligature with various diacritic marks.  There
is disagreement whether lom-alif should be a character by itself or
simply a ligature formed by the display system.  I believe this is
an offshoot of Arabic typewriters having a lom-alif key.
-- 
Greg Laskin   
"When everybody's talking and nobody's listening, how can we decide?"
INTERNET:     greg@gryphon.CTS.COM
UUCP:         {hplabs!hp-sdd, sdcsvax, ihnp4}!crash!gryphon!greg
UUCP:         {philabs, scgvaxd}!cadovax!gryphon!greg

alan@pdn.UUCP (Alan Lovejoy) (09/10/87)

In article <2351@mmintl.UUCP> franka@mmintl.UUCP (Frank Adams) writes:
/The Russian alphabet does have two accented letters, although the accents
/are often omitted.  How common this omission is, or whether it is getting
/more common, I don't really know.

"I kratkoye" is usually written with a shallow bowl over it.  However,
it is considered a distinct letter from "i".  "E" can be pronounced
as either "ye" or "yo", and in books for foreigners a diaeresis is placed
over it when it is to be pronounced "yo".  In handwriting (but not in
printed text), the letters "t" and "sh" sometimes have a line either
under or over them.  Other than that, no diacritical marks are used
*in Russian*.  However, there are other languages which use this 
alphabet, and they might have diacritics.

--alan@pdn

jeg@hector.UUCP (Judy Grass) (09/10/87)

In article <2351@mmintl.UUCP> franka@mmintl.UUCP (Frank Adams) writes:
>>>I was told once (by a respected linguist, as I recall) that English and
>>>Russian are the ONLY two languages written with unaccented alphabets.
>
>The Russian alphabet does have two accented letters, although the accents
>are often omitted.  How common this omission is, or whether it is getting
>more common, I don't really know.
>-- 
>
>Frank Adams                           ihnp4!philabs!pwa-b!mmintl!franka
>Ashton-Tate          52 Oakland Ave North         E. Hartford, CT 06108

I think the two letters you refer to are the "i kratkoe"  (short i) 
(looks like a backwards n with a sideways comma over it .. kinda) 
and the "jo"  (e with a diaeresis).  The
i kratkoe is ALWAYS written that way and is considered a letter in its own right.
No entries for it in a dictionary, as it always follows another vowel.
The "jo" has a history, which I will skip.. In any case, it is printed
with the diaeresis usually only in texts for students of Russian, or
for disambiguation.  
		-- Judy Grass (ex-Slavic Linguist)
		ATT Bell Labs, Murray Hill
		ulysses!jeg

jal@oliveb.UUCP (Tony Landells) (09/11/87)

In article <2351@mmintl.UUCP>, franka@mmintl.UUCP (Frank Adams) writes:
> >>I was told once (by a respected linguist, as I recall) that English and
> >>Russian are the ONLY two languages written with unaccented alphabets.
> 
> The Russian alphabet does have two accented letters, although the accents
> are often omitted.  How common this omission is, or whether it is getting
> more common, I don't really know.

I think this depends how one defines an accent.  It is a long time since I
studied Russian, and thus I may have forgotten something, but as I recall,
it went like this:

Russian has two letters that could be considered accented, as they look the
same as other letters in the alphabet, save small "things" placed above them.

I would refrain from calling them accents, however, as the two letters
actually occupy places in the alphabet.  To my mind, an accent is something
you place over a normal letter of the alphabet to modify the sound or mark
stress in the word, or whatever, but that is not normally seen in the
alphabet.

If this is unclear, consider that the Greek alphabet has stress accents, but
they are never printed in the alphabet (though a character set would have to
have them).  French has accents to change the pronunciation of various
letters, but these aren't placed in the alphabet either.

Thus my premise is that the two apparently accented letters of Russian are,
in fact, separate letters in their own right.

Tony Landells.
-- 
"Holy olio, Batman!!"
"Why Boy Wonder; I didn't know you could l.any y

franka@mmintl.UUCP (Frank Adams) (09/15/87)

In article <2938@ulysses.homer.nj.att.com> jeg@hector (Judy Grass) writes:
>In article <2351@mmintl.UUCP> franka@mmintl.UUCP (Frank Adams) writes:
>>[Somebody writes:]
>>>>I was told once (by a respected linguist, as I recall) that English and
>>>>Russian are the ONLY two languages written with unaccented alphabets.
>>
>>The Russian alphabet does have two accented letters,
>
>I think the two letters you refer to are the "i kratkoe"  (short i) 
>and the "jo"  (e with a diaeresis).  The i kratkoe is ALWAYS written 
>that way and is considered a letter in its own right.

Not ALWAYS.  I have seen it written without the mark.

>In any case, [jo] is printed with the diaeresis usually only in texts for
>students of Russian, or for disambiguation.  

If it's used for disambiguation, it's used, or so it would seem to me.

I was taking a basically graphical approach to defining accented letters: if
two letters in an alphabet are the same, except that one has a mark on it
then that one is accented.  (This begs the question of how one distinguishes
"marks" from other parts of the letter.  A first approximation is that a
mark is a disconnected part of the letter -- but this doesn't deal with the
cedilla.)

There are two places to look for a more linguistic definition: alphabets, as
used by native speakers, and alphabetization rules.  When I studied Russian,
the "i kratkoe" was *not* included in the alphabet.  Whether it affects
alphabetization I don't know.

I would be quite surprized if English and Russian were the only languages
with no accented letters, when a letter is regarded as accented only when it
is alphabetized the same as the original form.

By the way, does Greek use accented letters?
-- 

Frank Adams                           ihnp4!philabs!pwa-b!mmintl!franka
Ashton-Tate          52 Oakland Ave North         E. Hartford, CT 06108

jc@minya.UUCP (John Chambers) (09/15/87)

> >>I was told once (by a respected linguist, as I recall) that English and
> >>Russian are the ONLY two languages written with unaccented alphabets.
> 
> The Russian alphabet does have two accented letters, although the accents
> are often omitted.  How common this omission is, or whether it is getting
> more common, I don't really know.

Actually, the i-kratkoe (looks like a backwards N with a tiny U above it)
always has the mark, which is considered a part of the letter.  The 'yo'
(looks like 'e' with an umlaut) normally has the mark omitted, except in
children's books, and occasionally when needed for clarity (like when you
need to distinguish 'vsye' from 'vsyo').

How about Welsh?  I don't seem to recall any marks on any letters there, 
though I don't claim familiarity with the language.  

There's also Serbo-Croatian, which has a set of (5) marks, but you very
rarely see them outside of children's books and language texts.

For that matter, would you consider Yiddish and Hebrew?  True, there are
marks, but they are rarely used.  Even the mark that distinguishes 'shin'
from 'sin' is rarely used.  (Puns on the fact are welcome!)

Or were you perhaps talking only about Roman-derived alphabets?

-- 
	John Chambers <{adelie,ima,maynard}!minya!{jc,root}> (617/484-6393)

sjaak@vuecho.psy.vu.nl (Sjaak Schuurman) (09/16/87)

In article <141@minya.UUCP> jc@minya.UUCP (John Chambers) writes:
>
>There's also Serbo-Croatian, which has a set of (5) marks, but you very
>rarely see them outside of children's books and language texts.

In this case, it is time to give some more details about the Serbo-Croatian
language.
It is, as far as I know, one of the few (if not only) european languages which
can be written with two alphabets, which are completely equivalent. (i.e. it
doesn't make difference which of the two -latin or cyrillic- you use, rather
than that one is a sort of transcription of the other)
The latin alphabet is more used by the people in Croatia, whereas the cyrillic
version is more favourite in Serbia.
One of the great characteristics of this alphabet (which consists of 30 letters)
is the fact that it is completely phonetic, in the sense that every letter
stands exactly for one sound, and that every letter is always pronounced the
same way. 
The latin version uses some diacritics like a 'v' on top of a c, s or z, and
there exists a 'dj' which is written as a 'd' with a stroke through the 
vertical part. 
The cyrillic version however does not use any accents, diacritics or other
signs whatsoever, and Serbo-Croatian can therefore be seen as one of the
languages with an 'accentless' alphabet.

If John Chambers refers with the set of 5 marks to the accent-like signs
which (in coursebooks etc.) show how a vowel should be stressed, I would
like to say that as far as I know, there are just 4 of such marks, but
as he said, they are never really used in the language itself, and do
not form a part of the alphabet.

	   v      v         v  v         v                
	"PISI KAO STO GOVORIS, CITAJ KAO STO JE NAPISANO"

       		                                  v /
					VUK  KARADZIC

						Sjaak Schuurman

minow@decvax.UUCP (Martin Minow) (09/16/87)

Swedish, Danish, Norwegian, and Finnish do not have accented letters.
All, however, have more vowels than the English alphabet can represent,
so diacritics are used.  Swedish occasionally uses accents to mark
stress on personal names, such as "Sund'en"

Korean uses a syllabic alphabet that lacks accented letters.

Martin Minow
decvax!minow

jeg@hector.UUCP (Judy Grass) (09/16/87)

In article <141@minya.UUCP> jc@minya.UUCP writes:
>There's also Serbo-Croatian, which has a set of (5) marks, but you very
>rarely see them outside of children's books and language texts.
>

You can't talk about a Serbo-Croatian writing system.  Serbian is written in its
own form of Cyrillic (it shares a lot of letters with Russian Cyrillic, but
does not use all of them, and adds a few of its own).  Croatian is written in
a latin alphabet, with diacritics .. some that are absolutely required, 
and only a very few optionals.  To wit:  Croatian uses the hachek (an upside
down caret) over s, c, z for sh, ch, zh.  an acute accent is used over c
for a palatalized t (no such in english, sorry).  A line through a d
for a palatalized d.  (I may have forgotten one or two more such).  In addition,
some spoken dialects of Serbian and Croatian have tones.. lengthened vowel
with rising pitch, falling pitch, etc.  (somewhat like Chinese, but you
can learn to speak good Serbo-croatian without worrying much about it).
This stuff is sometimes written by using diacritics over vowels for the benefit
of foreign students.  Sometimes students only get word stress.

Serbo-croatian has yet a THIRD written version .. you may see Serbian texts
with Serbian pronunciations and Serbian vocabulary written in Croatian style
latin transcription.  The issue of whether Serbo-Croatian is one language or
two is a hot one that has led to demonstrations in the streets of Zagreb,
various and sundry riots and a lot of linguistic engineering.

Re. Slavic writing systems:  The following languages use versions of Cyrillic,
in every case the writing systems are not identical:
	Russian, ByeloRussian, Ukrainian, Bulgarian, Macedonian, Serbian.
None of these use diacritics to any great extent.

The following languages use spellings based on the latin alphabet:
	Polish, Czech, Slovak, Slovenian, Croatian, Sorbian (a dying language).
All of these use plenty of obligatory diacritics, but again all have idiosyncrasies.
Polish is the one in this group that stands out.  Polish spells the 
sounds "sh", "ch" using digraphs: sz, cz. But, Polish uses diacritics to indicate
nasalized vowels, long vowels, etc.  The others are similar to Croatian.
In Czech and Slovak you need to add accents for the vowels to indicate long
vowels (not stress as stress is fixed for these languages).

Cyrillic has also been improvised upon to invent writing systems for any of a number
of languages of various national groups within the USSR.  Moldavian
is one example  (an attempt to re-ethnicize a group of Rumanians within the USSR). 
Various Eastern tribal languages have also gotten their first writing systems
this way.  I have also seen a Yiddish publication or two within the USSR that
was written in Cyrillic. (Now THIS is touchy).

		Judy Grass,  ATT Bell Labs, Murray Hill
		ulysses!jeg

eeyore@reed.UUCP (09/17/87)

In article <133@wundt.vuecho.psy.vu.nl> sjaak@psy.vu.nl (Sjaak Schuurman) writes:

>
>In this case, it is time to give some more details about the Serbo-Croatian
>language.
>It is, as far as I know, one of the few (if not only) european languages which
>can be written with two alphabets, which are completely equivalent. (i.e. it

i'm pretty sure that romanian, which is written in the latin alphabet in 
romania, is written in the cyrillic alphabet in the moldavian s.s.r., which
would give it a status similar to serbo-croatian.

question 1. does anybody know the cyrillic-moldavian equivalents of romanian?
question 2. does anybody know if finnish is written in cyrillic in the karelian
s.f.s.r.?

-- 
---------------------------------
"tout cela m'est egal" -meursault.
"it's all the same to me" -eeyore
 tektronix!reed!eeyore

firth@sei.cmu.edu (Robert Firth) (09/18/87)

A long time ago, someone wrote...
 >>I was told once (by a respected linguist, as I recall) that English and
 >>Russian are the ONLY two languages written with unaccented alphabets.

We've beaten Russian to death, but did anyone point out that English also
requires diacritical marks above, beside, and below letters?  They are
fast being dumped into the can~on of history due to the uncoo"perative
ro^le of ASCII terminals, but even a soupc,on of an acquaintance with
the real language should convince you they are not just mediaeval relics!

jc@minya.UUCP (jc) (09/20/87)

> I would be quite surprized if English and Russian were the only languages
> with no accented letters, when a letter is regarded as accented only when it
> is alphabetized the same as the original form.
> 
> By the way, does Greek use accented letters?

Indeed it does.  Proper spelling in Greek requires rather frequent accents
on vowels; it is rare to see more than 2 or 3 words go by without one.  Some
even have two:  Initial vowels may have either of two 'aspiration' marks that
are useless in modern Greek (one of which used to indicate an initial [h]),
in addition to the main accent.

There's also a cute historical quibble to the effect that English actually
has an accented letter: 'i'.  This is one of many letters that ultimately
derives from Greek, where the dot is in fact an accent on the iota, which
is properly dotless.  The really weird thing in English is that we use the
accent mark on the lower-case letter, but not on the upper-case 'I'.  This
is basically a result of historic ignorance, confusion, illiteracy, and so
on, in the development of the late Latin alphabet into a 2-case form.  The
Greek iota can have accents (or not) in either case.  Also, modern Turkish
uses both the dotted and undotted 'i', to make a phonetic distinction like
the vowels in 'beet' and 'bit'.  The Turkish city of Istanbul properly 
has a dot over the 'I', for instance, but it's hard to do in ASCII.

According to the above definition, since 'I' and 'i' are alphabetized the
same in English, I would conclude that English has an accented letter! (;-)

(Quick, someone point out the other accented letter in English.)

[Hmmm...Is 't' really an accented 'l'?  Now you're getting really weird!]

-- 
	John Chambers <{adelie,ima,maynard}!minya!{jc,root}> (617/484-6393)

jc@minya.UUCP (jc) (09/20/87)

> >There's also Serbo-Croatian, which has a set of (5) marks, but you very
> >rarely see them outside of children's books and language texts.
> 
> The cyrillic version however does not use any accents, diacritics or other
> signs whatsoever, and Serbo-Croatian can therefore be seen as one of the
> languages with an 'accentless' alphabet.

Actually, the Cyrillic has the same vowel marks, and they are as rarely used.
> 
> If John Chambers refers with the set of 5 marks to the accent-like signs
> which (in coursebooks etc.) show how a vowel should be stressed, I would
> like to say that as far as I know, there are just 4 of such marks, but
> as he said, they are never really used in the language itself, and do
> not form a part of the alphabet.

There's also a "long unstressed" symbol, a horizontal line above the vowel;
it appears rarely, too, mostly to distinguish definite from indefinite endings
(many of which differ only in length, and which can't always be inferred from
context).
> 
> 	   v      v         v  v         v                
> 	"PISI KAO STO GOVORIS, CITAJ KAO STO JE NAPISANO"
>                                                   v /
>                                         VUK  KARADZIC

This is one of the more impressive guidelines in modern writing systems.
The author was a major Jugoslav author earlier in the century, and he was
active in developing the modern two-alphabet writing system of Serbo-Croatian.
The quote means "Write like you speak; read as it is written."  In the context
of the multiple small groups at odds with each other, and no way of enforcing
a standard dialect on the population, it was a major political success that
the idea was accepted.  In other words, he was advising that people treat 
dialect differences with respect.  Everyone was to write phonetically in 
their own dialects; when reading, you should give the author the respect of
reading it as written.  An educated Jugoslav is expected to understand various
dialects to the point of understanding others' writing, though it may not be
as you would have written it yourself.

Now if the rest of the world could be taught such tolerance.  (Maybe it's
enough to hope that the Jugoslavs can keep such an idea alive in their own
small corner of the world.)


-- 
	John Chambers <{adelie,ima,maynard}!minya!{jc,root}> (617/484-6393)

elwell@tut.cis.ohio-state.edu (Clayton Elwell) (09/21/87)

jc@minya.UUCP (jc) writes:
    There's also a cute historical quibble to the effect that English actually
    has an accented letter: 'i'.  This is one of many letters that ultimately
    derives from Greek, where the dot is in fact an accent on the iota, which
    is properly dotless.  The really weird thing in English is that we use the
    accent mark on the lower-case letter, but not on the upper-case 'I'.  This
    is basically a result of historic ignorance, confusion, illiteracy, and so
    on, in the development of the late Latin alphabet into a 2-case form.

    	John Chambers <{adelie,ima,maynard}!minya!{jc,root}> (617/484-6393)

ARF! That's not a historical quibble, it's a falsehood.  Let's take things
from the beginning, so that hopefully we can get back to international
standards ...

The Phoenicians were the first people to develop an alphabet (i.e. a
writing system based on phonetics instead of pictographs).  Around 800
B.C., the Greeks decided that this was a good idea, and in 403 B.C.
they came up with an official alphabet.  Meanwhile, the Etruscans, who
were the first reasonably-sized civilization on the Italian peninsula,
had gotten their hands on an early version of the Greek alphabet [this
is where the term "beta test" comes from :-)], which they futzed
around with to suit their language.  In 700 B.C., they decided to
occupy Rome and invent urban renewal, thus giving the early Romans an
alphabet to play with.  They, in turn, fiddled with it and started
writing it with a pen, which changed the letter shapes some more.  The
Roman alphabet is usually thought to have reached its highest
aesthetic point in the inscription on the Trajan column.  This type of
letter, which is sometimes called "Roman Square Capital," or more
often just "Roman," is the direct source of modern upper case, and the
indirect source of modern lower case.

The biggest problem with roman capitals is that they are rather slow
to write, and so Roman scribes developed a variety of cursive forms,
some of which are completely unreadable to modern eyes, bearing as
they do a strong resemblance to Old Martian.  This caused some
problems, and by the 4th century a compact, informal script had emerged,
called "Roman Rustic" or "Capitalis Rustica", a remarkably graceful alphabet.
This was still an all-capital script.

Well, the early Christians seemed to think that since this alphabet
had been used for (gasp) pagan writings (such as Virgil), it just
wouldn't do, and so they developed an official script, called "uncial"
by modern scholars.  It began as a majuscule script, but since it was
designed to be written quickly, many of the letterforms started to
resemble modern lower case.  This was where ascenders and descenders
made their first appearance, for example.  People finally agreed that
punctuation was a good idea, although they still ran all of the words
together.  Dots were placed above letters, but only to signify
corrections.  As the early Christian monasteries got richer, the
script got more ornate again, and word spacing was introduced, albeit
fitfully. Still no diacritic marks, except for corrections and abbreviations
(usually a horizontal stroke above two or three letters).

About this time (6th century or so), the Christians had decided to
convert England, which turned out to be more difficult than they had
hoped, but they kept working at it.  They brought with them an
informal script called "half-uncial", which contained even more
letterforms that eventually made it into modern lower case.  The
Irish, and to a lesser extent the Anglo-Saxons, took this script and
produced a script called "Insular Majuscule," one of the most
beautiful versions of the roman alphabet ever produced.  They also
produced a script called "Insular Minuscule" (surprise, surprise) that
was used at first for commentary and other 'unofficial' uses.  For the
next several centuries, things went downhill when it came to
legibility, although punctuation was approaching its modern form.

At the end of the 8th century, a very important thing happened.
Charlemagne and his adviser Alcuin (a Benedictine monk) founded a
scriptorium which developed a new script called "Carolingian
Minuscule", which, when revived during the Renaissance, was the direct
predecessor of the modern lower case alphabet.  Even so, diacritics
were still only used for abbreviation.  This is the origin of
"accents" in European languages.  For example, the circumflex in
French was originally an abbreviation for "s."

As this script slowly metamorphosed into Gothic, it became more
compact and less legible, since it had a more even texture.  The "i"
and "j" were still undotted, but "y" *was* dotted, so as to make it
stand out more.  By the fifteenth century, "i" sometimes had a dot as
well.  This custom continued into the Renaissance, at which time the
dot was dropped off of the "y" and added to the "j".  It was only with
the development of printing that it was standardized.

The use of accents and "breathing marks" is a feature of modern Greek,
not classical Greek.  It developed in parallel with accents in Europe,
not as a precursor to them.

Now that I've beaten this into the ground, can we get back to
international standards?  I'd be happy to continue via mail, but it
does seem to be off the subject for this newsgroup.

-- 
							      Clayton M. Elwell
       The Ohio State University Department of Computer and Information Science
       (614) 292-6546	 UUCP: ...!cbosgd!osu-cis!tut.cis.ohio-state.edu!elwell
		      ARPA: elwell@ohio-state.arpa (not working well right now)

hmj@tut.fi (Matti J{rvinen) (09/21/87)

In article <421@tutor.tut.UUCP> eal@tutor.UUCP (Lehtim{ki Erkki) writes:
>In article <7187@reed.UUCP> eeyore@reed.UUCP (joshua samuel honig guenter ii) writes:
>>question 2. does anybody know if finnish is written in cyrillic in the karelian
>>s.f.s.r.?
>
>They don't actually speak finnish in the Karelian s.f.s.r., i think they
>speak some of the karelian dialects, East-Karelian or Ruski-Karelian,
>i am not sure.

They use the same alphabet as we do here in Finland. The language is Finnish,
with the same words and grammar.  They speak a Karelian dialect, but the written
text is the same as in Finland. I have two copies of the newspaper
"Neuvosto-Karjala" (Soviet Karelia) printed in Petroskoi and the
language is the same, but they do have some odd soviet phrases :-)

Finnish is never written in cyrillic, except in some prayer books written
around 1600 for Orthodox priests to make them Lutheran.

-- 
Hannu-Matti Jarvinen, Tampere University of Technology, Finland
Project EAST - European Advanced Software Technology
hmj@tut.fi, hmj@tut.uucp, hmj@tut.funet (tut.ARPA is not the same computer).

jsa@tut.fi (Jari Salo) (09/21/87)

in article <7187@reed.UUCP>, eeyore@reed.UUCP (joshua samuel honig guenter ii) says:
> Xref: tut comp.std.internat:168 sci.lang:1049
> question 2. does anybody know if finnish is written in cyrillic in the karelian
> s.f.s.r.?

	The finnish uses the normal latin alphabet with minor modifications.
	A_with_two_dots, O_with_two_dots and swedish O.

	To my knowledge Finnish has NEVER been written in Cyrillic.
	Under Russian rule, official documents were written using
	the Cyrillic alphabet, but the language used was not Finnish.

	Many people living in the Karelian s.f.s.r. speak both Russian
	and Finnish, but they HAVE NOT mixed those two languages (and
	probably never will).

	Anyway, writing Finnish using the Cyrillic alphabet would be just
	about as sane as writing English using the katakana syllabary;
	you'd probably have to modify the original words a bit to make
	them fit the new alphabet.
-- 
          Jari Salo              Tampere University of Technology 
UUCP:     jsa@tut.UUCP           Computer Systems Laboratory
Internet: jsa@tut.fi             PO box 527
Tel:      358-(9)31-162590       SF-33101 Tampere, Finland

zwicky@tut.cis.ohio-state.edu (Elizabeth Zwicky) (09/21/87)

In article <183@tut.cis.ohio-state.edu> elwell@tut.cis.ohio-state.edu (Clayton Elwell) writes:
>
>The use of accents and "breathing marks" is a feature of modern Greek,
>not classical Greek.  It developed in parallel with accents in Europe,
>not as a precursor to them.
>-- 
>							      Clayton M. Elwell


Well, that depends on what you call "modern". Call me a quibbler, but
accents and breathing marks are a feature of New Testament Greek, and
have been receding in usefulness since. Even so, you usually only extend
"modern" back a few hundred years... This would suggest that it was
indeed a precursor.
	Elizabeth

franka@mmintl.UUCP (Frank Adams) (09/22/87)

In article <2541@aw.sei.cmu.edu> firth@bd.sei.cmu.edu.UUCP (PUT YOUR NAME HERE) writes:
>We've beaten Russian to death, but did anyone point out that English also
>requires diacritical marks above, beside, and below letters?  They are
>fast being dumped into the can~on of history due to the uncoo"perative
>ro^le of ASCII terminals, but even a soupc,on of an acquaintance with
>the real language should convince you they are not just mediaeval relics!

It is my impression that these marks were very rarely used even before the
advent of ASCII terminals.  Further, of the four examples here, I would only
classify one (uncooperative) as genuinely English; the others are
unanglicized foreign words, complete with foreign marks.
-- 

Frank Adams                           ihnp4!philabs!pwa-b!mmintl!franka
Ashton-Tate          52 Oakland Ave North         E. Hartford, CT 06108