karl@haddock.ISC.COM (Karl Heuer) (01/01/70)
In article <8708171253.AA21033@ephemeral.ai.toronto.edu> lamy@ai.toronto.edu (Jean-Francois Lamy) writes:
>Just by curiosity, a quick scan of my brain seems to indicate that English
>would be the only European language not to use diacritical marks, digraphs,
>or extra letters. (? - I mean something like the dutch "ij").
Well, it used to be common in English to put a diaeresis over the third letter of "coordinate", for example; but that convention seems to be vanishing. I guess the closest things in modern American English are the apostrophe and the hyphen -- and we might be better off without those, too.
Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint
biep@cs.vu.nl (J. A. "Biep" Durieux) (01/01/70)
In article <8708171253.AA21033@ephemeral.ai.toronto.edu> lamy@ai.toronto.edu (Jean-Francois Lamy) writes:
>Just by curiosity, a quick scan of my brain seems to indicate that English
>would be the only European language not to use diacritical marks, digraphs,
>or extra letters. (? - I mean something like the Dutch "ij").
I do not completely understand what you mean by "extra letters", but the "ij" is just the long "i" (not the way any of you foreigners pronounce it :-)). The long "a" is written "aa", the long "e" "ee", "o" -> "oo", "u" -> "uu", and "i" -> "ii". Now "ii" looks ugly, so the second "i" is drawn so as to curl under the first, and you have "ij". (Dutch typewriters have one key to print it - is that what you mean by "extra letters"?)
There having been a vowel shift in several parts of the Netherlands (some people still pronounce "ij" as in "see" (but short)), most people pronounce it in what I think is a unique way (I think there is no other language with this sound): something between the sounds in "red" "wine". The sound is clearly sliding (becoming a "y"-consonant, like "I"). Virtually the same sound (in most of the Netherlands) as "ei".
But then, we have a lot of funny sounds in our language. Does anybody know a language where the sound of "ui" exists? (Somewhat higher than the shifted form of the vowel in "luck", and not to be mixed up with the also present sound "eu", which is like the sound in the German "schoen".)
Kom kijken, er staat een kuiken in de keuken te koken!
P.S.: About extra letters: is the "$"-sign really the writing in one space of "U" and "S"? So: "U.S. dollar" --> "$ dollar"
--
Biep. (biep@cs.vu.nl via mcvax)
Hot girls like cool boys
andersa@kuling.UUCP (Anders Andersson) (01/01/70)
In article <8708171253.AA21033@ephemeral.ai.toronto.edu> lamy@ai.toronto.edu (Jean-Francois Lamy) writes:
>Just by curiosity, a quick scan of my brain seems to indicate that English
>would be the only European language not to use diacritical marks, digraphs,
>or extra letters. (? - I mean something like the dutch "ij").
As somebody has already mentioned, what's to be considered "diacritical" or "extra letters" depends on what language you compare with. I consider the Swedish circle above "a" to be no more "diacritical" than the common dot above a lowercase "i" or "j". At least the Turkish provide a uniform set of dotted and undotted "i" in both upper and lower case...
What about the origin of "w", was it known by Caesar or is it a more modern invention? It seems pretty much like a double "v" ligature to me.
--
Anders Andersson, Dept. of Computer Systems, Uppsala University, Sweden
Phone: +46 18 183170
UUCP: andersa@kuling.UUCP (...!{seismo,mcvax}!enea!kuling!andersa)
rob@pbhye.UUCP (Rob Bernardo) (01/01/70)
In article <480@kuling.UUCP> andersa@kuling.UUCP (Anders Andersson) writes:
Re: Russian
+ There are the hard and soft
+signs which have no phonetic value of their own, but they look like ordinary
+letters. I think it's funny that they are relevant in sorting, just as if the
+apostrophe should have its own place in the English and French alphabets...
Actually, they are more analogous to the "h" in English when it combines
with "s", "t", etc. to denote a single sound (except that English "h" can
represent a sound by itself while the Russian hard and soft signs cannot).
(There are analogues in other languages, e.g. "h" in Portuguese causing
a consonant to be palatal, silent "u" in Spanish and "h" in Italian which
buffer "c" and "g" from a following "i" and "e".)
But while we're talking about alphabetic sorting, I should like to point
out that in Spanish "ch" is sorted after "c" and before "d". Similarly,
"ll" and "rr" immediately follow their unary counterparts. They are treated
as if single letters when reciting the alphabet, as is the non-ASCII
single letter "n~".
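The sorting rule described above can be sketched in a few lines. This is only an illustration of the ordering as stated in this post ("ch" after "c", "ll" after "l", "rr" after "r", and n-tilde after "n"); the alphabet table and the function name spanish_key are made up for the example and do not come from any library.

```python
# Sketch of the ordering described in the post: digraphs count as single
# letters filed right after their first letter. The table and the names
# below are illustrative assumptions, not a standard API.
ALPHABET = ["a", "b", "c", "ch", "d", "e", "f", "g", "h", "i", "j", "k",
            "l", "ll", "m", "n", "ñ", "o", "p", "q", "r", "rr", "s", "t",
            "u", "v", "w", "x", "y", "z"]
RANK = {letter: i for i, letter in enumerate(ALPHABET)}
DIGRAPHS = ("ch", "ll", "rr")

def spanish_key(word):
    """Return a list of letter ranks, reading each digraph as one letter."""
    word = word.lower()
    key, i = [], 0
    while i < len(word):
        pair = word[i:i + 2]
        if pair in DIGRAPHS:
            key.append(RANK[pair])
            i += 2
        else:
            # Characters missing from the table sort after every letter.
            key.append(RANK.get(word[i], len(ALPHABET)))
            i += 1
    return key

words = ["dama", "chico", "cuna", "llama", "luz", "carro", "caro"]
print(sorted(words, key=spanish_key))
```

With this key "cuna" files before "chico" and "luz" before "llama", which a plain character-by-character sort would reverse.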
+
+Anyway, it's true that accents don't play such a big role in Russian as they
+do in French or Czech. English is probably the least complicated in this
+matter, but I don't consider the English alphabet absolutely "pure" in some
+accentophobic sense. Every alphabet has its history of anomalies.
+
+Disclaimer: I'm NOT an educated linguist, just an amateur. Although I hold
+the above for true, any linguist could probably provide more detail.
+-- 
+Anders Andersson, Dept. of Computer Systems, Uppsala University, Sweden
+Phone: +46 18 183170
+UUCP: andersa@kuling.UUCP (...!{seismo,mcvax}!enea!kuling!andersa)
-- 
I'm not a bug, I'm a feature.
Rob Bernardo, San Ramon, CA	(415) 823-2417	{pyramid|ihnp4|dual}!ptsfa!rob
apoorva@mind.UUCP (Apoorva Muralidhara) (01/01/70)
Posting-Front-End: GNU Emacs 18.40.2 of Thu Mar 19 1987 on mind (berkeley-unix)
Mark Towfigh writes:
*Unlike Arabic, Modern Persian is written with no accents.  Accents are
*only used in children's books and occasionally in print with a foreign
*word.  This would perhaps be analogous to similar occurrences in English.
While I don't know Persian, I do know some Arabic.  Please explain
what you mean by "accents" when referring to Arabic.  From the
sentences quoted, it seems that you are referring to the signs in
vocalized text:  the vowels fathah, dammah, kasrah, their nunated
forms, and the no-vowel sign sukun.  They are, as far as I know, only
used in the Quran (because errors in vocalization might lead to
blasphemy), children's books, and books which teach Arabic to
non-native-speakers.  So I'm not sure what you mean by "unlike Arabic."
*Since the Persian alphabet has only 32 letters, moreover, that would
*leave at least 20 spaces for the 6-odd accents which are so
*infrequently used.
Arabic has 28 letters, or 29 if you count hamzah.  Well, okay, it has
31 if you count ta'-marbutah and alif maqsurah (I hope I haven't
forgotten anything), and 7 signs:  the three vowels, their nunated
versions, and sukun.  And Arabic doesn't have capital letters either.
Just in case Arabic ever becomes that popular . . .
				--Apoorva
[Now if you want an even neater-looking script, how about Kannada?
 A lot more letters though . . .]
andersa@kuling.UUCP (01/01/70)
In article <2739@husc6.UUCP> corelib@husc4.HARVARD.EDU (core library) writes:
>In fact, a Persian alphabet would be easier to implement than an
>English one (in terms of size) because there are 52 letters in the
>English language: capitals and lower case. These have to be specified
>by the writer, while in Persian, the shape (or, loosely, case) of each
>letter is absolutely determined by whether it has a connecting letter
>or space before or after it.
There are three forms of each letter, designed for the beginning of a word, for the end of a word, and for within a word, right? I think the same is true for Arabic. This seems similar to Latin hand-writing, in that there are various ways to connect adjacent letters.
I would like to know: Is it ever acceptable to write a Persian word using only one of the three forms (LIKE IT MAY BE ACCEPTABLE TO WRITE ENGLISH WORDS IN UPPERCASE ONLY), or are the form rules mandatory (in the sense any aspect of a language can be "mandatory")?
If the codes for different forms are the same, then the typography will have to depend on context, so there have to be three font bitmaps, or types on a printing wheel/chain, and a neat little algorithm instead of a 1-1 mapping for translating character codes into font table indices. I'm not arguing against the solution, just pointing out some extra problems.
This leads me to another question: Which form is used in a Persian word consisting of only one letter?
--
Anders Andersson, Dept. of Computer Systems, Uppsala University, Sweden
Phone: +46 18 183170
UUCP: andersa@kuling.UUCP (...!{seismo,mcvax}!enea!kuling!andersa)
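As a rough idea of what such a context-dependent mapping might look like, here is a minimal sketch. It assumes a fourth, "isolated" form for a letter joined to nothing, and it assumes that a small set of letters never joins to the letter that follows; the letter names, the NON_JOINING set and the function positional_forms are all illustrative, not taken from any real font or terminal implementation.

```python
# Sketch: pick a positional form for each letter of one word by looking at
# its neighbours. Everything named here is a hypothetical illustration.
# Letters assumed never to join to the letter that follows them:
NON_JOINING = {"alef", "dal", "thal", "ra", "zain", "waw"}

def positional_forms(word):
    """word is a list of letter names in reading order."""
    forms = []
    for i, letter in enumerate(word):
        # Joined to the previous letter if one exists and it joins forward.
        joined_before = i > 0 and word[i - 1] not in NON_JOINING
        # Joined to the next letter if one exists and this letter joins forward.
        joined_after = i < len(word) - 1 and letter not in NON_JOINING
        if joined_before and joined_after:
            forms.append("medial")
        elif joined_after:
            forms.append("initial")
        elif joined_before:
            forms.append("final")
        else:
            forms.append("isolated")
    return list(zip(word, forms))

print(positional_forms(["kaf", "ta", "alef", "ba"]))
print(positional_forms(["waw"]))
```

Under these assumptions a one-letter word simply gets the isolated form, and the character code stays the same in every position; only the index into the font table changes.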
franka@mmintl.UUCP (Frank Adams) (01/01/70)
>>I was told once (by a respected linguist, as I recall) that English and
>>Russian are the ONLY two languages written with unaccented alphabets.
The Russian alphabet does have two accented letters, although the accents are often omitted. How common this omission is, or whether it is getting more common, I don't really know.
--
Frank Adams ihnp4!philabs!pwa-b!mmintl!franka
Ashton-Tate 52 Oakland Ave North E. Hartford, CT 06108
eric@snark.UUCP (01/01/70)
So would *someone* who knows quit tantalizing us and *translate* 'gezellig'?
I'm curious about both the 'literal' (presumably etymology-based) translation
and whatever paragraph of circumlocutions is necessary to express the concept.
Inspection of various possible cognates (notably German ge + selig) suggests
a guess-translation of "wisdom-struck" to this amateur linguist; some sort of
metaphorical inversion (as with English "silly") seems not unlikely, yielding
something like "foolish" or the idiomatic "loopy". Now, how far off base am I?
I'll contribute an example from the days when I spoke reasonable Italian.
Er, make that "Tuscan"; I lived in Rome for two years but found out the
hard way when visiting Naples and Sicily that different 'dialects' of Italian
can be mutually utterly incomprehensible. "A language is a dialect with an
army" --and the Tuscan dialect of Rome and environs is the "national" language
of Italy.
When translating the Italian term 'simpatico', you can choose the etymological
cognate "sympathetic" in English, or you can translate the actual concept --
but not both.
In English, "X is sympathetic to Y" != "Y is sympathetic to X"; the term
implies an unequal relationship in which one party accepts "sympathy" and
potential help from the other.
In Italian, "simpatico" is a fellow-feeling between equals. You say "those
two are simpatico", or "we are simpatico". "Those two are compatible" comes
close, but doesn't convey the mild but definite emotional warmth of the
Italian.
There is no really good translation in English, which is why some English
speakers have naturalized 'simpatico' in its Italian form.
-- 
      Eric S. Raymond
      UUCP:  {{seismo,ihnp4,rutgers}!cbmvax,sdcrdcf!burdvax,vu-vlsi}!snark!eric
      Post:  22 South Warren Avenue, Malvern, PA 19355    Phone: (215)-296-5718
eal@tut.UUCP (Lehtimäki Erkki) (01/01/70)
In article <7187@reed.UUCP> eeyore@reed.UUCP (joshua samuel honig guenter ii) writes:
>question 2. does anybody know if finnish is written in cyrillic in the karelian
>s.f.s.r.?
They don't actually speak Finnish in the Karelian S.F.S.R.; I think they speak some of the Karelian dialects, East-Karelian or Ruski-Karelian, I am not sure. But if you want, I can check it. But use e-mail, I seldom read this group.
--
Erkki A. Lehtimäki eal@tut.uucp
gordan@maccs.UUCP (Gordan Palameta) (08/17/87)
In article <479@sugar.UUCP> peter@sugar.UUCP (Peter da Silva) writes:
>In Japan programming languages are the least of the problems their written
>language causes them. An incredible amount of data is never stored anywhere
>but on the original form, photocopies of said form, or faxed copies of said
>form. Even with the best tools available it's just too hard to keypunch.
Everything you mention above was true of English, less than ten years ago. It's easy to forget just how recent a development the personal computer is, without which "office automation" would be far less practical.
>This, of course, makes it even more amazing that they have been so successful
>in the world community. It seems likely to me, though, that at some point
>they're going to have to break down and drop Kanji for professional use.
Not very likely. The whole point of using computers is that they should be adapted to serve us, not the other way around. French and German did not adapt to computers by dropping cedillas, umlauts, and accents; instead we now have ISO Latin. Arabic has not adapted to computers by simplifying the calligraphic qualities of its script; instead sophisticated software is being used to properly display Arabic without sacrificing its aesthetic qualities (see the latest issue of Communications of the ACM). And Japanese will one day be fully accommodated by computers; enormous progress has already been made towards this end. Japanese without kanji would not be Japanese.
To look at issues such as this from a different perspective, consider English. English uses a highly non-phonetic script; the illiteracy rate in the U.S. is at alarming levels. This might be a non-sequitur, but undoubtedly a phonetic script for English would make life a lot simpler. Will it ever happen? Not a chance.
I'm not sure if this discussion belongs in comp.std.internat, so I cross-posted to sci.lang and removed comp.lang.c. Follow-up wherever you see fit.
--
UUCP: ... !mnetor!lsuc!maccs!gordan
BITNET: GP@TANDEM
"Sumasshedshii vsekh stran, soyedinyaites'" Gordan Palameta
lamy@utegc.UUCP (08/17/87)
In article <717@maccs.UUCP> gordan@maccs.UUCP (Gordan Palameta) writes:
>French and German did not adapt to computers by dropping cedillas, umlauts,
>and accents; instead we now have ISO Latin. Arabic has not adapted to
I beg to differ. I have a cedilla in my name, and I can tell you that I have not seen it appear very often in computer output in the last 25 years. And that (was) in a part of the world that tries to be officially French.
I used to use a CDC Cyber with a "shell" custom built to handle French. The system evolved over about 10 years, but that was only possible because the machine was so crippled in the first place (6 bit chars, no text processing utilities) that everything was made from scratch (even the Pascal compiler accepted accented identifiers).
I find it somewhat ironic that the recent version of "ngrep" moves toward internationalization by trying to accommodate Japanese. I am quite confident that we will see a Japanese version of Unix before a French or a German one.
Just by curiosity, a quick scan of my brain seems to indicate that English would be the only European language not to use diacritical marks, digraphs, or extra letters. (? - I mean something like the dutch "ij").
Jean-Francois Lamy                      lamy@ai.toronto.edu (CSnet,UUCP,Bitnet)
AI Group, Dept of Computer Science      lamy@ai.toronto.cdn (EAN X.400)
University of Toronto, Canada M5S 1A4   {seismo,watmath}!ai.toronto.edu!lamy
roy@phri.UUCP (Roy Smith) (08/17/87)
In article <717@maccs.UUCP> gordan@maccs.UUCP (Gordan Palameta) writes:
> English uses a highly non-phonetic script; the illiteracy rate in the
> U.S. is at alarming levels. This might be a non-sequitur, but
> undoubtedly a phonetic script for English would make life a lot simpler.
> Will it ever happen? Not a chance.
Well, maybe a small chance. When I was in, I think, the first grade (in New York City, about 1965) they tried out an experimental reading and writing system on us. We were taught a phonetic alphabet. All I really remember about it was that letters like "c" which admitted of two pronunciations were banned -- you wrote "kalsify" instead of "calcify" -- and that they added a schwa to the alphabet. Schwa, written like an upside-down "e", was some sort of vowel. Dictionaries use it a lot to show pronunciation.
My mother is convinced that my poor spelling skills are a direct effect of this phonetic writing experiment. She's probably right.
--
Roy Smith, {allegra,cmcl2,philabs}!phri!roy
System Administrator, Public Health Research Institute
455 First Avenue, New York, NY 10016
Isaac_K_Rabinovitch@cup.portal.com (08/18/87)
Before we give the Japanese too much credit for becoming an advanced technical society in spite of the limitations of Kanji, we should remember one way in which we have fallen behind: language. In Japan, as in most countries, scientists, doctors, and engineers are required to learn the language that is widely used for their discipline. As in Europe, most Japanese physicians speak German, most computer scientists speak English, etc. The U.S. has lucked out, despite appallingly poor language instruction, simply because English happens to be a standard technical language. We don't seem to have made the best use of this advantage.
jal@oliveb.UUCP (Tony Landells) (08/19/87)
In article <8708171253.AA21033@ephemeral.ai.toronto.edu>, lamy@ai.toronto.edu (Jean-Francois Lamy) writes:
>
> I find it somewhat ironic that the recent version of "ngrep" moves toward
> internationalization by trying to accommodate Japanese. I am quite confident
> that we will see a Japanese version of Unix before a French or a German one.
>
I'm afraid I must beg to differ - AT&T currently have an internationalized UNIX in beta-test (or, at least, the applications environment is in beta-test - I assume it would make a completely nationalized system possible) with French and German as the two possibilities currently available in Europe. Of course, you do need the hardware to support it (e.g. conformant with the European ISO 8859/1 8-bit graphical character set).
I'm not sure how well it works yet, as it only arrived last week, and we're currently in the process of installing it, but it does exist...
Tony Landells.
--
I don't have a .signature, but then I never did get the hang of .writing ...
alan@pdn.UUCP (Alan Lovejoy) (08/19/87)
In article <717@maccs.UUCP> gordan@maccs.UUCP (Gordan Palameta) writes:
>"Sumasshedshii vsekh stran, soyedinyaites'" Gordan Palameta
Shouldn't that be:
Sumashedshchiji fsjekh stran, soyedjinjaitjesj
((Let) loonies (out-of-mind-goners) (of) all countries unite (make themselves (as) one))?
First of all, they already have. Secondly, ASCII is admittedly far underpowered to even be considered as a standard for all languages world-wide, even if they all could socio-politically agree to use the "same" alphabet. Thirdly, the Soviets already have RuSCII (Russian Standard Code For Information Interchange). And lastly, it's a lot easier to write "English" in Cyrillic than it is to write "Russian" in Anglographia (excuse me for coining a new term).
--Alan "Pjervyj bljin komom" Lovejoy
guy%gorodish@Sun.COM (Guy Harris) (08/19/87)
> > I find it somewhat ironic that the recent version of "ngrep" moves toward
> > internationalization by trying to accommodate Japanese. I am quite confident
> > that we will see a Japanese version of Unix before a French or a German one.
> >
> I'm afraid I must beg to differ - AT&T currently have an internationalized
> UNIX in beta-test (or, at least, the applications environment is in
> beta-test - I assume it would make a completely nationalized system possible) with
> French and German as the two possibilities currently available in Europe.
However, I believe they released the Japanese Applications Environment before any European environments, so his contention still stands.
Guy Harris {ihnp4, decvax, seismo, decwrl, ...}!sun!guy guy@sun.com
srg@quick.UUCP (Spencer Garrett) (08/20/87)
I was told once (by a respected linguist, as I recall) that English and Russian are the ONLY two languages written with unaccented alphabets. I know you have to add the qualifier "modern" to make that true, and maybe "major" as well, although I don't know of any exceptions right off. I don't know whether he didn't count Katakana and Hiragana as alphabets or whether one cannot (or normally would not) write Japanese entirely in one or both of these scripts. He seemed to think that an unaccented alphabet was a substantial advantage in an information age, and I would tend to agree. (It should be noted in passing that this is not mere cultural imperialism. This particular professor is extremely fond of Arabic script and has spent a great deal of time teaching TEX to handle it.)
andersa@kuling.UUCP (Anders Andersson) (08/23/87)
In article <111@quick.UUCP> srg@quick.UUCP (Spencer Garrett) writes:
>I was told once (by a respected linguist, as I recall) that English and
>Russian are the ONLY two languages written with unaccented alphabets. I
That depends on the definition of "accented". A Russian "J" is simply an "I" (a reversed "N") with a kind of U-arc on top of it, and they are sorted the same. "E" has an umlaut-like accent which changes the vowel from "ye" to "yo", but this seldom shows up in print (except for dictionaries, which also use the common acute accent to show pronunciation). There are the hard and soft signs which have no phonetic value of their own, but they look like ordinary letters. I think it's funny that they are relevant in sorting, just as if the apostrophe should have its own place in the English and French alphabets...
Anyway, it's true that accents don't play such a big role in Russian as they do in French or Czech. English is probably the least complicated in this matter, but I don't consider the English alphabet absolutely "pure" in some accentophobic sense. Every alphabet has its history of anomalies.
Disclaimer: I'm NOT an educated linguist, just an amateur. Although I hold the above for true, any linguist could probably provide more detail.
--
Anders Andersson, Dept. of Computer Systems, Uppsala University, Sweden
Phone: +46 18 183170
UUCP: andersa@kuling.UUCP (...!{seismo,mcvax}!enea!kuling!andersa)
neubauer@bsu-cs.UUCP (08/23/87)
In article <111@quick.UUCP>, srg@quick.UUCP (Spencer Garrett) writes:
> I was told once (by a respected linguist, as I recall) that English and
> Russian are the ONLY two languages written with unaccented alphabets. I
> know you have to add the qualifier "modern" to make that true, and maybe
> "major" as well, although I don't know of any exceptions right off. I don't
   ^^^^^ that might do it
> know whether he didn't count Katakana and Hiragana as alphabets or whether
They are not alphabets, they are syllabaries, i.e., each symbol represents a whole syllable.
> one cannot (or normally would not) write Japanese entirely in one or both
> of these scripts. He seemed to think that an unaccented alphabet was a
> substantial advantage in an information age, and I would tend to agree.
So would I.
In article <2842@ulysses.homer.nj.att.com>, jss@hector..UUCP (Jerry Schwarz) writes:
> I quote from a draft of the Rationale of the proposed
> ANSI C standard, section 4.4:
>     The English language uses 26 letters derived from the
>     Latin alphabet. The set of letters suffices for English,
>     Swahili, and Hawaiian; all other living languages use
>     either the Latin alphabet plus other characters, or other
>     non Latin alphabets or syllabaries.
> They cite no reference for this piece of trivia.
Just as well, since it is not true. Another counterexample (from off the top of my head): Hmong. If necessary, we could undoubtedly come up with more, but there is really no point. We don't really need to worry about it for C programs, since the characters needed for that are already known.
What we do need to worry about is how to set up computer facilities, e.g., keyboards, and how to represent the modified letters in languages that DO have diacritics. It has already been established that simply using the high bit of an 8-bit byte for +/- modified will not do, both because of multiple diacritics for a single letter in a given language, and also because of multi-lingual text.
It is certainly far less elegant to simply assign a byte from the upper 1/2 of the byte range (i.e. with high bit set) to each known modified letter. If we stick to the Latin alphabet, though, there are probably enough unassigned bytes to do it. That will leave very odd sets of bit patterns to represent the letters of a given language, but the alternative would appear to be to scrap ASCII altogether if we intend to make some kind of rational scheme of it.
--
Paul Neubauer UUCP: {ihnp4,seismo}!{iuvax,pur-ee}!bsu-cs!neubauer
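To make the "one byte per modified letter" scheme concrete, here is a small sketch using ISO 8859-1 (the 8-bit set mentioned elsewhere in this thread) as the example table. The helper name latin1_code is made up for the illustration; the encoding itself is the standard "latin-1" codec.

```python
# Sketch: each accented letter gets its own code in the upper half of an
# 8-bit table (ISO 8859-1 here). latin1_code is a hypothetical helper name.
def latin1_code(ch):
    """Return the single-byte ISO 8859-1 code of one character."""
    return ch.encode("latin-1")[0]

for ch in "eéèêë":
    code = latin1_code(ch)
    print(ch, hex(code), "high bit set" if code & 0x80 else "7-bit ASCII")

# The high bit is not a simple "modified" flag: clearing it on e-acute
# gives a different base letter, and one bit could not distinguish the
# four accented forms of "e" in any case.
print(chr(latin1_code("é") & 0x7F))
```

The codes for the four accented forms of "e" sit next to each other in the table, but none of them maps back to plain "e" by dropping the high bit, which is the "very odd sets of bit patterns" trade-off described above.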
henry@utzoo.UUCP (Henry Spencer) (08/23/87)
> P.S.: About extra letters: is the "$"-sign really the writing in one space
> of "U" and "S"? So: "U.S. dollar" --> "$ dollar"
Close. What I have been told is that the dollar sign is a scrunched form of PS, with the loop of the P getting lost in the shuffle. Why PS? Because the US took a long time to get its act together on a national currency, and the Mexican peso saw considerable use meanwhile.
--
Apollo was the doorway to the stars. | Henry Spencer @ U of Toronto Zoology
Next time, we should open it.        | {allegra,ihnp4,decvax,utai}!utzoo!henry
corelib@husc4.HARVARD.EDU (core library) (08/24/87)
In article <111@quick.UUCP> srg@quick.UUCP (Spencer Garrett) writes:
>
>I was told once (by a respected linguist, as I recall) that English and
>Russian are the ONLY two languages written with unaccented alphabets. I
>know you have to add the qualifier "modern" to make that true, and maybe
>"major" as well, although I don't know of any exceptions right off.
> ...
>This particular professor is extremely fond of Arabic script and has spent
>a great deal of time teaching TEX to handle it.)
Unlike Arabic, Modern Persian is written with no accents. Accents are only used in children's books and occasionally in print with a foreign word. This would perhaps be analogous to similar occurrences in English.
In fact, a Persian alphabet would be easier to implement than an English one (in terms of size) because there are 52 letters in the English language: capitals and lower case. These have to be specified by the writer, while in Persian, the shape (or, loosely, case) of each letter is absolutely determined by whether it has a connecting letter or space before or after it.
Since the Persian alphabet has only 32 letters, moreover, that would leave at least 20 spaces for the 6-odd accents which are so infrequently used.
Just in case Persian ever becomes that popular...
=======================================================================
Mark Towfigh
"Did you take that?" "Soitanny not!" "Oh, a wise guy, huh?" "Aaa-aaa-aaaaa-o-o-o-o"
UUCP: harvard!husc4!corelib
kent@xanth.UUCP (Kent Paul Dolan) (08/24/87)
In article <1043@bsu-cs.UUCP> neubauer@bsu-cs.UUCP (Paul Neubauer) writes:
>We don't really need to worry about it [diacriticals in other languages]
>for C programs, since the characters needed for that are already known.
>Paul Neubauer UUCP: {ihnp4,seismo}!{iuvax,pur-ee}!bsu-cs!neubauer
Paul, your posting was fine except for this one lapse. For some number of years we have been getting away with assuming that English is the universal programming language. Since it is not proven that programming has the same criticality as, say, piloting of ships and airplanes, where such a rule makes sense, I expect programming in the native tongue to become more widespread as programming becomes a more worldwide activity.
In particular, for C, the use of "meaningful identifiers" must imply "meaningful in the tongue of the reader". We can probably get away with English keywords; keywords have such conventional meanings they are almost divorced from their common English or other language meaning anyway. But I would expect to see the alphabet in which C identifiers are written to vary to match the needs of the language group using C.
This of course implies a lot of new problems in code portability. It is time to face these problems, either by changing the de facto English predominance in coding to be de jure, or else by providing for extended alphabet standards and compliant compilers.
Gives a whole new meaning to the idea of program translators, too!
Kent, the man from xanth.
joe@haddock.ISC.COM (Joe Chapman) (08/24/87)
>I was told once (by a respected linguist, as I recall) that English and
>Russian are the ONLY two languages written with unaccented alphabets.
Well, the Russian letter pronounced "yo" is an e with a diaeresis, and "yeri" looks like a backwards "N" with a breve. You might argue that these aren't really accented letters, by claiming:
1. They aren't next to the unaccented version in the standard collating sequence; in this case you have to accept Finnish (with a-diaeresis and o-diaeresis at the end of the alphabet) as being in the non-accented category.
2. Absolutely no phonetic or grammatical correlation exists between the unaccented and accented versions of the characters; in this case you would be wrong.
3. The characters have developed via evolution from utterly distinct earlier forms; from what I recall of Old Church Slavonic I don't believe this, but I'm open to such an argument.
Joe Chapman harvard!ima!joe
pete@aiva.UUCP (08/25/87)
>I was told once (by a respected linguist, as I recall) that English and
>Russian are the ONLY two languages written with unaccented alphabets. I
>know you have to add the qualifier "modern" to make that true, and maybe
>"major" as well, although I don't know of any exceptions right off. I don't
>know whether he didn't count Katakana and Hiragana as alphabets or whether
>one cannot (or normally would not) write Japanese entirely in one or both
>of these scripts.
One could consider katakana and hiragana as alphabets (functionally equivalent to, in some sense), and one could write Japanese entirely in either one, probably introducing a lot of ambiguity. The point is that they have accents - the two little strokes (called nigori) at the top right of certain characters indicate the voiced version of that character, and without this indication, the Japanese would be unreadable garbage.
Pete Whitelock AI Edinburgh
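The "base character plus voicing mark" view of nigori can be illustrated with a present-day character set. The sketch below uses Unicode canonical decomposition, which postdates this discussion and is brought in here purely as an illustration; split_voicing is a made-up helper name, while unicodedata.normalize and unicodedata.name are standard Python library calls.

```python
import unicodedata

def split_voicing(kana):
    """Decompose one kana into its base character and any combining marks."""
    decomposed = unicodedata.normalize("NFD", kana)
    marks = [unicodedata.name(c) for c in decomposed[1:]]
    return decomposed[0], marks

# Voiced "ga" decomposes into unvoiced "ka" plus a combining voicing mark;
# plain "ka" has nothing to split off.
print(split_voicing("が"))
print(split_voicing("か"))
```

Seen this way the voicing strokes behave much like a diacritic on a Latin letter: the same base shape, with a separate mark that changes the sound.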
karl@haddock.ISC.COM (Karl Heuer) (08/25/87)
In article <2242@xanth.UUCP> kent@xanth.UUCP (Kent Paul Dolan) writes:
>I would expect to see the alphabet in which C identifiers are written to vary
>to match the needs of the language group using C.
Probably. Such programs cannot be strictly conforming, but if they are not expected to be ported outside the locale of origin (this includes not only foreign countries with nonstandard letters, but also VMS sites in the USA), that's tolerable. (And it's not difficult to write a transliterating program, anyway.) A compiler can admit such identifiers and still be conforming, so the implementor has no good reason not to.
(Maybe someone needs to design a language along the lines of APL, with no natural-language bias at all.)
Karl W. Z. Heuer (ima!haddock!karl or karl@haddock.isc.com), The Walking Lint
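A transliterating program of the kind mentioned above might look roughly like the following. This is a sketch under loud assumptions: it handles only letters that decompose into an ASCII base plus combining accents, it uses Unicode normalization (a modern convenience, not something available to this thread), and the function names to_ascii_identifier and transliterate_source are invented for the example.

```python
import re
import unicodedata

def to_ascii_identifier(name):
    """Strip combining accents so an accented word becomes plain ASCII."""
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    if not stripped.isascii():
        # Letters with no ASCII base (other scripts, for instance) need a table.
        raise ValueError(f"no ASCII fallback for {name!r}")
    return stripped

def transliterate_source(text):
    """Rewrite every identifier-like word in a piece of source text."""
    return re.sub(r"\w+", lambda m: to_ascii_identifier(m.group()), text)

print(transliterate_source("int largeur_fenêtre = côté * 2;"))
```

A real tool would also have to leave string literals and comments alone and check that two distinct identifiers do not collapse to the same ASCII spelling; neither is attempted here.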
bayes@hpfcrj.UUCP (08/26/87)
Back to Kanji: As one who lived in Japan a number of years, and who has implemented a Kanji utility on a computer, I have a couple of points to make with some confidence:
o Japanese will not just "drop Kanji". When reading (and in an odd way, when speaking) Japanese, one thinks in Kanji, in some sense. Hiragana and Katakana can be used to represent the language, but they obscure, rather than illuminate, the meaning. The fact that kana have been accepted in the past does not mean that kana are adequate or acceptable in a modern computing age. You might say: "Well let them change from Kanji to Romaji/kana/whatever". To which I might reply: "The USA hasn't even gone to metric yet...". Think what it means to change your whole way of representing your language and world. Learning that 21 degrees C is room temperature, or 2km is a 20 minute walk, is EASY by comparison. Yet we're reluctant to make that concession to "modernity".
o There exists a standard for Japanese character representation digitally: JIS C 6226 (may have been updated). Any implementations had best superset that, as there's a lot of S/W and H/W around that now understands it. Shades of EBCDIC/360 compatibility.
o ALL the language must ultimately be representable, in some sense orthogonally. Saving a bit or 2 here and there, or forcing context-dependent decodings at the cost of performance and portability, are false economies. So we need 32 bits. Big deal. Going to 21-1/2 bits or whatever has been suggested, or encoding only 90% of the language, will just leave us with inadequate standards somewhere down the road (is this deja vu, or does my machine just not know how to accent the 'e' in deja :-)?
I cannot speak competently on Chinese, etc.
Scott Bayes hpfcla!bayes
halo@cognos.uucp (Hal O'Connell) (08/31/87)
In article <8461@utzoo.UUCP> henry@utzoo.UUCP (Henry Spencer) writes:
>> P.S.: About extra letters: is the "$"-sign really the writing in one space
>> of "U" and "S"? So: "U.S. dollar" --> "$ dollar"
>
>Close. What I have been told is that the dollar sign is a scrunched form
>of PS, with the loop of the P getting lost in the shuffle. Why PS? Because
>the US took a long time to get its act together on a national currency, and
>the Mexican peso saw considerable use meanwhile.
My understanding is somewhat different, and takes into account the *difference* between the US dollar sign (normally an "S" with two vertical lines) and other dollar signs (the "S" with a single vertical line).
The US dollar sign came from superimposing the S on a U, as an abbreviation of "US", and erasing the base of the U. Very patriotic.
The other dollar sign comes from the historical common usage of Spanish pieces of eight in international matters. The superimposition of an 8 on a P was standard notation for this currency. The erasing of a few lines gives the $ we all know. I suspect that the symbolism of the peso evolved in the same fashion, given its Hispanic origins.
greg@gryphon.UUCP (09/02/87)
In article <481@kuling.UUCP> andersa@kuling.UUCP (Anders Andersson) writes:
>If the codes for different forms are the same, then the typography will
>have to depend on context, so there has to be three font bitmaps, or types
>on a printing wheel/chain, and a neat little algorithm instead of a 1-1
>mapping for translating character codes into font table indices. I'm not
>arguing against the solution, just pointing out some extra problems.
There are generally four forms, as we shall see shortly. The algorithm is not neat at all. If you are dealing with a CRT display, displaying English and Arabic, and implementing direct cursor positioning, the placement of a single character will require one of eighteen distinct display rearrangement algorithms.
>
>This leads me to another question: Which form is used in a Persian word
>consisting of only one letter?
The following is from my experience in implementing Arabic. Persian is similar.
There are four forms to each letter. The text is written from right to left, so the beginning form is on the right-most end. The forms (cases) are beginning (connected to the character on the left), middle (connected on both sides), ending (connected to the character on the right) and alone (not connected to anything). There are 8 characters, as I recall, that never connect to the left even when they appear in the middle of a word. These letters need only two cases, since the alone and ending cases can double for the beginning and middle cases.
In our code set, the lower 128 cells were standard ASCII with upper and lower case Latin characters. The upper 128 codes were Arabic characters. The letters occupied sticks 15 and 16. Sticks 13 and 14 were diacritical marks and the extender character. The extender character was a letter extender. In Arabic writing, margin justification is accomplished by extending the intra-letter connections, as opposed to adding whitespace between letters or words, so a letter extender was required.
Graphics like ! @ # $ % ^ etc. were represented in both the upper and lower code tables. When processing a stream of codes it was necessary to know the language attribute of the special graphics. For example (loosely), ">" in English meant "greater-than". In Arabic it means "less than". Actually, it has no meaning in Arabic, but I was implementing a programming language.
There is one ligature, lom-alif (forgive spelling ... I'm not literate in Arabic, I only implemented 4 terminals, 6 printers, an operating system, a programming language and a word processor and I still can't read, write or speak the language), that is used when an alif immediately follows a lom. The lom-alif ligature uses one display cell rather than the two cells that would be used by lom and alif displayed separately.
We handled diacritics by assigning a separate code to each mark. The effect was that the code stream was composed of variable-length display elements. For example, the code stream might be:
    letter letter diacritic letter diacritic extender [extender ...] lom alif lom alif diacritic
and various combinations of the above. A more complete implementation would also have allowed multiple diacritics following a letter.
A letter used 1 display cell. Lom followed by alif was counted as two letters but used one display cell (the lom-alif ligature). Diacritics were displayed in the same cell as the letter with which they were associated and thus required no display space. Extenders required 1 display cell but did not count as a letter. The ending form of some letters used 2 display cells.
To sort, letters were effectively expanded to 16 bits. The letter code was the upper 8 bits and the diacritic (0x0 if none) the lower 8 bits. Search strings were similarly expanded. Optionally, a null diacritic in a search string would match any diacritic in a target string.
I used 143 character-generator graphics to represent all of the Arabic letters, numerals and graphics unique to Arabic. In addition there was a standard English character generator.
There were a couple of interesting problems that were never quite fully resolved. Since displayed letters were variable length (remember extenders), the concept of column X was ambiguous; did we mean physical column X or letter X?
There is considerable disagreement in the Arabic-speaking world as to the format of an appropriate code set. One code set has 5 or 6 cells devoted to the lom-alif ligature with various diacritic marks. There is disagreement whether lom-alif should be a character by itself or simply a ligature formed by the display system. I believe this is an offshoot of Arabic typewriters having a lom-alif key.
-- 
Greg Laskin "When everybody's talking and nobody's listening, how can we decide?"
INTERNET: greg@gryphon.CTS.COM
UUCP: {hplabs!hp-sdd, sdcsvax, ihnp4}!crash!gryphon!greg
UUCP: {philabs, scgvaxd}!cadovax!gryphon!greg
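[An illustrative aside on the sorting and searching scheme in the post above: the letter code goes in the upper 8 bits, the diacritic in the lower 8 bits, and a null diacritic in a search string matches any diacritic. The Python below is only a rough sketch of that idea, not Greg Laskin's implementation. The code ranges, the EXTENDER value, the decision to skip extenders when comparing, and all function names are assumptions made for the example.]

```python
# Hypothetical code assignments. The post only gives the general layout
# (diacritics and the extender in sticks 13-14, letters in sticks 15-16).
DIACRITICS = range(0xC0, 0xDF)   # assumed diacritic codes
EXTENDER = 0xDF                  # assumed code for the letter extender


def expand(codes):
    """Expand 8-bit codes to 16-bit keys: (letter << 8) | diacritic, 0 if none."""
    keys = []
    for code in codes:
        if code == EXTENDER:
            continue  # assumption: extenders are ignored when comparing
        if code in DIACRITICS and keys and keys[-1] & 0xFF == 0:
            keys[-1] |= code          # attach the mark to the preceding letter
        else:
            keys.append(code << 8)    # a letter (or stray code) with no mark yet
    return keys


def key_matches(pattern_key, target_key):
    """A null diacritic in the pattern matches any diacritic in the target."""
    if pattern_key >> 8 != target_key >> 8:
        return False
    mark = pattern_key & 0xFF
    return mark == 0 or mark == (target_key & 0xFF)


def find(pattern_codes, target_codes):
    """Return the index (counted in letters) of the first match, or -1."""
    pattern, target = expand(pattern_codes), expand(target_codes)
    for start in range(len(target) - len(pattern) + 1):
        window = target[start:start + len(pattern)]
        if all(key_matches(p, t) for p, t in zip(pattern, window)):
            return start
    return -1
```

[Sorting the expanded keys as plain integers then orders by letter first and by diacritic second, which is one plausible reading of "the letter code was the upper 8 bits". Like the original, this sketch handles at most one diacritic per letter.]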
alan@pdn.UUCP (Alan Lovejoy) (09/10/87)
In article <2351@mmintl.UUCP> franka@mmintl.UUCP (Frank Adams) writes:
/The Russian alphabet does have two accented letters, although the accents
/are often omitted.  How common this omission is, or whether it is getting
/more common, I don't really know.
"I kratkoye" is usually written with a shallow bowl over it.  However,
it is considered a distinct letter from "i".  "E" can be pronounced
as either "ye" or "yo", and in books for foreigners a diaeresis is placed
over it when it is to be pronounced "yo".  In handwriting (but not in
printed text), the letters "t" and "sh" sometimes have a line either
under or over them.  Other than that, no diacritical marks are used
*in Russian*.  However, there are other languages which use this 
alphabet, and they might have diacritics.
--alan@pdn
jeg@hector.UUCP (Judy Grass) (09/10/87)
In article <2351@mmintl.UUCP> franka@mmintl.UUCP (Frank Adams) writes:
>>>I was told once (by a respected linguist, as I recall) that English and
>>>Russian are the ONLY two languages written with unaccented alphabets.
>
>The Russian alphabet does have two accented letters, although the accents
>are often omitted. How common this omission is, or whether it is getting
>more common, I don't really know.
>--
>
>Frank Adams ihnp4!philabs!pwa-b!mmintl!franka
>Ashton-Tate 52 Oakland Ave North E. Hartford, CT 06108
I think the two letters you refer to are the "i kratkoe" (short i) (looks like a backwards n with a sideways comma over it .. kinda) and the "jo" (e with a diaeresis). The i kratkoe is ALWAYS written that way and is considered a letter in its own right. There are no entries for it in a dictionary, as it always follows another vowel.
The "jo" has a history, which I will skip. In any case, it is printed with the diaeresis usually only in texts for students of Russian, or for disambiguation.
-- 
Judy Grass (ex-Slavic Linguist) ATT Bell Labs, Murray Hill ulysses!jeg
jal@oliveb.UUCP (Tony Landells) (09/11/87)
In article <2351@mmintl.UUCP>, franka@mmintl.UUCP (Frank Adams) writes: > >>I was told once (by a respected linguist, as I recall) that English and > >>Russian are the ONLY two languages written with unaccented alphabets. > > The Russian alphabet does have two accented letters, although the accents > are often omitted. How common this omission is, or whether it is getting > more common, I don't really know. I think this depends how one defines an accent. It is a long time since I studied Russian, and thus I may have forgotten something, but as I recall, it went like this: Russian has two letters that could be considered accented, as they look the same as other letters in the alphabet, save small "things" placed above them. I would refrain from calling them accents, however, as the two letters actually occupy places in the alphabet. To my mind, an accent is something you place over a normal letter of the alphabet to modify the sound or mark stress in the word, or whatever, but that is not normally seen in the alphabet. If this is unclear, consider that the Greek alphabet has stress accents, but they are never printed in the alphabet (though a character set would have to have them). French has accents to change the pronunciation of various letters, but these aren't placed in the alphabet either. Thus my premise is that the two apparently accented letters of Russian are, in fact, separate letters in their own right. Tony Landells. -- "Holy olio, Batman!!" "Why Boy Wonder; I didn't know you could l.any y
franka@mmintl.UUCP (Frank Adams) (09/15/87)
In article <2938@ulysses.homer.nj.att.com> jeg@hector (Judy Grass) writes:
>In article <2351@mmintl.UUCP> franka@mmintl.UUCP (Frank Adams) writes:
>>[Somebody writes:]
>>>>I was told once (by a respected linguist, as I recall) that English and
>>>>Russian are the ONLY two languages written with unaccented alphabets.
>>
>>The Russian alphabet does have two accented letters,
>
>I think the two letters you refer to are the "i kratkoe" (short i)
>and the "jo" (e with a diaeresis). The i kratkoe is ALWAYS written
>that way and is considered a letter in its own right.
Not ALWAYS. I have seen it written without the mark.
>In any case, [jo] is printed with the diaeresis usually only in texts for
>students of Russian, or for disambiguation.
If it's used for disambiguation, it's used, or so it would seem to me.
I was taking a basically graphical approach to defining accented letters: if two letters in an alphabet are the same, except that one has a mark on it, then that one is accented. (This begs the question of how one distinguishes "marks" from other parts of the letter. A first approximation is that a mark is a disconnected part of the letter -- but this doesn't deal with the cedilla.)
There are two places to look for a more linguistic definition: alphabets, as used by native speakers, and alphabetization rules. When I studied Russian, the "i kratkoe" was *not* included in the alphabet. Whether it affects alphabetization I don't know.
I would be quite surprised if English and Russian were the only languages with no accented letters, when a letter is regarded as accented only when it is alphabetized the same as the original form.
By the way, does Greek use accented letters?
-- 
Frank Adams ihnp4!philabs!pwa-b!mmintl!franka
Ashton-Tate 52 Oakland Ave North E. Hartford, CT 06108
jc@minya.UUCP (John Chambers) (09/15/87)
> >>I was told once (by a respected linguist, as I recall) that English and
> >>Russian are the ONLY two languages written with unaccented alphabets.
>
> The Russian alphabet does have two accented letters, although the accents
> are often omitted. How common this omission is, or whether it is getting
> more common, I don't really know.
Actually, the i-kratkoe (looks like a backwards N with a tiny U above it) always has the mark, which is considered a part of the letter. The 'yo' (looks like 'e' with an umlaut) normally has the mark omitted, except in children's books, and occasionally when needed for clarity (like when you need to distinguish 'vsye' from 'vsyo').
How about Welsh? I don't seem to recall any marks on any letters there, though I don't claim familiarity with the language.
There's also Serbo-Croatian, which has a set of (5) marks, but you very rarely see them outside of children's books and language texts.
For that matter, would you consider Yiddish and Hebrew? True, there are marks, but they are rarely used. Even the mark that distinguishes 'shin' from 'sin' is rarely used. (Puns on the fact are welcome!)
Or were you perhaps talking only about Roman-derived alphabets?
-- 
John Chambers <{adelie,ima,maynard}!minya!{jc,root}> (617/484-6393)
sjaak@vuecho.psy.vu.nl (Sjaak Schuurman) (09/16/87)
In article <141@minya.UUCP> jc@minya.UUCP (John Chambers) writes:
>
>There's also Serbo-Croatian, which has a set of (5) marks, but you very
>rarely see them outside of children's books and language texts.
In this case, it is time to give some more details about the Serbo-Croatian language.
It is, as far as I know, one of the few (if not the only) European languages which can be written with two alphabets that are completely equivalent (i.e. it makes no difference which of the two, Latin or Cyrillic, you use; neither is merely a transcription of the other). The Latin alphabet is used more by the people in Croatia, whereas the Cyrillic version is preferred in Serbia.
One of the great characteristics of this alphabet (which consists of 30 letters) is the fact that it is completely phonetic, in the sense that every letter stands for exactly one sound, and that every letter is always pronounced the same way.
The Latin version uses some diacritics, like a 'v' on top of a c, s or z, and there exists a 'dj' which is written as a 'd' with a stroke through the vertical part. The Cyrillic version, however, does not use any accents, diacritics or other signs whatsoever, and Serbo-Croatian can therefore be seen as one of the languages with an 'accentless' alphabet.
If John Chambers, with the set of 5 marks, is referring to the accent-like signs which (in coursebooks etc.) show how a vowel should be stressed, I would like to say that as far as I know there are just 4 such marks; but, as he said, they are never really used in the language itself, and do not form a part of the alphabet.
"PIŠI KAO ŠTO GOVORIŠ, ČITAJ KAO ŠTO JE NAPISANO"
VUK KARADŽIĆ
Sjaak Schuurman
minow@decvax.UUCP (Martin Minow) (09/16/87)
Swedish, Danish, Norwegian, and Finnish do not have accented letters. All, however, have more vowels than the English alphabet can represent, so diacritics are used. Swedish occasionally uses accents to mark stress on personal names, such as "Sund'en".
Korean uses a syllabic alphabet that lacks accented letters.
Martin Minow decvax!minow
jeg@hector.UUCP (Judy Grass) (09/16/87)
In article <141@minya.UUCP> jc@minya.UUCP writes:
>There's also Serbo-Croatian, which has a set of (5) marks, but you very
>rarely see them outside of children's books and language texts.
>
You can't talk about a Serbo-Croatian writing system. Serbian is written in its own form of Cyrillic (it shares a lot of letters with Russian Cyrillic, but does not use all of them, and adds a few of its own). Croatian is written in a Latin alphabet, with diacritics .. some that are absolutely required, and only a very few optionals. To wit: Croatian uses the hachek (an upside down caret) over s, c, z for sh, ch, zh. An acute accent is used over c for a palatalized t (no such in English, sorry). A line through a d is used for a palatalized d. (I may have forgotten one or two more such.)
In addition, some spoken dialects of Serbian and Croatian have tones .. lengthened vowel with rising pitch, falling pitch, etc. (somewhat like Chinese, but you can learn to speak good Serbo-Croatian without worrying much about it). This stuff is sometimes written by using diacritics over vowels for the benefit of foreign students. Sometimes students only get word stress.
Serbo-Croatian has yet a THIRD written version .. you may see Serbian texts with Serbian pronunciations and Serbian vocabulary written in Croatian-style Latin transcription. The issue of whether Serbo-Croatian is one language or two is a hot one that has led to demonstrations in the streets of Zagreb, various and sundry riots and a lot of linguistic engineering.
Re. Slavic writing systems: The following languages use versions of Cyrillic, though no two of the writing systems are identical: Russian, ByeloRussian, Ukrainian, Bulgarian, Macedonian, Serbian. None of these use diacritics to any great extent.
The following languages use spellings based on the Latin alphabet: Polish, Czech, Slovak, Slovenian, Croatian, Sorbian (a dying language). All of these use plenty of obligatory diacritics, but again all have idiosyncrasies. Polish is the one in this group that stands out. Polish spells the sounds "sh", "ch" using digraphs: sz, cz. But Polish uses diacritics to indicate nasalized vowels, long vowels, etc. The others are similar to Croatian. In Czech and Slovak you need to add accents to the vowels to indicate long vowels (not stress, as stress is fixed for these languages).
Cyrillic has also been improvised upon to invent writing systems for any of a number of languages of various national groups within the USSR. Moldavian is one example (an attempt to re-ethnicize a group of Rumanians within the USSR). Various Eastern tribal languages have also gotten their first writing systems this way. I have also seen a Yiddish publication or two within the USSR that was written in Cyrillic. (Now THIS is touchy.)
Judy Grass, ATT Bell Labs, Murray Hill ulysses!jeg
eeyore@reed.UUCP (09/17/87)
In article <133@wundt.vuecho.psy.vu.nl> sjaak@psy.vu.nl (Sjaak Schuurman) writes:
>
>In this case, it is time to give some more details about the Serbo-Croatian
>language.
>It is, as far as I know, one of the few (if not only) european languages which
>can be written with two alphabets, which are completely equivalent. (i.e. it
i'm pretty sure that romanian, which is written in the latin alphabet in romania, is written in the cyrillic alphabet in the moldavian s.s.r., which would give it a status similar to serbo-croatian.
question 1. does anybody know the cyrillic-moldavian equivalents of romanian?
question 2. does anybody know if finnish is written in cyrillic in the karelian s.f.s.r.?
-- 
---------------------------------
"tout cela m'est egal" -meursault.
"it's all the same to me" -eeyore
tektronix!reed!eeyore
firth@sei.cmu.edu (Robert Firth) (09/18/87)
A long time ago, someone wrote... >>I was told once (by a respected linguist, as I recall) that English and >>Russian are the ONLY two languages written with unaccented alphabets. We've beaten Russian to death, but did anyone point out that English also requires diacritical marks above, beside, and below letters? They are fast being dumped into the can~on of history due to the uncoo"perative ro^le of ASCII terminals, but even a soupc,on of an acquaintance with the real language should convince you they are not just mediaeval relics!
jc@minya.UUCP (jc) (09/20/87)
> I would be quite surprised if English and Russian were the only languages
> with no accented letters, when a letter is regarded as accented only when it
> is alphabetized the same as the original form.
>
> By the way, does Greek use accented letters?
Indeed it does. Proper spelling in Greek requires rather frequent accents on vowels; it is rare to see more than 2 or 3 words go by without one. Some even have two: initial vowels may have either of two 'aspiration' marks that are useless in modern Greek (one of which used to indicate an initial [h]), in addition to the main accent.
There's also a cute historical quibble to the effect that English actually has an accented letter: 'i'. This is one of many letters that ultimately derives from Greek, where the dot is in fact an accent on the iota, which is properly dotless. The really weird thing in English is that we use the accent mark on the lower-case letter, but not on the upper-case 'I'. This is basically a result of historic ignorance, confusion, illiteracy, and so on, in the development of the late Latin alphabet into a 2-case form. The Greek iota can have accents (or not) in either case.
Also, modern Turkish uses both the dotted and undotted 'i', to make a phonetic distinction like the vowels in 'beet' and 'bit'. The Turkish city of Istanbul properly has a dot over the 'I', for instance, but it's hard to do in ASCII.
According to the above definition, since 'I' and 'i' are alphabetized the same in English, I would conclude that English has an accented letter! (;-)
(Quick, someone point out the other accented letter in English.)
[Hmmm...Is 't' really an accented 'l'? Now you're getting really weird!]
-- 
John Chambers <{adelie,ima,maynard}!minya!{jc,root}> (617/484-6393)
jc@minya.UUCP (jc) (09/20/87)
> >There's also Serbo-Croatian, which has a set of (5) marks, but you very
> >rarely see them outside of children's books and language texts.
>
> The cyrillic version however does not use any accents, diacritics or other
> signs whatsoever, and Serbo-Croatian can therefore be seen as one of the
> languages with an 'accentless' alphabet.
Actually, the Cyrillic has the same vowel marks, and they are as rarely used.
>
> If John Chambers refers with the set of 5 marks to the accent-like signs
> which (in coursebooks etc.) show how a vowel should be stressed, I would
> like to say that as far as I know, there are just 4 of such marks, but
> as he said, they are never really used in the language itself, and do
> not form a part of the alphabet.
There's also a "long unstressed" symbol, a horizontal line above the vowel; it appears rarely, too, mostly to distinguish definite from indefinite endings (many of which differ only in length, and which can't always be inferred from context).
>
> "PIŠI KAO ŠTO GOVORIŠ, ČITAJ KAO ŠTO JE NAPISANO"
> VUK KARADŽIĆ
This is one of the more impressive guidelines in modern writing systems. The author was a major Jugoslav writer in the nineteenth century, and he was active in developing the modern two-alphabet writing system of Serbo-Croatian. The quote means "Write like you speak; read as it is written." In the context of the multiple small groups at odds with each other, and no way of enforcing a standard dialect on the population, it was a major political success that the idea was accepted.
In other words, he was advising that people treat dialect differences with respect. Everyone was to write phonetically in their own dialects; when reading, you should give the author the respect of reading it as written. An educated Jugoslav is expected to understand various dialects to the point of understanding others' writing, though it may not be as you would have written it yourself.
Now if the rest of the world could be taught such tolerance. (Maybe it's enough to hope that the Jugoslavs can keep such an idea alive in their own small corner of the world.)
-- 
John Chambers <{adelie,ima,maynard}!minya!{jc,root}> (617/484-6393)
elwell@tut.cis.ohio-state.edu (Clayton Elwell) (09/21/87)
jc@minya.UUCP (jc) writes:
    There's also a cute historical quibble to the effect that English actually
    has an accented letter: 'i'.  This is one of many letters that ultimately
    derives from Greek, where the dot is in fact an accent on the iota, which
    is properly dotless.  The really weird thing in English is that we use the
    accent mark on the lower-case letter, but not on the upper-case 'I'.  This
    is basically a result of historic ignorance, confusion, illiteracy, and so
    on, in the development of the late Latin alphabet into a 2-case form.
    	John Chambers <{adelie,ima,maynard}!minya!{jc,root}> (617/484-6393)
ARF! That's not a historical quibble, it's a falsehood.  Let's take things
from the beginning, so that hopefully we can get back to international
standards ...
The Phoenicians were the first people to develop an alphabet (i.e. a
writing system based on phonetics instead of pictographs).  Around 800
B.C., the Greeks decided that this was a good idea, and in 403 B.C.
they came up with an official alphabet.  Meanwhile, the Etruscans, who
were the first reasonably-sized civilization on the Italian peninsula,
had gotten their hands on an early version of the Greek alphabet [this
is where the term "beta test" comes from :-)], which they futzed
around with to suit their language.  In 700 B.C., they decided to
occupy Rome and invent urban renewal, thus giving the early Romans an
alphabet to play with.  They, in turn, fiddled with it and started
writing it with a pen, which changed the letter shapes some more.  The
Roman alphabet is usually thought to have reached its highest
aesthetic point in the inscription on the Trajan column.  This type of
letter, which is sometimes called "Roman Square Capital," or more
often just "Roman," is the direct source of modern upper case, and the
indirect source of modern lower case.
The biggest problem with roman capitals is that they are rather slow
to write, and so Roman scribes developed a variety of cursive forms,
some of which are completely unreadable to modern eyes, bearing as
they do a strong resemblance to Old Martian.  This caused some
problems, and by the 4th century a compact, informal script called
"Roman Rustic" or "Capitalis Rustica", a remarkably graceful alphabet,
had come into use.
This was still an all-capital script.
Well, the early Christians seemed to think that since this alphabet
had been used for (gasp) pagan writings (such as Virgil), it just
wouldn't do, and so they developed an official script, called "uncial"
by modern scholars.  It began as a majuscule script, but since it was
designed to be written quickly, many of the letterforms started to
resemble modern lower case.  This was where ascenders and descenders
made their first appearance, for example.  People finally agreed that
punctuation was a good idea, although they still ran all of the words
together.  Dots were placed above letters, but only to signify
corrections.  As the early Christian monasteries got richer, the
script got more ornate again, and word spacing was introduced, albeit
fitfully. Still no diacritic marks, except for corrections and abbreviations
(usually a horizontal stroke above two or three letters).
About this time (6th century or so), the Christians had decided to
convert England, which turned out to be more difficult than they had
hoped, but they kept working at it.  They brought with them an
informal script called "half-uncial", which contained even more
letterforms that eventually made it into modern lower case.  The
Irish, and to a lesser extent the Anglo-Saxons, took this script and
produced a script called "Insular Majuscule," one of the most
beautiful versions of the roman alphabet ever produced.  They also
produced a script called "Insular Minuscule" (surprise, surprise) that
was used at first for commentary and other 'unofficial' uses.  For the
next several centuries, things went downhill when it came to
legibility, although punctuation was approaching its modern form.
At the end of the 8th century, a very important thing happened.
Charlemagne and his adviser Alcuin (a Benedictine monk) founded a
scriptorium which developed a new script called "Carolingian
Minuscule", which, when revived during the Renaissance, was the direct
predecessor of the modern lower case alphabet.  Even so, diacritics
were still only used for abbreviation.  This is the origin of
"accents" in European languages.  For example, the circumflex in
French was originally an abbreviation for "s."
As this script slowly metamorphosed into Gothic, it became more
compact and less legible, since it had a more even texture.  The "i"
and "j" were still undotted, but "y" *was* dotted, so as to make it
stand out more.  By the fifteenth century, "i" sometimes had a dot as
well.  This custom continued into the Renaissance, at which time the
dot was dropped off of the "y" and added to the "j".  It was only with
the development of printing that it was standardized.
The use of accents and "breathing marks" is a feature of modern Greek,
not classical Greek.  It developed in parallel with accents in Europe,
not as a precursor to them.
Now that I've beaten this into the ground, can we get back to
international standards?  I'd be happy to continue via mail, but it
does seem to be off the subject for this newsgroup.
-- 
							      Clayton M. Elwell
       The Ohio State University Department of Computer and Information Science
       (614) 292-6546	 UUCP: ...!cbosgd!osu-cis!tut.cis.ohio-state.edu!elwell
		      ARPA: elwell@ohio-state.arpa (not working well right now)
hmj@tut.fi (Matti Järvinen) (09/21/87)
In article <421@tutor.tut.UUCP> eal@tutor.UUCP (Lehtimäki Erkki) writes:
>In article <7187@reed.UUCP> eeyore@reed.UUCP (joshua samuel honig guenter ii) writes:
>>question 2. does anybody know if finnish is written in cyrillic in the karelian
>>s.f.s.r.?
>
>They don't actually speak finnish in the Karelian s.f.s.r., i think they
>speak some of the karelian dialects, East-Karelian or Ruski-Karelian,
>i am not sure. They use same alphabet than we here in Finland.
The language is Finnish, with the same words and grammar. They speak a Karelian dialect, but the written text is the same as in Finland. I have two copies of the newspaper "Neuvosto-Karjala" (Soviet Karelia) printed in Petroskoi, and the language is the same, though they do have some odd Soviet phrases :-)
Finnish has never been written in Cyrillic, except for some prayer books written around 1600 for Orthodox priests to make them Lutheran.
-- 
Hannu-Matti Jarvinen, Tampere University of Technology, Finland
Project EAST - European Advanced Software Technology
hmj@tut.fi, hmj@tut.uucp, hmj@tut.funet (tut.ARPA is not the same computer).
jsa@tut.fi (Jari Salo) (09/21/87)
in article <7187@reed.UUCP>, eeyore@reed.UUCP (joshua samuel honig guenter ii) says:
> question 2. does anybody know if finnish is written in cyrillic in the karelian
> s.f.s.r.?
Finnish uses the normal Latin alphabet with minor modifications: A_with_two_dots, O_with_two_dots and the Swedish O. To my knowledge Finnish has NEVER been written in Cyrillic. Under Russian rule, official documents were written using the Cyrillic alphabet, but the language used was not Finnish. Many people living in the Karelian s.f.s.r. speak both Russian and Finnish, but they HAVE NOT mixed (and probably never will mix) those two languages.
Anyway, writing Finnish using the Cyrillic alphabet would be just about as sane as writing English using the katakana alphabet; you'd probably have to modify the original words a bit to make them fit the new alphabet.
-- 
Jari Salo
Tampere University of Technology    UUCP: jsa@tut.UUCP
Computer Systems Laboratory         Internet: jsa@tut.fi
PO box 527                          Tel: 358-(9)31-162590
SF-33101 Tampere, Finland
zwicky@tut.cis.ohio-state.edu (Elizabeth Zwicky) (09/21/87)
In article <183@tut.cis.ohio-state.edu> elwell@tut.cis.ohio-state.edu (Clayton Elwell) writes: > >The use of accents and "breathing marks" is a feature of modern Greek, >not classical Greek. It developed in parallel with accents in Europe, >not as a precursor to them. >-- > Clayton M. Elwell Well, that depends on what you call "modern". Call me a quibbler, but accents and breathing marks are a feature of New Testament Greek, and have been receding in usefulness since. Even you usually only extend "modern" back a few hundred years... This would suggest that it was indeed a precursor. Elizabeth
franka@mmintl.UUCP (Frank Adams) (09/22/87)
In article <2541@aw.sei.cmu.edu> firth@bd.sei.cmu.edu.UUCP (PUT YOUR NAME HERE) writes: >We've beaten Russian to death, but did anyone point out that English also >requires diacritical marks above, beside, and below letters? They are >fast being dumped into the can~on of history due to the uncoo"perative >ro^le of ASCII terminals, but even a soupc,on of an acquaintance with >the real language should convince you they are not just mediaeval relics! It is my impression that these marks were very rarely used even before the advent of ASCII terminals. Further, of the four examples here, I would only classify one (uncooperative) as genuinely English; the others are unanglicized foreign words, complete with foreign marks. -- Frank Adams ihnp4!philabs!pwa-b!mmintl!franka Ashton-Tate 52 Oakland Ave North E. Hartford, CT 06108