sommar@enea.se (Erland Sommarskog) (11/19/89)
(This is hardly news for comp.std.internat readers, but the subject belongs to that group.)

Salmela Jarmo (js@kaarne.tut.fi) writes:
>PS. The ASCII standard that supports national characters is really
>needed.

Well, ASCII supports all national characters it can think of. I.e., American.

But, seriously, it exists. The standard you want is ISO 8859, which is a family of eight-bit standards, all with good all ASCII in the 0-127 slots, new control characters in 128-159, non-break space in 160 and "soft hyphen" in ord('-') + 128. The rest differs between the various standards, of which there are five with Latin characters and one each with Cyrillic, Arabic, Hebrew and Greek characters. I don't know if all of them are settled, but at least Latin-1 and Latin-2 are. One can predict that for the next few years Latin-1 will be the most important, since it covers all major Western European languages except, I think, Welsh and Catalan. Latin-2 covers Eastern European languages.

Then of course there is the problem of starting to post Usenet articles from your VT320 using Latin-1. People with seven-bit terminals, of which there probably are a few, will get the new characters folded into old ones, making your text quite incomprehensible, even worse than the brackets and braces you get using the national seven-bit conventions for dotted "a"s and "o"s.

-- 
Erland Sommarskog - ENEA Data, Stockholm - sommar@enea.se
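Erland's description of the layout shared by the ISO 8859 parts can be summarised as a small byte classifier. This is a minimal Python sketch; the function name `iso8859_region` is hypothetical, and it encodes only the regions named in the post (as Richard Lee points out further down, ISO 8859 itself leaves the meaning of the control positions to other standards such as ISO 6429).

```python
def iso8859_region(byte: int) -> str:
    """Classify a byte by the regions Erland describes as common to ISO 8859."""
    if not 0 <= byte <= 0xFF:
        raise ValueError(f"not a byte value: {byte}")
    if byte <= 0x1F:
        return "C0 control"             # 0-31: ASCII control area
    if byte <= 0x7E:
        return "ASCII printable"        # 32-126: same as ASCII
    if byte == 0x7F:
        return "DEL"
    if byte <= 0x9F:
        return "C1 control"             # 128-159: the new control area
    if byte == 0xA0:
        return "no-break space"         # 160
    if byte == ord("-") + 128:
        return "soft hyphen"            # 45 + 128 = 173 = 0xAD
    return "part-specific graphic"      # differs between 8859-1, 8859-2, ...


for b in (0x41, 0x9B, 0xA0, 0xAD, 0xE5):
    print(f"{b:3d} 0x{b:02X}  {iso8859_region(b)}")
```

As a cross-check, Python's `latin-1` codec decodes byte 0xAD to U+00AD SOFT HYPHEN, matching the ord('-') + 128 rule.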
heimir@rhi.hi.is (Heimir Thor Sverrisson) (11/20/89)
sommar@enea.se (Erland Sommarskog) writes:
... deleted description of the eight bit character set standard, ISO 8859 (especially ISO 8859/1 or Latin-1).
>Then of course there is problem to start posting Usenet articles
>from your VT320 using Latin-1. People with seven-bit terminals,
>of which there probably are a few, will get the new characters
>folded into old making your text quite incomprehensible, even
>worse than those brackets and braces you get using the national
>seven-bit conventions for dotted "a":s and "o":s.

People with seven-bit terminals can put filters on their news readers so they get something meaningful out of the eight-bit characters. They could, for example, translate the upper-case Icelandic thorn into 'Th' and 'o acute' into 'o'. Then I would be able to use my middle name SPELLED CORRECTLY in my signature. I could also send you direct mail in Danish and you could answer me in Swedish.

We have been using the ISO set here in Iceland for some years now, and I'm very surprised at how far behind the Scandinavian countries are in this respect; they all seem to be using (their own special version of) seven-bit modified ASCII sets.

-- 
Heimir Thor Sverrisson heimir@rhi.hi.is
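Heimir's filter idea can be sketched as a transliteration pass over incoming Latin-1 text. This is a minimal Python sketch, assuming the article body arrives as ISO 8859-1 bytes. Only the thorn-to-"Th" and o-acute-to-"o" rules come from the post; the table name `TRANSLIT`, the Scandinavian digraph entries and the fallback of stripping accents via Unicode compatibility decomposition are illustrative assumptions.

```python
import unicodedata

# Hypothetical table: Þ -> "Th" is from the post; the other entries are
# assumed conventions (ae/oe/aa are common Scandinavian 7-bit substitutes).
TRANSLIT = {
    "Þ": "Th", "þ": "th", "Ð": "D", "ð": "d",
    "Æ": "Ae", "æ": "ae", "Ø": "Oe", "ø": "oe",
    "Å": "Aa", "å": "aa", "ß": "ss",
}


def to_seven_bit(data: bytes) -> str:
    """Turn ISO 8859-1 bytes into plain ASCII text for a 7-bit terminal."""
    out = []
    for ch in data.decode("latin-1"):
        if ch in TRANSLIT:
            out.append(TRANSLIT[ch])
            continue
        # Fallback: decompose (ó -> o + combining acute) and drop the accents.
        base = "".join(c for c in unicodedata.normalize("NFKD", ch)
                       if not unicodedata.combining(c))
        out.append(base if base.isascii() else "?")
    return "".join(out)


print(to_seven_bit("Heimir Þór Sverrisson".encode("latin-1")))
```

With this table, ö falls back to a plain "o"; a Swedish or German reader might prefer "oe", which is exactly the kind of choice such a filter has to make.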
minow@mountn.dec.com (Martin Minow) (11/20/89)
In article <472@enea.se> sommar@enea.se (Erland Sommarskog) writes:
>
>Salmela Jarmo (js@kaarne.tut.fi) writes:
>>PS. The ASCII standard that supports national characters is really
>>needed.
>
>Well, ASCII supports all national characters it can think of.
>I.e, American.

ASCII is, strictly speaking, the "national character set" for the United States. It's one of a family of "national character sets" standardized under ISO-646. National standardization authorities are empowered to define 12 (if I remember correctly) of the character positions to suit their country's needs. For example, the United Kingdom replaces the "number sign" with "Pound Sterling", while the Scandinavian countries define the character positions past 'Z' and 'z' to support their national letters. VT200/VT300-compatible terminals generally support about a dozen different national replacement sets.

The standardization bodies realized in the early 1980's that ISO-646 was not a satisfactory solution, and built on the Dec Multinational character set to form ISO Latin-1, along with a structure that will define a family of 96-character supplemental sets. (ISO/ECMA has standardized about a dozen sets for Slavic, Lappish, Greek, and Hebrew, among others.)

There are long-term plans to develop a 32-bit "universal" character set that can be used to communicate among all written languages. Much of that space will be used for the Asian ideographic languages (China, Taiwan, Korea, and Japan). ISO 10646 is the working title of that standard. (No, you won't have to buy more memory: there will be control sequences to let you select a slice of the character set space.)

Hope this clarifies matters. This note does not represent the position of Digital Equipment Corporation.

Martin Minow
minow@thundr.enet.dec.com
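The ISO 646 scheme Martin describes, where a handful of 7-bit positions are reassigned per country, can be shown as a decoding table. The sketch below is partial and hypothetical: only six Swedish letter positions and the UK pound sign are filled in, and the keys "US", "SE" and "GB" are made-up labels rather than registered variant names.

```python
# Partial, illustrative ISO 646 replacement tables (not complete variants).
NATIONAL_646 = {
    "US": {},                                        # ASCII itself
    "GB": {0x23: "£"},                               # number sign -> pound
    "SE": {0x5B: "Ä", 0x5C: "Ö", 0x5D: "Å",          # [ \ ] after 'Z'
           0x7B: "ä", 0x7C: "ö", 0x7D: "å"},         # { | } after 'z'
}


def decode_646(data: bytes, variant: str) -> str:
    """Decode 7-bit data using the given national replacement table."""
    table = NATIONAL_646[variant]
    chars = []
    for b in data:
        if b > 0x7F:
            raise ValueError("ISO 646 is a 7-bit code")
        chars.append(table.get(b, chr(b)))
    return "".join(chars)


print(decode_646(b"R{ksm|rg}s", "SE"))   # a Swedish reader sees the letters
print(decode_646(b"R{ksm|rg}s", "US"))   # an ASCII reader sees the braces
```

The same bytes read with the US table come out as braces and a vertical bar, which is the brackets-and-braces effect Erland mentions.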
torkil@psivax.UUCP (Torkil Hammer) (11/21/89)
In article <472@enea.se> sommar@enea.se (Erland Sommarskog) writes:
#(This is hardly news for comp.std.internat readers, but the
#subject belongs to that group.)
#
#Salmela Jarmo (js@kaarne.tut.fi) writes:
#>PS. The ASCII standard that supports national characters is really
#>needed.
#
#Well, ASCII supports all national characters it can think of.
#I.e, American.
#
#But, seriously it exists. The standard you want is ISO 8859,
#which is a family of eight-bit standards, all with good all
#ASCII in the 0-127 slots, new control characters in 128-159,
#non-break space in 160 and "soft hyphen" in ord('-') + 128.
#Then the rest is different in the various standards, which
#are five standards with Latin characters, and one each with
#Kyrillic, Arabian, Hebrew and Greek characters. I don't if
#all of them are settled, but at least Latin-1 and Latin-2 are.

What I read is that the ESO got botched. Some national letters were overlooked, including the slashed o used in Danish and Norwegian for the umlaut o, written o: in other European languages. It does not help that the upper-case variety of that letter is rather close to the slashed zero used in the USA to tell it from the letter O. Danes are not likely to tolerate o: as a substitute, and I doubt Norwegians are. WW2 and 1905 and such.

Can anybody confirm?

Torkil Hammer
rlee@weaver.ads.com (Richard Lee) (11/21/89)
Erland Sommarskog writes:
>The standard you want is ISO 8859, which is a family of eight-bit
>standards, all with good all [sic] ASCII in the 0-127 slots, new
>control characters in 128-159, non-break space in 160 and "soft
>hyphen" in ord('-') + 128.
As far as control characters are concerned, 8859 just states that 00/00
through 01/15 (0-31) and 07/15 through 09/15 (127-159) are non-graphic
characters not defined by the standard. To quote from ISO 8859-1: 1987
(E): "Their use is outside the scope of ISO 8859; it is specified in
other International Standards, for example ISO 646 or ISO 6429." Or has
that changed since 1987?
--
RICHARD LEE rlee@ads.com or ...!{sri-spam | ames}!zodiac!rlee
415-960-7300 ADS, 1500 Plymouth St., Mtn. View CA 94043-1230
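Richard's positions use the column/row notation of the ISO code tables, where "07/15" means column 7, row 15, i.e. 7 × 16 + 15 = 127. A small Python sketch of the conversion (the helper names are hypothetical) confirms that the two ranges he quotes are 0-31 and 127-159, 65 non-graphic positions in all.

```python
def from_col_row(notation: str) -> int:
    """Convert ISO table notation such as "07/15" to a byte value."""
    col, row = (int(part) for part in notation.split("/"))
    if not (0 <= col <= 15 and 0 <= row <= 15):
        raise ValueError(f"bad column/row notation: {notation}")
    return col * 16 + row


def to_col_row(byte: int) -> str:
    """Convert a byte value back to column/row notation."""
    return f"{byte >> 4:02d}/{byte & 0x0F:02d}"


# The two non-graphic ranges quoted from ISO 8859-1.
c0 = range(from_col_row("00/00"), from_col_row("01/15") + 1)
del_and_c1 = range(from_col_row("07/15"), from_col_row("09/15") + 1)
print(c0, del_and_c1, len(c0) + len(del_and_c1))
```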
psv@nada.kth.se (Peter Svanberg) (11/21/89)
In article <1353@krafla.rhi.hi.is> heimir@rhi.hi.is (Heimir Thor Sverrisson) writes:
>
>People with seven bit terminals can put filters on their news readers
>so they get something meaningful out of the eight bit charaters. They
>could for example translate the upper case icelandic thorn into 'Th'
>and 'o accute' into 'o'. Then I would be able to use my middle name
>SPELLED CORRECTLY in my signature. I could also send you direct mail
>in Danish and you could answer me in Swedish.
>

As usual, when you change fundamental things like this, you must make it as invisible as possible for everybody who hasn't got the equipment for, or isn't interested in, the improvements you can get as a consequence of the change. So those who want the improvements are the ones who must make an effort to GET them, not everybody else to AVOID them (at least not when "everybody else" is the great majority).

>We have been using the ISO set here in Iceland for some years now and
>I'm very surprised of how far behind the Scandinavian contries are in
>this sense, they all seem to be using (their own special version of)
>seven bit modified ASCII sets.

There are a number of problems with converting to an eight-bit character set. A large one is that most of the software and hardware we use doesn't know anything about it. (Yes, this is slowly changing now, but it isn't good yet, and certainly was not several years ago!) What did you use before? Have you really converted to ISO 8859-1 everywhere in Iceland? On which operating systems?

Other differences between us and you are that you have more non-ASCII characters than we have and that you, being a small isolated country, care very much about your language. (For us it's rather the opposite on the latter point.)

But, as I said, things are changing. I predict some character set confusion (of another kind than the current one) in Europe in the next few years, followed by comparative calm in perhaps five years.

---
psv@nada.kth.se (should work!)
Peter Svanberg             uunet!nada.kth.se!psv (for lazy nodes...)
Dept of Num An & CS        psv%nada.kth.se@uunet.uu.net (ARPA nodes)
Royal Institute of Tech
Stockholm, SWEDEN
ok@mudla.cs.mu.OZ.AU (Richard O'Keefe) (11/21/89)
In article <2942@psivax.UUCP>, torkil@psivax.UUCP (Torkil Hammer) writes:
> What I read is that the ESO got botched. Some national letters
> were overlooked, including the slashed o used in Danish and Norwegian
> for the umlaut o, written o: in other European languages.

In ISO 8859/1,
    D8 (hex) = 216 (decimal) = upper-case O with a slash through it
    F8 (hex) = 248 (decimal) = lower-case o with a slash through it

> Danes are not likely to tolerate the o: as a substitute, and I doubt
> Norwegians are.

If they use ISO 8859/1, they don't have to use o: as a substitute.
(Now if only 8859/1 had included 66--99 and 6--9 quotation marks...)
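Because ISO 8859-1 places the accented capitals in C0-DE and the matching small letters exactly 0x20 higher in E0-FE, upper-casing is mostly a subtraction, as the D8/F8 pair shows. A minimal Python sketch (the function name is hypothetical) with the exceptions spelled out: F7 is the division sign, and ß (DF) and ÿ (FF) have no capital in Latin-1.

```python
def latin1_upper(byte: int) -> int:
    """Upper-case one ISO 8859-1 byte; bytes without a Latin-1 capital are unchanged."""
    if 0x61 <= byte <= 0x7A:                    # a-z
        return byte - 0x20
    if 0xE0 <= byte <= 0xFE and byte != 0xF7:   # small accented letters, not the division sign
        return byte - 0x20
    return byte                                 # includes DF (sharp s) and FF (y-dieresis)


word = "smørbrød".encode("latin-1")
print(bytes(latin1_upper(b) for b in word).decode("latin-1"))
```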
minow@mountn.dec.com (Martin Minow) (11/22/89)
In article <2942@psivax.UUCP> torkil@psivax.UUCP (Torkil Hammer) states
that slashed-O was omitted from ISO 8859-1. Actually, upper- and
lower-case variants are in Latin-1 at hex D8 and F8 respectively
(assuming Latin-1 is in the "right half" of the code space).
However, late in the development of Latin-1, the OE and oe ligature
characters were removed, and were replaced by the "multiply" and
"division" signs. (I will not defend this decision.)
Another missing character is the Dutch ij ligature, which is imperfectly
represented by y-dieresis. Otherwise, it seems to me that most Western
European languages are well-supported by Latin-1. Exceptions include
Turkish and the Slavic languages written in Roman letters.
Martin Minow
minow@thundr.enet.dec.com
The above does not represent the position of Digital Equipment Corporation
minow@mountn.dec.com (Martin Minow) (11/22/89)
I received a mail request for the differences between Latin-1 and Dec Multinational (as implemented on the VT200 and VT300 series terminals). The following might be of interest to others.

ISO Latin-1 is almost identical to Multinational. The "blank spots" in Multinational were filled in, and one or two characters were changed, possibly so Dec wouldn't have a competitive advantage. We released our first products with Multinational around 1983-84, during the standardization process for Latin-1. Both tables are in the VT300 documentation.

Here is how to convert Multinational to Latin-1 (assuming that Latin-1/Multinational is in the right half of the 8-bit code space):

 A4  add currency symbol
 A6  add broken vertical bar
 A8  remove currency symbol, add dieresis
 AC  add logical not symbol
 AD  add small dash (soft hyphen)
 AE  add "registered" symbol (R inside a circle)
 AF  add macron (raised horizontal line)
 B4  add acute accent
 B8  add cedilla (comma, centered in the display area)
 BE  add 3/4
 D0  add Icelandic capital D-
 D7  remove OE, add multiplication sign
 DD  remove Y-dieresis, add Y with acute accent
 DE  add Icelandic capital Thorn (looks like Greek theta)
 F0  add Icelandic lower-case d-
 F7  remove oe, add division sign
 FD  remove y-dieresis, add y with acute accent
 FE  add Icelandic lower-case thorn
 FF  add y-dieresis (exists in lower-case only)

This is, of course, not an official list, and I apologize for any errors.

Martin Minow
minow@thundr.enet.dec.com
The above does not represent the position of Digital Equipment Corporation
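Martin's list implies that only a few Multinational bytes change meaning, while every other Multinational character keeps its position. A minimal Python sketch of a byte converter built on that assumption (the table and function names are hypothetical, and the list itself is, as Martin says, unofficial). Characters with no Latin-1 home are spelled out, which is a choice made here rather than any rule.

```python
# Multinational bytes whose meaning differs in Latin-1, per the list above.
# All other bytes are assumed to be the same character in both sets.
MCS_TO_LATIN1 = {
    0xA8: b"\xa4",   # currency sign: A8 in Multinational, A4 in Latin-1
    0xD7: b"OE",     # OE ligature: no Latin-1 code, spell it out
    0xDD: b"Y",      # capital Y-dieresis: no Latin-1 code
    0xF7: b"oe",     # oe ligature: no Latin-1 code, spell it out
    0xFD: b"\xff",   # y-dieresis: FD in Multinational, FF in Latin-1
}


def mcs_to_latin1(data: bytes) -> bytes:
    """Convert DEC Multinational bytes to ISO 8859-1 bytes."""
    return b"".join(MCS_TO_LATIN1.get(b, bytes([b])) for b in data)


print(mcs_to_latin1(b"C\xf7ur").decode("latin-1"))
```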
dolf@idca.tds.PHILIPS.nl (Dolf Grunbauer) (11/22/89)
In article <1083@mountn.dec.com> minow@mountn.dec.com (Martin Minow) writes:
>Another missing character is the Dutch ij ligature, which is imperfectly
>represented by y-dieresis.

The 'ij' is *not* a special character in the Dutch language. It is only a very common sequence of two characters in our language. We have the normal (:-) 26-letter alphabet. In our dictionaries you find words containing 'ij' between ..ii.. and ..ik.., not after the 'z' (like the special Scandinavian characters in their dictionaries) or near the 'y'. Only our telephone company (called PTT) does not know its own alphabet and mixes 'ij' with 'y', which is of course *very* confusing :-(, e.g.

    Meijer ... 123456
    Meyer .... 234567
    Meijers .. 345678
    Meyers ... 456789

I am aware that, due to the common use of 'ij', some typewriters and keyboards have a special key for this pair of characters. I think that the Dutch version of WordPerfect has a special character for 'ij', and it looks really nice on output because it is (almost) as wide as an 'm' or 'w'.

-- 
Dolf Grunbauer          Tel: +31 55 433233  Internet dolf@idca.tds.philips.nl
Philips Telecommunication and Data Systems  UUCP ....!mcvax!philapd!dolf
Dept. SSP, P.O. Box 245, 7300 AE Apeldoorn, The Netherlands
--> Holland is only 1/6 of the Netherlands <--
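Dolf's telephone-book complaint is a collation question. The Python sketch below contrasts the dictionary order he describes (ij is just i followed by j) with the PTT convention of filing ij as if it were y. The key functions are hypothetical names, and the tie-break on the real spelling is an assumption added to make the order deterministic.

```python
names = ["Meyers", "Meijer", "Meyer", "Meijers"]


def dictionary_key(name: str) -> str:
    """Dutch dictionary order: 'ij' is simply 'i' followed by 'j'."""
    return name.casefold()


def ptt_key(name: str) -> tuple:
    """Phone-book order: file 'ij' as 'y', then break ties on the spelling."""
    folded = name.casefold()
    return (folded.replace("ij", "y"), folded)


print(sorted(names, key=dictionary_key))
print(sorted(names, key=ptt_key))   # interleaves Meijer/Meyer as in the list above
```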
magnus@rhi.hi.is (Magnus Gislason) (11/22/89)
minow@mountn.dec.com (Martin Minow) writes:
>ISO Latin-1 is almost identical to Multinational. The "blank spots"
>in Multinational were filled in, and one or two character were changed,
>possibly so Dec wouldn't have a competitive advantage. We released

I think the reason why ISO did not just adopt DEC Multinational as Latin-1 is that Multinational does not include all the Icelandic national characters (Iceland is a part of Western Europe). There wasn't room for all of them in the "blank spots" in the upper quarter of Multinational (C0-FF). When DEC came up with Multinational we couldn't use it here in Iceland, so DEC made an Icelandic version of Multinational, and of course they couldn't put the missing Icelandic characters in the same places as in ISO Latin-1.

> D0 add Icelandic capital D-
> D7 remove OE, add multiplication sign
> DD remove Y-dieresis, add Y with acute accent
> DE add Icelandic capital Thorn (looks like Greek theta)
                                  ^^^^^
This should be "sounds"; they look different.

> F0 add Icelandic lower-case d-
> F7 remove oe, add division sign
> FD remove y-dieresis, add y with acute accent
> FE add Icelandic lower-case thorn
> FF add y-dieresis (exists in lower-case only)

As you can see from this list, the changes mainly concern the Icelandic characters D-, Thorn and Y with acute accent (the Y-acute is not used in any other Western European language, as far as I know).

Magnus
finn@mojo.UUCP (Finn Markmanrud) (11/22/89)
Please be kind to us poor beginners! I have no idea how to convert ^ to Th or anything similar. Being the only Norwegian in the company (I think), I am pretty sure I cannot get a request through to include this on our system. Some day I might be able to make my own conversion in my own directory, but until then, I would appreciate being able to read mail & news from my Scandinavian friends. Most of them use oe, ae, and aa as substitutes, and it works very well. We use 7 bits, and from what I hear, this is no longer any good. Am I about to lose touch with my old country / continent?

Maybe it's not as bad as it sounds, but I thought I'd remind all you whizzes out there that there are a few people who call themselves "users," and do just that: use the facilities provided. Please be gentle!

-- 
+=====================+========================+=============================+
| Finn Markmanrud     | finn@mojo.nec.com      | "It can't happen here."     |
| (508) 264 8668      | Boxboro, MA            | F.Z.                        |
+=====================+========================+=============================+
donn@hpfcdc.HP.COM (Donn Terry) (11/23/89)
There are actually a bunch of candidate character sets.

ISO646: 7-bit, kinda like ASCII, one country at a time. Each country
        that uses it has its own national variant in the "changeable"
        characters.

ISO8859: 8-bit, using two 96-character (or 95, depending on what you do
        with DEL) planes. Suitable for English plus one of:
            Western Europe
            Eastern (Latin) Europe
            Cyrillic
            Arabic
            (Others; all "small" phonetic alphabets)
        I don't remember if Eastern Europe includes Turkish or whether
        it's another case.

ISO2022: Lays on top of 646 or 8859 (or others) and defines language shifts.
        Blows away any presumption that length of string in characters ==
        length in bytes == space used in displaying text.

Various Asian national standards for the "Han" ("Chinese") character set,
        plus national character sets for Japan and Korea. No unification
        of these sets.

ISO10646: 32-bit everything code. Treats the various Han character sets
        as distinct character sets for each national usage, but unifies
        the Latin characters into a single set. Variable-length coding is
        possible to reduce space. Can degenerate to (something close to)
        8859.

UNICODE: This isn't a standard but is proposed. Unifies the Han character
        sets in the same way as the Latin ones (but with obviously a much
        bigger payback because of the size). Fixed length 16 bits. This
        fixes the length in characters vs. length in bytes issue. (The
        issue of length in display space is inherently harder because
        characters do vary in width in natural usage in many phonetic
        alphabets, as well as in the ideographic ones. See Arabic and
        Hindi, where constant-width usage is considered "pretty awful",
        albeit readable. (Even in English, good typesetting is not
        constant width.))

CCITT T2xx (I don't have the exact number). Another player that I just
        recently found out about and don't know anything about in detail.
        This is "teletext", I'm told.

There are certainly more.

Donn Terry
HP Ft. Collins
heimir@rhi.hi.is (Heimir Thor Sverrisson) (11/23/89)
psv@nada.kth.se (Peter Svanberg) writes:
>>People with seven bit terminals can put filters on their news readers
>>so they get something meaningful out of the eight bit charaters.
>As usual, when you change fundamental things like this, you must make
>it as invisible as possible for everybody who hasn't got the equipment
>for or isn't interested in the improvements you can get as a
>consequence of the change. So, those who want the improvements is the
>ones who must make an effort to GET them, not everybody else to AVOID
>them (at least not when "everybody else" is in great majority).

Because of the structure of ISO 8859, the eight-bit characters will fold into 'printable' seven-bit characters anyhow. If someone does not change his old system to interpret the eight-bit characters, so what? He's not interested anyway!

>>We have been using the ISO set here in Iceland for some years now and
>>I'm very surprised of how far behind the Scandinavian contries are in
>>this sense, they all seem to be using (their own special version of)
>>seven bit modified ASCII sets.
>There are a number of problems with converting to use an eight bit
>character set. A large one is that most of the software and hardware
>we use doesn't know anything about it. (Yes, this is slowly changing
>now, but it isn't good yet, and certainly was not several years ago!)

You will be surprised if you really try to use eight-bit data :-) Most systems are at least 'eight-bit transparent', i.e. they don't 'scrub' the data to seven bits. Unix systems I've used that do better than that include HP-UX, IBM's AIX (both RT and PS/2) and all the Unixes for the Intel 80386 I've tested. The worst experience I've had recently was with a Sun 4 csh that logs you out if you enter a character with the eighth bit set! Many software packages now allow eight-bit data. I was just testing Informix RDBS on this same Sun 4 and found that I could really enter eight-bit data into forms, something I could not do two years ago. We've also got some public domain software, such as mailers, editors and news readers, that has been *corrected* to handle eight-bit characters.

>What did you use before? Have you really converted to ISO 8859-1
>everywhere in Iceland? On which operating systems?

We did have a national version of ISO 646, but it could not cover all the accented characters we've got. The Unix systems generally use the ISO set, which is the only official Icelandic standard for eight-bit character sets. On PCs people are using a national version of the American PC set (yuk), and very few have adopted Code Page 850, which came from IBM when they introduced the PS/2 line. On the IBM 360/370, 3X and AS/400 they are using some (different) versions of EBCDIC :-(

>Other differences between us and you is that you have more non-ASCII
>characters than we have and that you - being a small isolated country
>- are very caring of your language etc. (For us it's rather the
>opposite on the latter point.)

The first point is certainly true: our alphabet has 36 characters, which means that we need 20 characters (uc+lc) that are not in ASCII. I would certainly not tolerate a letter from the authorities that did not have my name spelled correctly!

>But, as I said, things are changing. I predict some character set
>confusion (of another kind than the current) in Europe in the next few
>years, followed by - comparatively - calm, in perhaps five years.

I don't think it will even take that long. All major hardware manufacturers have made most of their terminal equipment independent of the character set by moving functions that were previously done in hardware into software. The European market is also the fastest growing for many software houses and is in many cases already bigger than the US market. If these people really want to make it over here, they can solve many of their problems by using ONE character set that covers the US, Europe and South America!

-- 
Heimir Thor Sverrisson heimir@rhi.hi.is
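Heimir's remark that ISO 8859 characters "fold into 'printable' seven-bit characters anyhow" (and Erland's earlier complaint about the result) can be checked by clearing bit 7 of every byte, which is one common thing a 7-bit path does to 8-bit text. A minimal Python sketch with a hypothetical function name. It shows the folding is mostly, but not entirely, printable: the C1 area lands on C0 controls, the no-break space becomes a plain space, and ÿ (FF) becomes DEL.

```python
def strip_eighth_bit(data: bytes) -> bytes:
    """Clear bit 7 of every byte, as a 7-bit line or program might."""
    return bytes(b & 0x7F for b in data)


print(strip_eighth_bit("Þór Sjöberg".encode("latin-1")))

# Graphic Latin-1 bytes A1-FE all land on printable ASCII 21-7E ...
assert all(0x21 <= (b & 0x7F) <= 0x7E for b in range(0xA1, 0xFF))
# ... but the edges do not: C1 controls become C0 controls, FF becomes DEL.
assert strip_eighth_bit(b"\x9b\xa0\xff") == b"\x1b\x20\x7f"
```

For instance, the capital thorn (DE) comes out as '^' and o-acute (F3) as 's', so the result is printable but hardly readable.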
psv@nada.kth.se (Peter Svanberg) (11/23/89)
In article <1083@mountn.dec.com> minow@mountn.dec.com (Martin Minow) writes:
> :
> :
>However, late in the development of Latin-1, the OE and oe ligature
>characters were removed, and were replaced by the "multiply" and
>"division" signs. (I will not defend this decision.)
>

Are you stating that the document I have - "International Standard ISO 8859-1, First edition 1987-02-15" - isn't valid any more? Are there other changes than the characters you name? (It seems strange to change a published standard so seriously.)

---
psv@nada.kth.se (should work!)
Peter Svanberg             uunet!nada.kth.se!psv (for lazy nodes...)
Dept of Num An & CS        psv%nada.kth.se@uunet.uu.net (ARPA nodes)
Royal Institute of Tech
Stockholm, SWEDEN
pedersen@philmtl.philips.ca (Paul Pedersen) (11/23/89)
In article <2942@psivax.UUCP> torkil@psivax.UUCP (Torkil Hammer) writes:
>What I read is that the ESO got botched. Some national letters
>were overlooked, including the slashed o used in Danish and Norwegian
>for the umlaut o, written o: in other European languages.
>It does not help that the upper case variety of that letter is rather
>close to the slashed zero used in USA to tell it from the letter O.
>Danes are not likely to tolerate the o: as a substitute, and I doubt
>Norwegians are. WW2 and 1905 and such.
>
>Can anybody confirm?
>
>Torkil Hammer

I've got ISO 8859-1:1987(E) "Latin alphabet No.1" in front of me and see both characters you say are missing:

    pos  char
    F6   small o umlaut
    D6   big o umlaut
    F8   small o slashed
    D8   big o slashed

Did I misunderstand your question?

Paul
tml@hemuli.atk.vtt.fi (Tor Lillqvist) (11/23/89)
In article <540@ssp11.idca.tds.philips.nl> dolf@idca.tds.PHILIPS.nl (Dolf Grunbauer) writes:
>In article <1083@mountn.dec.com> minow@mountn.dec.com (Martin Minow) writes:
>>Another missing character is the Dutch ij ligature,
>
>The 'ij' is *not* a special character in the Dutch language. It is only a
>very common sequence of two characters in our language. We have the

Well, as Martin Minow said, it is a _ligature_, which means that it is perfectly OK to print it as "i" followed by "j", but in quality typesetting you should use a specially designed character for the combination. I don't think it is necessary to include the ij ligature in Latin-1 or similar character sets. They don't contain the fi or ffi ligatures either (not to mention kerning information).

The Chicago Manual of Style says that ij should be capitalized as IJ (for example: IJsland). How well is this adhered to by the Dutch?

-- 
Tor Lillqvist, VTT/ATK
kkim@sparky.UUCP (kyongsok kim) (11/24/89)
In article <9300002@hpfcdc.HP.COM> donn@hpfcdc.HP.COM (Donn Terry) writes:
:ISO2022: Lays on top of 646 or 8859 (or others) and defines language shifts.
: Blows away any presumption that length of string in characters ==
: length in bytes == space used in displaying text.
Could you please elaborate "on top of" and "language shifts" (possibly
using examples)?
Thanks in advance.
Kyongsok Kim
Dept. of Comp. Sci., North Dakota State University
e-mail: nukim@plains.nodak.edu; nukim@ndsuvax.bitnet; uunet!ndsuvax!nukim
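As one illustration of what "on top of" and "language shifts" mean in Donn's description: ISO 2022 lets a stream designate a second character set into the G1 slot with an escape sequence, then switch between G0 and G1 with the Shift Out (SO, 0x0E) and Shift In (SI, 0x0F) controls. The Python sketch below is a deliberately tiny decoder that assumes G0 is ASCII and recognises only one designation, ESC 02/13 04/01 ("ESC - A"), which selects the right-hand half of ISO 8859-1 as a 96-character G1 set. Everything else in ISO 2022 (other sets, G2/G3, single shifts, the 8-bit form) is left out, and the function name is hypothetical.

```python
SO, SI = 0x0E, 0x0F
DESIGNATE_LATIN1_G1 = b"\x1b-A"   # ESC 02/13 04/01: Latin-1 right half -> G1


def decode_7bit_2022(data: bytes) -> str:
    """Decode a 7-bit ISO 2022 stream with ASCII as G0.

    Only ESC - A, SO and SI are understood; this is a toy, not a full decoder.
    """
    g1_is_latin1 = False   # has ESC - A been seen?
    shifted_out = False    # after SO, GL bytes refer to the G1 set
    chars = []
    i = 0
    while i < len(data):
        if data.startswith(DESIGNATE_LATIN1_G1, i):
            g1_is_latin1 = True
            i += len(DESIGNATE_LATIN1_G1)
            continue
        b = data[i]
        if b == SO:
            shifted_out = True
        elif b == SI:
            shifted_out = False
        elif shifted_out and g1_is_latin1 and 0x20 <= b <= 0x7F:
            # A 96-character set fills 02/00-07/15, so GL byte b stands for
            # the Latin-1 character at b + 0x80 (07/06 -> 15/06 = o-umlaut).
            chars.append(bytes([b + 0x80]).decode("latin-1"))
        else:
            chars.append(chr(b))
        i += 1
    return "".join(chars)


message = DESIGNATE_LATIN1_G1 + b"Sj" + bytes([SO]) + b"v" + bytes([SI]) + b"berg"
print(len(message), "bytes ->", decode_7bit_2022(message))
```

The 12-byte message decodes to a 7-character word, which is the point Donn makes about length in bytes versus length in characters.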
dolf@idca.tds.PHILIPS.nl (Dolf Grunbauer) (11/24/89)
In article <4318@hemuli.atk.vtt.fi> tml@hemuli.atk.vtt.fi (Tor Lillqvist) writes:
-In article <540@ssp11.idca.tds.philips.nl> dolf@idca.tds.PHILIPS.nl (Dolf Grunbauer) writes:
->In article <1083@mountn.dec.com> minow@mountn.dec.com (Martin Minow) writes:
->>Another missing character is the Dutch ij ligature,
->The 'ij' is *not* a special character in the Dutch language. It is only a
->very common sequence of two characters in our language. We have the
-Well, as Martin Minow said, it is a _ligature_, which means that it is
-perfectly OK to print it as "i" followed by "j", but in quality
-typesetting you should use a specially designed character for the
-combination. I don't think it is necessary to include the ij ligature
-in Latin-1 or similar character sets. They don't contain the fi or
-ffi ligatures, either (not to mention kerning information).

I think I misinterpreted the meaning of ligature :-). Quality printing will most of the time use proportional character widths, so it will automatically position the "i" and "j" very close to each other. Maybe even "fi" and "ffi" will be printed quite nicely this way.

-The Chicago Manual of Style says that ij should be capitalized as IJ
-(for example: IJsland). How well is this adhered to by the Dutch?

Completely; this is the way we do it. The funny thing is that the Germans get it wrong when they talk about our city IJmuiden or lake IJsselmeer, as they write Ijmuiden and Ijsselmeer.

-- 
Dolf Grunbauer          Tel: +31 55 433233  Internet dolf@idca.tds.philips.nl
Philips Telecommunication and Data Systems  UUCP ....!mcvax!philapd!dolf
Dept. SSP, P.O. Box 245, 7300 AE Apeldoorn, The Netherlands
--> Holland is only 1/6 of the Netherlands <--
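Dolf's IJmuiden example is a case where naive "first letter only" capitalization gets Dutch wrong: Python's `str.capitalize` produces exactly the "Ijmuiden" spelling he objects to. A minimal sketch of a Dutch-aware version follows; the function name is hypothetical, and it only handles a leading "ij" written as two letters, not the single ligature character some keyboards offer.

```python
def dutch_capitalize(word: str) -> str:
    """Capitalize a Dutch word, treating a leading 'ij' as one letter."""
    if word[:2].lower() == "ij":
        return "IJ" + word[2:].lower()
    return word[:1].upper() + word[1:].lower()


for place in ("ijmuiden", "ijsselmeer", "apeldoorn"):
    print(place.capitalize(), dutch_capitalize(place))
```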
rlee@weaver.ads.com (Richard Lee) (11/25/89)
In article <2382@draken.nada.kth.se> psv@nada.kth.se (Peter Svanberg) writes:
>In article <1083@mountn.dec.com> minow@mountn.dec.com (Martin Minow) writes:
>>However, late in the development of Latin-1, the OE and oe ligature
>>characters were removed, and were replaced by the "multiply" and
>>"division" signs. (I will not defend this decision.)
>
>Are you stating that the document I have - "International Standard
>ISO 8859-1, First edition 1987-02-15" - isn't valid any more? Are there
>other changes than the characters you name? (It seems strange to change
>a published standard so seriously.)

Now _I'm_ confused! My copy of that _same_ document (ISO 8859-1 First Edition 1987-02-15; Reference number ISO 8859-1: 1987 (E)) _does_ have the multiplication and division signs exactly as Martin described. Quoting from Table 1, page 4: "13/07 MULTIPLICATION SIGN" and "15/07 DIVISION SIGN".

-- 
RICHARD LEE rlee@ads.com or ...!{sri-spam | ames}!zodiac!rlee
415-960-7300 ADS, 1500 Plymouth St., Mtn. View CA 94043-1230
magnus@rhi.hi.is (Magnus Gislason) (11/25/89)
heimir@rhi.hi.is (Heimir Thor Sverrisson) writes:

[Talking about the Icelandic alphabet]

>The first point is certainly true, our alphabet has 36 characters, which
>means that we need 20 characters (uc+lc) that are not in ASCII. I would

You should know that the Icelandic alphabet does not include C, Q, W and Z,
and thus only contains 32 characters. :-)
einari@rhi.hi.is (Einar Indridason) (11/26/89)
In article <1383@krafla.rhi.hi.is> magnus@rhi.hi.is (Magnus Gislason) writes:
>heimir@rhi.hi.is (Heimir Thor Sverrisson) writes:
>
>[Talking about the Icelandic alphabet]
>
>>The first point is certainly true, our alphabet has 36 characters, which
>>means that we need 20 characters (uc+lc) that are not in ASCII. I would
>
>You should know that the Icelandic alphabet does not include C, Q, W and Z,
>and thus only contains 32 characters. :-)

I will most definitely not write 'pizza' as 'pissa' :-) (Besides, 'pissa' has another meaning in Icelandic as well.)

But I'm really pissed off (no 'pizza' here :-) about 'Americanized' software that does not allow us here in Iceland to use our full national character set. For example, dBASE III does not accept the capital thorn; instead it treats it as an end-of-file marker, meaning that whatever comes after the capital thorn is ignored. Some editors choke, or perform unwanted commands like 'kill-file', 'save-and-quit' and other nasties, whenever the special Icelandic characters are used.

If there are any software writers out there, please consider us Icelanders (and others) who must use an 8-bit character set. While you are doing that, could you consider adding some 'sorting tables' so that we can sort in our applications the Icelandic way?

????????????????????
-- 
To quote Alfred E. Neuman: "What! Me worry????"
Internet: einari@rhi.hi.is       UUCP: ..!mcvax!hafro!rhi!einari
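Einar's request for Icelandic "sorting tables" amounts to collating by the Icelandic alphabet instead of by code value. The Python sketch below assumes the 32 letters Magnus counts, in the usual Icelandic order (a á b d ð e é f g h i í j k l m n o ó p r s t u ú v x y ý þ æ ö). Placing anything outside those letters (c, q, w, z, digits) after the alphabet, by code point, is an arbitrary choice of this sketch, and the names are hypothetical.

```python
ICELANDIC_ORDER = "aábdðeéfghiíjklmnoóprstuúvxyýþæö"
RANK = {letter: i for i, letter in enumerate(ICELANDIC_ORDER)}


def icelandic_key(word: str) -> tuple:
    """Sort key following the Icelandic alphabet instead of Latin-1 code values."""
    # Assumption: characters outside the 32 letters sort after them, by code point.
    return tuple(RANK.get(ch, len(RANK) + ord(ch)) for ch in word.lower())


names = ["Þór", "Ölver", "Óli", "Ari", "Oddur", "Ásta"]
print(sorted(names))                      # raw code-point order
print(sorted(names, key=icelandic_key))   # Icelandic order
```

The raw order puts Oddur before Ásta and Ölver before Þór, because Latin-1 places all accented capitals after Z; the table fixes both.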
sommar@enea.se (Erland Sommarskog) (11/26/89)
Martin Minow (minow@mountn.UUCP) writes:
>The standardization bodies realized in the early 1980's that ISO-646
>was not a satisfactory solution, and built on the Dec Multinational
>character set to form ISO Latin-1,

I'm a little surprised by this. My impression is that it was the other way round: Digital took the drafts of Latin-1 and made DEC Multinational. But I have no sources that confirm that.

-- 
Erland Sommarskog - ENEA Data, Stockholm - sommar@enea.se
stefan@svax.cs.cornell.edu (Kjartan Stefansson) (11/26/89)
In article <1386@krafla.rhi.hi.is> einari@rhi.hi.is (Einar Indridason) writes:
>In article <1383@krafla.rhi.hi.is> magnus@rhi.hi.is (Magnus Gislason) writes:
>>heimir@rhi.hi.is (Heimir Thor Sverrisson) writes:
>>
>>[Talking about the Icelandic alphabet]
>>
>>>The first point is certainly true, our alphabet has 36 characters, which
>>>means that we need 20 characters (uc+lc) that are not in ASCII. I would
>>
>>You should know that the Icelandic alphabet does not include C, Q, W and Z,
>>and thus only contains 32 characters. :-)

We can argue about this, but the main point is of course that, for all practical purposes, Icelanders need to deal with those 36 characters. For instance, every character you mention appears in the phone directory, in the names of Icelandic people (although the roots of those names are typically foreign, or poor foreign imitations :-)

>But I'm really pissed off (no 'pizza' here :-) about 'americaned' software which
>does not allow us here in Iceland to use our full national character set.
...[examples deleted]
>If there are any software-writers out there, please consider us Icelanders
>(and other), that must use 8-bit character set.

Reminds me of this fantastic software called X11. It has several nice fonts, including the full ISO-8859-1 set. But typically applications strip the most significant bit from the data, so they can only display the English set :-(

Of course there is always a way around it, and I know Icelanders have managed to hack their way through in several cases. But that simply illustrates how stupid the design was, not to make this an option in the first place.

Kjartan.
minow@mountn.dec.com (Martin Minow) (11/27/89)
In article <500@enea.se> sommar@enea.se (Erland Sommarskog) notes
that, in contrast to what I had written, Dec actually took an
early draft of Latin-1 when it came time to produce the first products
using Multinational.
My apologies for a poorly thought-out posting.
Martin.
minow@thundr.enet.dec.com
matsc@sics.se (Mats Carlsson) (11/27/89)
In article <1383@krafla.rhi.hi.is> magnus@rhi.hi.is (Magnus Gislason) writes:
>You should know that the Icelandic alphabet does not include C, Q, W and Z,
>and thus only contains 32 characters. :-)

Really? Wasn't it quite recently that a spelling reform said words
like "yzt" should be spelled with an s instead of a z, reversing an
earlier law which banned writing s instead of z? Didn't Halldor
Laxness even spend some time in prison for this "crime"?
--
Mats Carlsson
SICS, PO Box 1263, S-164 28 KISTA, Sweden Internet: matsc@sics.se
Tel: +46 8 7521543 Ttx: 812 61 54 SICS S Fax: +46 8 7517230
psv@nada.kth.se (Peter Svanberg) (11/27/89)
I wrote:
>
> Are you stating that the document I have - "International Standard
> ISO 8859-1, First edition 1987-02-15" - isn't valid any more?
>
rlee@weaver.ads.com (Richard Lee) answered:
> Now _I'm_ confused! My copy of that _same_ document (ISO 8859-1 First
> Edition 1987-02-15; Reference number ISO 8859-1: 1987 (E)) _does_ have
> the multiplication and division signs exactly as Martin described.
> Quoting from Table 1, page 4: "13/07 MULTIPLICATION SIGN" and "15/07
> DIVISION SIGN".

Sorry, I mixed it up with the discussion about the slashed O. So, as it's a ligature, the same things apply as in the discussion about the ij ligature, I suppose.

---
psv@nada.kth.se
Peter Svanberg             uunet!nada.kth.se!psv (for lazy nodes...)
Dept of Num An & CS        psv%nada.kth.se@uunet.uu.net (ARPA nodes)
Royal Institute of Tech
Stockholm, SWEDEN
stefan@svax.cs.cornell.edu (Kjartan Stefansson) (11/27/89)
In article <MATSC.89Nov27092541@vishnu.sics.se> matsc@sics.se (Mats Carlsson) writes:
>In article <1383@krafla.rhi.hi.is> magnus@rhi.hi.is (Magnus Gislason) writes:
>   You should know that the Icelandic alphabet does not include C, Q, W and Z,
>   and thus only contains 32 characters. :-)
>
>Really? Wasn't it quite recently that a spelling reform said words
>like "yzt" should be spelled with an s instead of a z, reverting an
>earlier law which banned writing s instead of z?

Yes, this is correct. 'z' used to be a perfectly valid Icelandic letter, but it is pronounced as 's' in modern Icelandic. The only way to distinguish between 's' and 'z' in spelling was to know the root of the word. A few years ago, a spelling reform was made to replace 'z' by 's'.

> Didn't Halldor
>Laxness even spend some time in prison for this "crime"?

Halldor Laxness has been known for his style of spelling, which in general is closer to the spoken language than the official spelling. In his early work he was criticized a lot for this, but I don't believe he was ever imprisoned for it!

Kjartan.
eru@tnvsu1.tele.nokia.fi (Erkki Ruohtula) (11/28/89)
I have long wondered why the ISO 8-bit character set introduces 32 more control characters, while the ANSI system of terminal controls demonstrates that we could in principle get along with just one control character. Using these 32 positions for printable characters would have made possible a single set for all (or nearly all) languages that use a Latin-derived alphabet.

Erkki Ruohtula / Nokia Telecommunications                 !
eru@tele.nokia.fi / P.O. Box 33 SF-02601 Espoo, Finland   !
Huomautus : Esittämäni mielipiteet ovat vain omiani.      !
Disclaimer: The opinions I have presented are just my own.!
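Erkki's point can be made concrete: under ISO 2022 and ISO 6429, each C1 control in 0x80-0x9F also has a 7-bit representation as ESC followed by the byte 0x40 lower (for example CSI, 0x9B, is ESC [), so the C1 area duplicates what ESC sequences already provide. A minimal Python sketch of that mapping (the function name is hypothetical):

```python
ESC = 0x1B


def c1_to_7bit(data: bytes) -> bytes:
    """Rewrite each C1 control byte (0x80-0x9F) as the two bytes ESC Fe."""
    out = bytearray()
    for b in data:
        if 0x80 <= b <= 0x9F:
            out += bytes([ESC, b - 0x40])   # Fe lies in 0x40-0x5F
        else:
            out.append(b)
    return bytes(out)


print(c1_to_7bit(b"\x9b2J"))   # 8-bit CSI "erase display" in its 7-bit form
```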