[comp.std.internat] Reneging on promises

macrakis@gr.osf.org (Stavros Macrakis) (01/07/91)

In article <AMANDA.91Jan5223311@jordan.iesd.auc.dk> amanda@iesd.auc.dk (Per Abrahamsen) writes:

   A new standard (ISO Latin1) has been created, which conforms to the
   ASCII standard, and is quickly being accepted.  The best thing to
   do might be to ignore ISO 646, and switch to ISO Latin1 as fast as
   possible.

I agree that ASCII is the de facto standard, and that it is
unrealistic to expect existing 7-bit ASCII programs to be updated to
ISO 646.

However, beware!  Latin1 does NOT cover all Latin-alphabet languages,
only western European ones (and even there it misses a couple of minor
cases: French oe and Dutch ij).  Beyond those, it lacks, for instance,
(Polish) barred-l, (Czech) hacek-r, (Turkish) breve-g, (Croatian)
barred-d, (Romanian) t-cedilla, and (Chinese pinyin) o-hacek.
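
The gap can be checked mechanically. What follows is a minimal sketch,
assuming a Python 3 whose built-in `latin-1` codec implements ISO
8859-1. The helper name `in_latin1` and the sample table are
illustrative only and do not come from the post; the Turkish letter is
taken to be U+011F.

```python
# Which of the letters named above fit in ISO 8859-1 (Latin-1)?
# The sample table is an assumption made for illustration.
SAMPLES = {
    "Polish barred l": "\u0142",
    "Czech r with hacek": "\u0159",
    "Turkish soft g": "\u011f",
    "Croatian barred d": "\u0111",
    "Dutch ij ligature": "\u0133",
    "French oe ligature": "\u0153",
    "Romanian t with cedilla": "\u0163",
    "Pinyin o with hacek": "\u01d2",
    # Two letters that Latin-1 does cover, for contrast.
    "French e with acute": "\u00e9",
    "Danish o with stroke": "\u00f8",
}


def in_latin1(ch: str) -> bool:
    """Return True if ch can be encoded as a single Latin-1 byte."""
    try:
        ch.encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


missing = [name for name, ch in SAMPLES.items() if not in_latin1(ch)]
covered = [name for name, ch in SAMPLES.items() if in_latin1(ch)]
```

Under these assumptions `missing` holds the eight letters listed in
the post and `covered` holds only the two contrast cases.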

And of course it does not cover other alphabets (Greek, Cyrillic,
Arabic, Hebrew, etc.), much less Chinese characters.

I do not think it appropriate to make a quick patch for Western
European languages which will have to be changed soon thereafter for
other languages.

It <<does>> seem appropriate though to insist that all string-handling
primitives be able to handle <<all>> 8-bit characters.  Although it
would be nice as well to permit string literals with (say) Latin-1, I
am now not convinced it is a good idea.  Natural-language literal
strings should always be separated from the program to allow for later
localization (translation of messages etc.).
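
One way to picture that separation is a lookup by message key. This is
only a sketch: the catalog layout, the key `file_not_found`, and the
function name `message` are all hypothetical, and a real system would
keep the catalogs in files outside the program.

```python
# Hypothetical message catalogs. In practice these would live in separate
# files so that translators never touch the program source.
CATALOGS = {
    "en": {"file_not_found": "File not found: {name}"},
    "sv": {"file_not_found": "Filen hittades inte: {name}"},
}


def message(key: str, lang: str = "en", **params: str) -> str:
    """Look up a message by key, falling back to English if untranslated."""
    template = CATALOGS.get(lang, {}).get(key) or CATALOGS["en"][key]
    return template.format(**params)
```

The program then refers only to keys, e.g.
`message("file_not_found", "sv", name="a.txt")`, and adding a language
means adding a catalog rather than editing string literals in the code.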

And let's hope that the transition to 16-bit characters will not be
too painful when it comes.

	-s

erik@srava.sra.co.jp (Erik M. van der Poel) (01/21/91)

In article <2435@enea.se> sommar@enea.se (Erland Sommarskog) writes:
> Also sprach Stavros Macrakis (macrakis@gr.osf.org):
> > I agree that ASCII is the de facto standard, and that it is
> > unrealistic to expect existing 7-bit ASCII programs to be updated to
> > ISO 646.
> 
> What really is the issue in the case of a programming language,
> or for that matter a command interpreter, is to not use
> characters that are subject to variation according to ISO 646.
> So it is not really a question of updating, it is more a question
> of not raising obstacles to the use of ISO 646.

Yes, but Stavros is talking about *existing* programs, so how can it
be anything other than a question of updating? (Not that I am for
updating...)

> One may claim that we'd best drop 646 and move on to Latin-1 as
> far as possible. However, this is if anything site-dependent. I
> was contracted for two and a half years to a customer where
> 8-bit characters were reality (DEC Multinational, though), but
> now I am at a major Swedish company and all their equipment
> seems to be seven-bit.

I wonder which of the following 2 alternatives will be less expensive
to the company in the long run:

 (a) replacing or adjusting the 7-bit hardware, or
 (b) countless frustrating man-hours battling software incompatibility

> At ENEA we mainly have eight-bit terminals, but to what use?
> You cannot use the eighth bit in Emacs.

I find it hard to believe that there is no 8-bit version of Emacs
floating around in Europe. Here in Japan, we have been using 16-bit
characters in Emacs for *years*.

> Mail standards, as
> was posted in comp.std.internat earlier, are explicitly seven-
> bit.

One of the rules of Junet, a Japanese network, is that the 7-bit JIS
code must be used, together with escape sequences to allow mixing with
ASCII. So many organizations convert to JIS before sending messages to
the outside Junet world. They can use 8-bit codes in-house by
installing 8-bit clean versions of sendmail. Also, B-News had to be
updated to allow Escape to pass through.
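
The scheme can be illustrated with a small sketch. It assumes that
Python's standard `iso2022_jp` codec is a fair stand-in for the "7-bit
JIS code with escape sequences" described here; Junet's exact rules
are not given in this post, and the function name `to_seven_bit_jis`
is made up for the example.

```python
ESC = 0x1B  # the Escape byte that introduces each character-set switch


def to_seven_bit_jis(text: str) -> bytes:
    """Encode mixed ASCII/Japanese text with the stdlib iso2022_jp codec."""
    return text.encode("iso2022_jp")


# "ABC " followed by three kanji (U+65E5, U+672C, U+8A9E).
original = "ABC \u65e5\u672c\u8a9e"
wire = to_seven_bit_jis(original)

is_seven_bit = all(b < 0x80 for b in wire)  # no byte uses the eighth bit
has_escape = ESC in wire                    # switches are marked by ESC
round_trip = wire.decode("iso2022_jp")      # recovers the original text
```

Every byte stays below 0x80, so the message survives a 7-bit mail
path, but only if Escape itself is passed through: software that
strips ESC leaves the kanji bytes to be read as ordinary ASCII.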

Of course, it is difficult to get all sites to install the updated
versions of the network software, but "Where there is a will, there is
a way", as they say. And the Japanese had one hell of a will.
-- 
Erik M. van der Poel                                      erik@sra.co.jp
Software Research Associates, Inc., Tokyo, Japan     TEL +81-3-3234-2692

sommar@enea.se (Erland Sommarskog) (01/22/91)

Also sprach Erik M. van der Poel (erik@srava.sra.co.jp):
>Yes, but Stavros is talking about *existing* programs, so how can it
>be anything other than a question of updating? (Not that I am for
>updating...)

Hm, the starting point for our discussion was a programming language
(Eiffel) which uses []{}\ as special characters, and for which it would
be very easy to provide alternatives for the first four.

>I wonder which of the following 2 alternatives will be less expensive
>to the company in the long run:
>
> (a) replacing or adjusting the 7-bit hardware, or
> (b) countless frustrating man-hours battling software incompatibility

There is no software incompatibility to battle. It is just that you
have to choose between reading Swedish with brackets and braces or
reading programming code where letters appear as special characters.
There probably is a cost in man-hours, but not one for which you can
just get some figures and then walk into management and say: "hey,
let's throw out this stone-age equipment."
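
To make the trade-off concrete, here is a small sketch of what the
same 7-bit codes look like on the two kinds of terminal. It covers
only the six positions where the Swedish ISO 646 variant puts letters
in place of ASCII brackets, backslash, bar, and braces; the table and
function names are invented for the illustration.

```python
# ASCII character -> letter shown at the same code position by a
# terminal using the Swedish ISO 646 variant (six positions only).
PAIRS = {
    "[": "\u00c4",   # A with diaeresis
    "\\": "\u00d6",  # O with diaeresis
    "]": "\u00c5",   # A with ring
    "{": "\u00e4",   # a with diaeresis
    "|": "\u00f6",   # o with diaeresis
    "}": "\u00e5",   # a with ring
}
ON_SWEDISH_TERMINAL = str.maketrans(PAIRS)
ON_ASCII_TERMINAL = str.maketrans({v: k for k, v in PAIRS.items()})


def as_seen_on_swedish_terminal(text: str) -> str:
    """Show how ASCII-coded text appears on a Swedish ISO 646 terminal."""
    return text.translate(ON_SWEDISH_TERMINAL)


def as_seen_on_ascii_terminal(text: str) -> str:
    """Show how Swedish ISO 646 text appears on an ASCII terminal."""
    return text.translate(ON_ASCII_TERMINAL)


code_view = as_seen_on_swedish_terminal("a[i] = {0};")
word_view = as_seen_on_ascii_terminal("r\u00e4ksm\u00f6rg\u00e5s")
```

With this mapping, `a[i] = {0};` shows up with letters where the
brackets and braces should be, and the Swedish word comes out as
`r{ksm|rg}s` on an ASCII terminal.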

>I find it hard to believe that there is no 8-bit version of Emacs
>floating around in Europe. Here in Japan, we have been using 16-bit
>characters in Emacs for *years*.

There is no official version from GNU, but there are hacks floating
around, yes.
-- 
Erland Sommarskog - ENEA Data, Stockholm - sommar@enea.se
One likes to believe in the spirit of muzak.

erik@srava.sra.co.jp (Erik M. van der Poel) (01/22/91)

In article <2445@enea.se> sommar@enea.se (Erland Sommarskog) writes:
> Also sprach Erik M. van der Poel (erik@srava.sra.co.jp):
> > Yes, but Stavros is talking about *existing* programs, so how can it
> > be anything other than a question of updating? (Not that I am for
> > updating...)
> 
> Hm, the starting point for our discussion was a programming language
> (Eiffel) which uses []{}\ as special characters, and for which it would
> be very easy to provide alternatives for the first four.

I assume that what you want is that new languages are designed such
that the ISO 646 substitutable characters are not used. Personally, I
think that new languages should go ahead and use any ASCII characters
that they wish, so that there will be obstacles for ISO 646 users,
hopefully eventually leading to the decline of ISO 646.

> > I wonder which of the following 2 alternatives will be less expensive
> > to the company in the long run:
> > 
> >  (a) replacing or adjusting the 7-bit hardware, or
> >  (b) countless frustrating man-hours battling software incompatibility
> 
> There is no software incompatibility to battle. It is just that you
> have to choose between reading Swedish with brackets and braces or
> reading programming code where letters appear as special characters.

I guess I should have made myself more clear. Assuming that you do not
like the current situation of having brackets in Swedish words and
Swedish characters in programs, and assuming that you want to do
something about it, which of the above alternatives would be less
costly? I.e. (a) is where you get the hardware to deal with 8-bit
codes, and (b) is where you get the software to deal with ISO 646 by
avoiding the use of ISO 646 substitutable characters in programming
constructs, etc. With (b) you create incompatibility between e.g. 
Scandinavia and USA, which I would think is a very big disadvantage.

> There probably is a cost in man-hours, but not one for which you can
> just get some figures and then walk into management and say: "hey,
> let's throw out this stone-age equipment."

Yes, it may be difficult to convince the management.
-- 
Erik M. van der Poel                                      erik@sra.co.jp
Software Research Associates, Inc., Tokyo, Japan     TEL +81-3-3234-2692

sommar@enea.se (Erland Sommarskog) (01/27/91)

Also sprach Erik M. van der Poel (erik@srava.sra.co.jp):
>codes, and (b) is where you get the software to deal with ISO 646 by
>avoiding the use of ISO 646 substitutable characters in programming
>constructs, etc. With (b) you create incompatibility between e.g.
>Scandinavia and USA, which I would think is a very big disadvantage.

Only if we hack the compiler locally or write our own stuff. What
I want is of course alternative characters, so I don't have to
use the brackets and braces. Some languages like Pascal provide
this.
-- 
Erland Sommarskog - ENEA Data, Stockholm - sommar@enea.se
One likes to believe in the spirit of muzak.

keld@login.dkuug.dk (Keld J|rn Simonsen) (01/30/91)

sommar@enea.se (Erland Sommarskog) writes:

>Also sprach Erik M. van der Poel (erik@srava.sra.co.jp):
>>codes, and (b) is where you get the software to deal with ISO 646 by
>>avoiding the use of ISO 646 substitutable characters in programming
>>constructs, etc. With (b) you create incompatibility between e.g.
>>Scandinavia and USA, which I would think is a very big disadvantage.

>Only if we hack the compiler locally or write our own stuff. What
>I want is of course alternative characters, so I don't have to
>use the brackets and braces. Some languages like Pascal provide
>this.

Actually this can be done in the current ISO/ANSI C standard also,
via the trigraph sequences. And other proposals have been made
for ISO/ANSI C, because some people (including myself) do not
find the trigraphs very useful (they are unreadable, IMHO).
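
For readers who have not met them, the trigraph mechanism can be
sketched in a few lines of Python. The nine sequences are those of the
C standard; the function names `expand_trigraphs` and `to_trigraphs`
are invented for this illustration, and the sketch ignores everything
else a real C translator does.

```python
import re

# The nine ISO/ANSI C trigraphs, keyed by the character that follows "??".
TRIGRAPHS = {
    "=": "#", "(": "[", "/": "\\", ")": "]", "'": "^",
    "<": "{", "!": "|", ">": "}", "-": "~",
}
_PATTERN = re.compile(r"\?\?([=(/)'<!>-])")


def expand_trigraphs(source: str) -> str:
    """Replace each trigraph with the single character it stands for."""
    return _PATTERN.sub(lambda m: TRIGRAPHS[m.group(1)], source)


def to_trigraphs(source: str) -> str:
    """Rewrite the nine affected characters as trigraphs for interchange."""
    reverse = {char: "??" + key for key, char in TRIGRAPHS.items()}
    return "".join(reverse.get(ch, ch) for ch in source)


declaration = to_trigraphs("int a[3] = { 1, 2, 3 };")
```

Here `declaration` becomes `int a??(3??) = ??< 1, 2, 3 ??>;`, which
passes through any ISO 646 variant unchanged and expands back to the
original, at an obvious cost in readability.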

Keld Simonsen

henry@zoo.toronto.edu (Henry Spencer) (01/30/91)

In article <keld.665177880@dkuugin> keld@login.dkuug.dk (Keld J|rn Simonsen) writes:
>... other proposals have been made
>for ISO/ANSI C, because some people (including myself) do not
>find the trigraphs very useful (they are unreadable, IMHO).

As various people have pointed out, they are not meant to be readable.
They are intended as an interchange format, not for routine human use.
-- 
If the Space Shuttle was the answer,   | Henry Spencer at U of Toronto Zoology
what was the question?                 |  henry@zoo.toronto.edu   utzoo!henry