[news.misc] Time for 8 bit news, isn't it?????.

kibo@pawl.rpi.edu (James 'Kibo' Parry) (07/21/90)

In article <15688@bfmny0.BFM.COM> tneff@bfmny0.BFM.COM (Tom Neff) writes:
>In article <777@hades.ausonics.oz.au> greyham@hades.ausonics.oz.au (Greyham Stoney) writes:
>>Why don't all you people divert your energies into making your news system
>>handle 8 bit news rather than developing new and incompatible ways of
>>bitbashing your files into a format that both news and your unpacking program
>>(be it /bin/sh, sed, awk or whatever) can cope with?.
>
>8 bit news would help only slightly with things OTHER than the transmission
>of binary files via news.   Seven bit is basically doing the job now;
>the remaining issues (envelope consistency, line lengths, character sets,
>paragraph wrapping etc) aren't going to be solved by going to 8 bits.

Going to eight bits WOULD be nice for people using languages other than
English;  as it is now, if you're in, say, Finland, and you have a terminal
that does the Finnish variant of ASCII, people outside Finland are going
to see braces, brackets, backslashes, etc., wherever you say something
with an accented character.

However, there are some good 8-bit character sets (IBM PC's, HP's,
ECMA-94 Latin, etc.) which could be used so that when someone in Germany
types an "o" with an umlaut, in France it'll appear as an "o" with an
umlaut and not as a question mark or a bracket or something.  Each foreign
character could have exactly one representation, as opposed to the
current scheme where some systems use ASCII, some use the Swedish variant,
etc...

Doing this would probably be best handled by (a) picking a standard (let's
say we decide that non-English characters will be taken from the ECMA
character set) and then (b) either giving everyone in the world a terminal
that can display them (which seems very unfeasible) or else building into
the next versions of the news-reading software a little option that maps
the plus-128 characters onto the PC, HP, ECMA, ASCII, or whatever character
set the terminal uses, as it displays articles.
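
As a rough illustration of such a display-time option, here is a minimal
Python sketch. It assumes articles travel in ISO 8859-1 (the ECMA-94 Latin
set) and that the reader knows the terminal's character set; the function
name remap_for_display is hypothetical, and only Python's built-in codecs
are used.

```python
def remap_for_display(article_bytes: bytes, terminal_charset: str) -> bytes:
    """Convert an article from the assumed wire set to the terminal's set."""
    # Every byte value is a valid ISO 8859-1 character, so this cannot fail.
    text = article_bytes.decode("iso8859_1")
    # Characters the terminal cannot show become "?" instead of raising.
    return text.encode(terminal_charset, errors="replace")


# "Koln" with an o-umlaut, as it would travel on the wire.
wire = "K\u00f6ln".encode("iso8859_1")

pc_bytes = remap_for_display(wire, "cp437")     # IBM PC character set
ascii_bytes = remap_for_display(wire, "ascii")  # plain 7-bit terminal
```

On the PC set the umlaut survives at a different byte value; on a plain
ASCII terminal it degrades to a question mark rather than to a bracket.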

I'm sure someone will be able to poke holes in this idea, but it seems like
something we should at least consider, given that the United States no
longer accounts for as much of the Usenet readership as it used to.

Comments?

-- 
james "kibo" parry, 138 birch lane, scotia, ny 12302 <-- close to schenectady.
kibo@pawl.rpi.edu            _________________________________________________
kibo%pawl.rpi.edu@rpi.edu   / Kibology    /  Anything I say is my opinion,
userfe0n@rpitsmts.bitnet   /  is better! /   and is the opposite of Xibo's.

eps@toaster.SFSU.EDU (Eric P. Scott) (07/21/90)

We have an 8-bit standard: ISO 8859.  Great for us Western
European-American types.  It doesn't help the fj newsgroups much.
(or kremvax!gorby)

					-=EPS=-

Dan@dna.lth.se (Dan Oscarsson) (07/21/90)

In article <+7Y$AV&@rpi.edu> kibo@pawl.rpi.edu (James 'Kibo' Parry) writes:
>In article <15688@bfmny0.BFM.COM> tneff@bfmny0.BFM.COM (Tom Neff) writes:
>>In article <777@hades.ausonics.oz.au> greyham@hades.ausonics.oz.au (Greyham Stoney) writes:
>>>Why don't all you people divert your energies into making your news system
>>>handle 8 bit news rather than developing new and incompatible ways of
>>>bitbashing your files into a format that both news and your unpacking program
>>>(be it /bin/sh, sed, awk or whatever) can cope with?.
>>
>>8 bit news would help only slightly with things OTHER than the transmission
>>of binary files via news.   Seven bit is basically doing the job now;
>>the remaining issues (envelope consistency, line lengths, character sets,
>>paragraph wrapping etc) aren't going to be solved by going to 8 bits.
>
>Going to eight bits WOULD be nice for people using languages other than
>English;  as it is now, if you're in, say, Finland, and you have a terminal
>that does the Finnish variant of ASCII, people outside Finland are going
>to see braces, brackets, backslashes, etc., wherever you say something
>with an accented character.
>

Yes it is time to start thinking about using an international character set
in netnews. This means that 8bit bytes are used but not that binary files
can be transmitted without encapsulation. Binary files must still be
converted into an encoded format that can be checked and unpacked in a controlled
manner.

Only one character set should be used for transmitting articles, as it is
impossible for everyone to handle every character set in the world. In
European talks about a character set to use for mail, ISO 10646 is the best
candidate, and it should be fine for netnews also. ISO 10646 has both ASCII
and ISO 8859-1 as true subsets, and that will ease compatibility problems.
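
The subset claim can be checked mechanically. The sketch below assumes
that Python's string code points (Unicode) match the character numbering
of ISO 10646 as it was eventually published; the draft under discussion
here may have differed in detail.

```python
# Each ISO 8859-1 byte value should decode to the character whose
# code point has the same number (0..255).
latin1_is_subset = all(
    ord(bytes([b]).decode("iso8859_1")) == b for b in range(256)
)

# The same check for 7-bit ASCII (0..127).
ascii_is_subset = all(
    ord(bytes([b]).decode("ascii")) == b for b in range(128)
)
```

If both flags are true, converting old ASCII or ISO 8859-1 text needs no
lookup table at all: the character numbers stay the same.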

Local netnews readers will have to convert from ISO 10646 into the character
set used locally.

Using ISO 10646 allows nearly every letter in the world to be written.

    Dan

-- 
Dan Oscarsson                              Department of Computer Science
                                           Lund Institute of Technology
e-mail:  Dan@dna.lth.se                    Box 118
                                           S-221 00 Lund, Sweden

guy@auspex.auspex.com (Guy Harris) (07/22/90)

>We have an 8-bit standard: ISO 8859.  Great for us Western
>European-American types.  It doesn't help the fj newsgroups much.
>(or kremvax!gorby)

ISO 8859 doesn't help the "fj" newsgroups much, but "kremvax!gorby"
could use ISO 8859/5, Latin-Cyrillic alphabet.

Did you perhaps mean "ISO 8859/1" rather than "ISO 8859"?

frisk@rhi.hi.is (Fridrik Skulason) (07/22/90)

Well - some of us have 8-bits news already - I am for example using an 8-bit
'rn' right now.  The program only required a few minor modifications to work
properly.  The reason we went to 8-bit news and E-mail is quite simple - our
alphabet contains 10 characters not found in standard ASCII.  Of course I can
only post 8-bit articles to our local newsgroups - the rest of the world is
still only 7-bit   :-(

I fully agree that we need an 8-bit news system (as well as 8-bit E-mail),
as this would make life a lot easier for those of us not using English.

Modifying the news software to permit the transmission of 8-bit data is
trivial - the real problem is the character set issue.
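
To show why 8-bit transparency matters, here is a small Python sketch of
what a relay that masks off the eighth bit does to ISO 8859-1 text. The
function name strip_eighth_bit is made up for the illustration; it does
not describe any particular news implementation.

```python
def strip_eighth_bit(data: bytes) -> bytes:
    """Mimic a 7-bit-only relay: clear the high bit of every byte."""
    return bytes(b & 0x7F for b in data)


# The Icelandic name "Thor" (capital thorn, o-acute, r) in ISO 8859-1.
original = "\u00de\u00f3r".encode("iso8859_1")   # bytes DE F3 72
damaged = strip_eighth_bit(original)             # bytes 5E 73 72
```

The damaged bytes are valid ASCII, so nothing downstream can tell that
the text was ever anything else.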

I don't know if the readers of this group are familiar with a similar
discussion regarding automatic translation between character sets in the
Kermit program.  The conclusions reached there seem to apply to the 8-bit
News/E-mail discussion as well, though.

Some possible solutions:

(1)  Each machine posts articles using the user's character set of choice.
To indicate which character set is used, a new field is added to the header.

                 examples:     Character-set: CP 870
	                       Character-set: ISO 8859/4

This is easy to implement, but has one serious drawback - all machines are
required to be able to handle all possible character sets.

(2)  On every machine the article is translated into one of the ISO 8859/x
series of character sets.  8859/1 would probably be most used, as it covers
most of the languages of Western Europe.  8859/2, 8859/3, 8859/4 etc. would
solve the needs of those using Greek, various Eastern European languages and
(I think) Hebrew and Arabic.  This would not solve the problem of those using
a 16-bit character set.  Also, I am not sure if Esperanto is included in any
of the ISO 8859/x standards.

(3)  All text is transmitted according to the ISO 10646 standard.  This has
one advantage compared to (2) - it allows the transmission of documents
containing 16-bit characters, as well as documents containing characters from
more than one of the 8859/x standards.  For example, one could send a message
with the first part in Russian and the second part in Greek.

My opinion is that (3) is more of a long-term goal - for 95 % of users of
Usenet, (2) is all that is needed.
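
Both (1) and (2) rely on a header naming the character set. A minimal
Python sketch of how reading software might act on it follows. The header
name is taken from the examples above; the table KNOWN_CHARSETS and the
function decode_article are hypothetical, and the codec names are those of
Python's standard library.

```python
# Header value -> Python codec name. Only sets the local software has a
# table for can be listed here, which is the drawback noted for option (1).
KNOWN_CHARSETS = {
    "ISO 8859/1": "iso8859_1",
    "ISO 8859/4": "iso8859_4",
    "CP 437": "cp437",
}


def decode_article(headers: dict, body: bytes) -> str:
    """Decode an article body according to its Character-set header."""
    name = headers.get("Character-set")
    if name is None:
        # No header: assume a traditional 7-bit ASCII article.
        return body.decode("ascii")
    if name not in KNOWN_CHARSETS:
        raise LookupError("no conversion table for " + name)
    return body.decode(KNOWN_CHARSETS[name])


text = decode_article({"Character-set": "ISO 8859/1"}, b"caf\xe9")
```

Under option (2) the table stays small because only the 8859/x sets can
appear on the wire; under option (1) it has to grow with every character
set any poster chooses to use.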

But what changes would (2) require ?

Change #1:  Any ASCII computer on Usenet must accept 8-bit news and E-mail,
            and be able to forward articles without changes (in other words - 
            don't strip the eighth bit !!!)  This is the only change required
            from the "English-only" ASCII-sites, where no 8-bit articles
            would originate or be read.

Change #2:  Any computer on Usenet using an extended version of ASCII (CP 437,
            ISO 8859/x etc) must translate all postings to one of the 8859/x
            character sets and indicate (in the header) which one is used.
            This change would be required of sites with non-English users.

Change #3:  Any computer not using ASCII, but rather EBCDIC (or something else),
            must translate all postings to one of the 8859/x character sets,
            instead of just translating to ASCII.  

Change #4:  Any computer must accept postings in one of the 8859/x character
            sets and be able to translate them to the character set used
	    by each user.

Problem #1: If the local character set is not able to represent all the
            characters in the original posting, they must be represented as
            well as possible.  For example - a 7-bit computer receiving a text
            containing accented vowels might be expected just to drop the
            accent marks.

Problem #2: Different users - even on the same machine - have different
            capabilities to display 8-bit text.  For example, in Scandinavia
            it is common for terminals to use a 7-bit character set, where
            some of the characters (for example { [ ] } |) have been replaced
            by non-ASCII characters.  Other users in the same countries have
            fully 8-bit terminals (for example PCs running a terminal
            emulator).  The computer must store incoming articles as they
            arrive and the news/E-mail software must be updated to display
            them according to the capabilities of each terminal, as indicated
            by an environment variable.
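
The two problems above can be sketched together in Python. This is only
an illustration: the environment variable NEWS_TERMINAL_CHARSET and the
function to_terminal are invented names, and dropping accents through
Unicode decomposition is one possible fallback, not the only one.

```python
import os
import unicodedata


def to_terminal(text: str, charset: str) -> bytes:
    """Encode text for a terminal, degrading gracefully if needed."""
    try:
        return text.encode(charset)          # terminal can show everything
    except UnicodeEncodeError:
        pass
    # Split accented letters into base letter + combining mark,
    # then drop the marks (Problem #1: "just drop the accent marks").
    decomposed = unicodedata.normalize("NFD", text)
    stripped = "".join(c for c in decomposed if not unicodedata.combining(c))
    # Letters with no unaccented base form become "?".
    return stripped.encode(charset, errors="replace")


# Problem #2: each user states the terminal's capabilities in the
# environment; plain ASCII is assumed when nothing is set.
user_charset = os.environ.get("NEWS_TERMINAL_CHARSET", "ascii")
```

Note that the fallback strips accents from the whole text once any single
character fails, and that letters such as the Icelandic thorn have no
base letter to fall back on.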

So - what now ?

Is there any interest in creating a "working group" to attack the problem ?
Any of the authors of rn, nn, elm or other news/e-mail software out there ?

We are of course willing to share our modifications to the programs, and with
a bit of work we should be able to have 8-bit news/email running in a few
months.

So - any volunteers ?


-- 
Fridrik Skulason      University of Iceland  |       
Technical Editor of the Virus Bulletin (UK)  |  Reserved for future expansion
E-Mail: frisk@rhi.hi.is    Fax: 354-1-28801  |   

ed@braaten.doit.sub.org (Ed Braaten) (07/24/90)

scs@lokkur.dexter.mi.us (Steve Simmons) writes:

>Hey, news is ASCII-based, written in English-speaking countries for
>English-speaking readers.  The fact that it works *at all* for
>international and non-English stuff is a wonderful plus.  If regional
>newsgroups have regional needs, they should go ahead and fill them.

That's funny, Steve - 50% of the news I read is in German.  I'm living
in Germany right now.  Although many of the 60+ million Germans here can
speak English (often better than we Americans ;-), the language in this
country is German.  And the German-language news here is not limited to
"regional" consumption.  I'm aware of several sites there in the good
ole USA that are carrying the German groups also.  I'm willing to bet
there is a LOT of non-English stuff floating around out there.  So why
don't we drop the provincial attitudes - let's hear it for 8-bit news!
It won't make English any harder to read, but it would certainly make 
life easier for the rest of the USENET.

>But neither side should expect interoperability.

Say what?  Interoperability and the free exchange of information is 
in my opinion exactly what makes USENET so successful...


---------------------------------------------------------------------------
        Ed Braaten             |  Jesus answered,  "I am the way and the
Work: ed@imuse.de.intel.com    |  truth and the life.  No one comes to the
Home: ed@braaten.doit.sub.org  |  Father except through me."   John 14:6 
---------------------------------------------------------------------------

VERKADE@CTSS.CO.UK (Herman Verkade) (08/02/90)

A couple of comments on 8 bit news. It seems to me that it is not necessary to
convert the whole net to 8 bit. The 7 bit restriction is only a problem for
specific newsgroups: newsgroups in languages other than English and newsgroups
containing binary data, such as bitmaps, .gif files, etc. So, I don't think
**everybody** needs to upgrade to some implementation that supports 8 bits,
only those that wish to carry newsgroups that need it. All we would need
is a standard, not necessarily a world-wide upgrade of software.

For example, if the Germans and the Finns decide that their local language
newsgroups will be 8 bit, then that is no business of the Americans,
British, Spanish or Russians. All that is needed is some software that
supports 8 bits (C News seems to do that, or alternatively a few changes in
B News) and maybe some way of indicating that a particular group expects
8-bit, so that when posting to a 7 bit group another signature can be used:
one that doesn't have 8 bit characters in it. And if people want to start a new
newsgroup for .gif files in 8 bit mode then, again, the sites that want
to carry it must install an 8 bit version. If your feed doesn't support it,
get another feed for that newsgroup (A similar situation exists in the UK,
where UKC doesn't carry nor forward `alt.sex.pictures'. Sites that want it
get another feed for that group).
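
The signature idea could look something like the following Python sketch.
How a group would be marked as 8-bit is left open above, so the
group_is_8bit flag and both function names are purely hypothetical.

```python
def is_seven_bit_clean(data: bytes) -> bool:
    """True if no byte has its eighth bit set."""
    return all(b < 0x80 for b in data)


def pick_signature(group_is_8bit: bool, full_sig: bytes, plain_sig: bytes) -> bytes:
    """Use the 8-bit signature only where the group can carry it."""
    if group_is_8bit or is_seven_bit_clean(full_sig):
        return full_sig
    return plain_sig


full = "Gr\u00fc\u00dfe, Herman".encode("iso8859_1")
plain = b"Gruesse, Herman"
sig_for_7bit_group = pick_signature(False, full, plain)
sig_for_8bit_group = pick_signature(True, full, plain)
```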

The next problem would be how to read an 8 bit group, both for groups that
use different character sets and for groups that carry other 8 bit data. As
someone earlier suggested, maybe an extra header should be added to the
standard that indicates what type of data it is. For mail there are RFC-1049
(Content-Type) and RFC-1154 (Encoding), which are extensions to RFC-822. The
extra header fields would only need to be interpreted by the news reader. So,
only if you want to read an 8 bit group, get an 8 bit reader. As long as we
can agree on some standard.

My proposal would be RFC-1154-style, because it also allows one message to
contain encodings in different parts and could therefore also be used to
automatically convert different parts of a message in 7 bit groups. For
example, a message containing a uuencoded file preceded by some explanation
in ASCII and a signature at the bottom, could have a header such as:

    Encoding: 10 text, 1045 uuencode, 5 text

A smart news reader could display the two text parts and ask whether you want
the uuencode bit to be uudecoded. For an article containing a header like:

    Encoding: 15 text, 637 uugif, 5 text

the reader could then automatically extract the uuencoded .gif file and
display an image instead. Etc, etc, etc. And only users that want such
functionality switch to a news reader that supports it.
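
A news reader might handle such a header roughly as in this Python sketch.
It assumes, as in RFC 1154, that each number is a count of body lines, and
it handles only the simple "count keyword" form shown above; the function
names parse_encoding and split_body are invented for the example.

```python
def parse_encoding(value: str) -> list:
    """Turn '10 text, 1045 uuencode, 5 text' into (count, keyword) pairs."""
    parts = []
    for item in value.split(","):
        count, keyword = item.split()
        parts.append((int(count), keyword.lower()))
    return parts


def split_body(lines: list, parts: list) -> list:
    """Cut the body lines into sections according to the parsed header."""
    sections = []
    start = 0
    for count, keyword in parts:
        sections.append((keyword, lines[start:start + count]))
        start += count
    return sections


parts = parse_encoding("2 text, 3 uuencode, 1 text")
body = ["Here it is:", "", "begin 644 x", "#0V%T", "end", "-- Herman"]
sections = split_body(body, parts)
```

The reader can then display the sections tagged "text" directly and offer
to pass each "uuencode" section to a decoder.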

I realise that I am discussing two separate topics here:
1) Provide 8 bit transport mechanisms so that international character sets
   can be used, but enable 8 bits only on a newsgroup by newsgroup basis
   with either a designated character set for such a group, or an Encoding
   header to indicate the character set.
2) An Encoding: header for carrying data other than text (in either 7 or
   8 bit groups).

For both I suggest providing a standard, but not forcing anybody to upgrade
to new software. I think this proposal provides for backward compatibility
and meets the requirements of a fair number of net.people.

Herman Verkade

amanda@mermaid.intercon.com (Amanda Walker) (08/03/90)

In article <900802011259.00001B1F@MARVIN.CTSS.CO.UK>, VERKADE@CTSS.CO.UK
(Herman Verkade) writes:
> The 7 bit restriction is only a problem for
> specific newsgroups: newsgroups in languages other than English and newsgroups
> containing binary data, such as bitmaps, .gif files, etc.

It's also a problem for any newsgroup that has non-English speakers posting
to it.  For example, on alt.sca, the name of one common poster comes out
as "]ke"-- he's Swedish, and the "]" should really be an A with a ring over
it...
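
The effect is easy to reproduce. The Python sketch below assumes the usual
Swedish 7-bit national variant, in which six ASCII punctuation positions
are reassigned to letters; the table and function names are made up for
the illustration.

```python
# Swedish 7-bit variant: the codes that ASCII uses for [ \ ] { | }
# are displayed as A-umlaut, O-umlaut, A-ring (upper, then lower case).
SWEDISH_7BIT = str.maketrans(
    "[\\]{|}",
    "\u00c4\u00d6\u00c5\u00e4\u00f6\u00e5",
)


def as_seen_in_sweden(ascii_text: str) -> str:
    """Show how a Swedish 7-bit terminal renders the same bytes."""
    return ascii_text.translate(SWEDISH_7BIT)


name = as_seen_in_sweden("]ke")   # the poster's name as he typed it
```

The bytes on the wire are identical in both countries; only the glyph
assigned to each code differs, which is why no 7-bit scheme can fix it.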

Even in groups whose traffic is conducted only in English, there is a growing
proportion of people whose names cannot be spelled with 7-bit ASCII.

--
Amanda Walker <amanda@intercon.com>
InterCon Systems Corporation

ed@braaten.doit.sub.org (Ed Braaten) (08/05/90)

VERKADE@CTSS.CO.UK (Herman Verkade) writes:

>A couple of comments on 8 bit news. It seems to me that it is not necessary to
>convert the whole net to 8 bit. The 7 bit restriction is only a problem for
>specific newsgroups: newsgroups in languages other than English and newsgroups
>containing binary data, such as bitmaps, .gif files, etc. So, I don't think
>**everybody** needs to upgrade to some implementation that supports 8 bits,
>only those that wish to carry newsgroups that need it. All we would need
>is a standard, not necessarily a world-wide upgrade of software.

I think this is the right approach to the problem.  If it works, don't
fix it! ;-)  But give the non-English and binary people a chance.  A 
standard, however, is an absolute must.

>My proposal would be RFC-1154-style, because it also allows one message to
>contain encodings in different parts and could therefore also be used to
>automatically convert different parts of a message in 7 bit groups. For
>example, a message containing a uuencoded file preceded by some explanation
>in ASCII and a signature at the bottom, could have a header such as:

>    Encoding: 10 text, 1045 uuencode, 5 text

>A smart news reader could display the two text parts and ask whether you want
>the uuencode bit to be uudecoded. For an article containing a header like:

>    Encoding: 15 text, 637 uugif, 5 text

>the reader could then automatically extract the uuencoded .gif file and
>display an image instead. Etc, etc, etc. And only users that want such
>functionality switch to a news reader that supports it.

How about it?  Could we get the author of nn sold on this?  (I'm
crossposting this article to n.s.nn to find out...)

>I realise that I am discussing two separate topics here:
>1) Provide 8 bit transport mechanisms so that international character sets
>   can be used, but enable 8 bits only on a newsgroup by newsgroup basis
>   with either a designated character set for such a group, or an Encoding
>   header to indicate the character set.
>2) An Encoding: header for carrying data other than text (in either 7 or
>   8 bit groups).

I like your suggestions Herman.  What about the rest of the net?
Opinions?  Comments?

Greetings from Munich,

Ed

---------------------------------------------------------------------------
        Ed Braaten             |  Jesus answered,  "I am the way and the
Work: ed@imuse.de.intel.com    |  truth and the life.  No one comes to the
Home: ed@braaten.doit.sub.org  |  Father except through me."   John 14:6 
---------------------------------------------------------------------------

mcmahon@tgv.com (John McMahon) (08/06/90)

In article <3863a@braaten.doit.sub.org>, ed@braaten.doit.sub.org (Ed Braaten) writes...
>>My proposal would be RFC-1154-style, because it also allows one message to
>>contain encodings in different parts and could therefore also be used to
>>automatically convert different parts of a message in 7 bit groups. For
>>example, a message containing a uuencoded file preceded by some explanation
>>in ASCII and a signature at the bottom, could have a header such as:
> 
>>    Encoding: 10 text, 1045 uuencode, 5 text

My understanding is that an RFC is in the works for "non-textual transmission of
data via E-mail".  I suspect this could be easily expanded to include USENET
NEWS.

Watch the NIC for announcements of new RFCs...

John 'Fast-Eddie' McMahon    :    MCMAHON@TGV.COM    : TTTTTTTTTTTTTTTTTTTTTTTT
TGV, Incorporated            :                       :    T   GGGGGGG  V     V
603 Mission Street           : HAVK (abha) Gur bayl  :    T  G          V   V
Santa Cruz, California 95060 : bcrengvat flfgrz gb   :    T  G    GGGG   V V
408-427-4366 or 800-TGV-3440 : or qrfgeblrq ol znvy  :    T   GGGGGGG     V