kibo@pawl.rpi.edu (James 'Kibo' Parry) (07/21/90)
In article <15688@bfmny0.BFM.COM> tneff@bfmny0.BFM.COM (Tom Neff) writes:
>In article <777@hades.ausonics.oz.au> greyham@hades.ausonics.oz.au (Greyham Stoney) writes:
>>Why don't all you people divert your energies into making your news system
>>handle 8 bit news rather than developing new and incompatible ways of
>>bitbashing your files into a format that both news and your unpacking program
>>(be it /bin/sh, sed, awk or whatever) can cope with?
>
>8 bit news would help only slightly with things OTHER than the transmission
>of binary files via news.  Seven bit is basically doing the job now;
>the remaining issues (envelope consistency, line lengths, character sets,
>paragraph wrapping etc) aren't going to be solved by going to 8 bits.

Going to eight bits WOULD be nice for people using languages other than
English; as it is now, if you're in, say, Finland, and you have a terminal
that does the Finnish variant of ASCII, people outside Finland are going
to see braces, brackets, backslashes, etc., wherever you say something
with an accented character.

However, there are some good 8-bit character sets (IBM PC's, HP's,
ECMA-94 Latin, etc.) which could be used so that when someone in Germany
types an "o" with an umlaut, in France it'll appear as an "o" with an
umlaut and not as a question mark or a bracket or something.  Each foreign
character could have exactly one representation, as opposed to the current
scheme where some systems use ASCII, some use the Swedish variant, etc...

Doing this would probably be best handled by (a) picking a standard (let's
say we decide that non-English characters will be located in the ECMA
character set) and then (b) either giving everyone in the world a terminal
that can display them (which seems very unfeasible) or else just building
into the next versions of the news-reading software a little option that
maps the plus-128 characters onto the PC, HP, ECMA, ASCII, or whatever
character set the reader has as it displays articles.
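The "little option" in (b) could be as small as a 256-entry translation table. The sketch below is only an illustration under stated assumptions: it takes ISO 8859-1 (the Latin-1 set that ECMA-94 also defines) as the agreed wire set and the IBM PC's code page 437 as the local set, and the names `build_table` and `for_display` are invented for the example. Characters with no local equivalent are shown as "?".

```python
# Sketch only: ISO 8859-1 as the wire set and CP 437 as the local set are
# assumptions made for this example, not choices taken from the thread.
WIRE = "iso8859_1"   # character set articles are assumed to arrive in
LOCAL = "cp437"      # character set of the reader's terminal (IBM PC)


def build_table(wire, local, fallback=b"?"):
    """Return a 256-byte table mapping wire-set bytes to local-set bytes."""
    table = bytearray(range(256))          # codes 0-127 pass through unchanged
    for code in range(128, 256):
        try:
            char = bytes([code]).decode(wire)
            mapped = char.encode(local)
        except UnicodeError:
            mapped = fallback              # no equivalent on this terminal
        table[code] = mapped[0]
    return bytes(table)


def for_display(article, table):
    """Translate an article's bytes for the local terminal."""
    return article.translate(table)


TABLE = build_table(WIRE, LOCAL)
```

A news reader built this way would keep one table per supported terminal type and apply it only at display time, leaving the stored article untouched.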

I'm sure someone will be able to poke holes in this idea, but it seems like
something we should at least consider, given that the United States no
longer accounts for as much of the Usenet readership as it used to.

Comments?

--
james "kibo" parry, 138 birch lane, scotia, ny 12302  <-- close to schenectady.
kibo@pawl.rpi.edu          _________________________________________________
kibo%pawl.rpi.edu@rpi.edu  / Kibology   / Anything I say is my opinion,
userfe0n@rpitsmts.bitnet   / is better! / and is the opposite of Xibo's.
eps@toaster.SFSU.EDU (Eric P. Scott) (07/21/90)
We have an 8-bit standard: ISO 8859.  Great for us Western
European-American types.  It doesn't help the fj newsgroups much.
(or kremvax!gorby)

					-=EPS=-
Dan@dna.lth.se (Dan Oscarsson) (07/21/90)
In article <+7Y$AV&@rpi.edu> kibo@pawl.rpi.edu (James 'Kibo' Parry) writes:
>In article <15688@bfmny0.BFM.COM> tneff@bfmny0.BFM.COM (Tom Neff) writes:
>>In article <777@hades.ausonics.oz.au> greyham@hades.ausonics.oz.au (Greyham Stoney) writes:
>>>Why don't all you people divert your energies into making your news system
>>>handle 8 bit news rather than developing new and incompatible ways of
>>>bitbashing your files into a format that both news and your unpacking program
>>>(be it /bin/sh, sed, awk or whatever) can cope with?
>>
>>8 bit news would help only slightly with things OTHER than the transmission
>>of binary files via news.  Seven bit is basically doing the job now;
>>the remaining issues (envelope consistency, line lengths, character sets,
>>paragraph wrapping etc) aren't going to be solved by going to 8 bits.
>
>Going to eight bits WOULD be nice for people using languages other than
>English; as it is now, if you're in, say, Finland, and you have a terminal
>that does the Finnish variant of ASCII, people outside Finland are going
>to see braces, brackets, backslashes, etc., wherever you say something
>with an accented character.
>

Yes, it is time to start thinking about using an international character
set in netnews.  This means that 8-bit bytes are used, but not that binary
files can be transmitted without encapsulation.  Binary files must still be
converted into an encoded format that can be checked and unpacked in a
controlled manner.

Only one character set should be used for transmitting articles, as it is
impossible for everyone to handle every character set in the world.  In
European discussions about a character set to use for mail, ISO 10646 is
the best candidate, and it should be fine for netnews also.  ISO 10646 has
both ASCII and ISO 8859-1 as true subsets, and that will ease compatibility
problems.  Local netnews readers will have to convert from ISO 10646 into
the character set used locally.  Using ISO 10646 allows nearly every letter
in the world to be written.
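The subset claim and the local conversion step can both be illustrated in a few lines. This is a sketch resting on one assumption: it uses Python's `str` type, whose code points follow Unicode, as a stand-in for ISO 10646 as eventually published (the two are kept code-for-code in step; the draft under discussion in this thread may have differed). The function names are invented for the example.

```python
def latin1_is_subset():
    """True if each ISO 8859-1 byte decodes to the code point of equal value."""
    return all(ord(bytes([b]).decode("iso8859_1")) == b for b in range(256))


def to_local(text, local_charset="iso8859_1"):
    """Convert article text to the local set, marking unrepresentable letters.

    Letters the local set cannot express are replaced by '?'.
    """
    return text.encode(local_charset, errors="replace")
```

Because the first 256 code points coincide with ISO 8859-1, a Latin-1 site loses nothing on articles written in Western European languages; only letters outside its own repertoire are replaced.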

	Dan
--
Dan Oscarsson
Department of Computer Science
Lund Institute of Technology          e-mail: Dan@dna.lth.se
Box 118
S-221 00 Lund, Sweden
guy@auspex.auspex.com (Guy Harris) (07/22/90)
>We have an 8-bit standard: ISO 8859.  Great for us Western
>European-American types.  It doesn't help the fj newsgroups much.
>(or kremvax!gorby)

ISO 8859 doesn't help the "fj" newsgroups much, but "kremvax!gorby"
could use ISO 8859/5, Latin-Cyrillic alphabet.  Did you perhaps mean "ISO
8859/1" rather than "ISO 8859"?
frisk@rhi.hi.is (Fridrik Skulason) (07/22/90)
Well - some of us have 8-bit news already - I am for example using an
8-bit 'rn' right now.  The program only required a few minor modifications
to work properly.

The reason we went to 8-bit news and E-mail is quite simple - our alphabet
contains 10 characters not found in standard ASCII.  Of course I can only
post 8-bit articles to our local newsgroups - the rest of the world is
still only 7-bit :-(

I fully agree that we need an 8-bit news system (as well as 8-bit E-mail),
as this would make life a lot easier for those of us not using English.

Modifying the news software to permit the transmission of 8-bit data is
trivial - the real problem is the character set issue.

I don't know if the readers of this group are familiar with a similar
discussion regarding automatic translation between character sets in the
Kermit program.  The conclusions reached there seem to apply to the
8-bit News/E-mail discussion as well, though.

Some possible solutions:

(1)	Each machine posts articles using the user's character set of
	choice.  To indicate which character set is used, a new field is
	added to the header.

	examples:	Character-set: CP 870
			Character-set: ISO 8859/4

	This is easy to implement, but has one serious drawback - all
	machines are required to be able to handle all possible character
	sets.

(2)	On every machine the article is translated into one of the
	ISO 8859/x series of character sets.  8859/1 would probably be
	most used, as it covers most of the languages of Western Europe.
	8859/2, 8859/3, 8859/4 etc. would solve the needs of those using
	Greek, various Eastern European languages and (I think) Hebrew
	and Arabic.

	This would not solve the problem of those using a 16-bit character
	set.  Also, I am not sure if Esperanto is included in any of the
	ISO 8859/x standards.

(3)	All text is transmitted according to the ISO 10646 standard.

	This has one advantage compared to (2) - it allows the transmission
	of documents containing 16-bit characters, as well as documents
	containing characters from more than one of the 8859/x standards.
	For example, one could send a message with the first part in
	Russian and the second part in Greek.

My opinion is that (3) is more of a long-term goal - for 95 % of users of
Usenet, (2) is all that is needed.  But what changes would (2) require ?

Change #1:	Any ASCII computer on Usenet must accept 8-bit news and
		E-mail, and be able to forward articles without changes
		(in other words - don't strip the eighth bit !!!)

		This is the only change required from the "English-only"
		ASCII sites, where no 8-bit articles would originate or
		be read.

Change #2:	Any computer on Usenet using an extended version of ASCII
		(CP 437, ISO 8859/x etc) must translate all postings to
		one of the 8859/x character sets and indicate (in the
		header) which one is used.

		This change would be required from European and other
		non-English-using sites.

Change #3:	Any computer not using ASCII, but rather EBCDIC (or
		something else), must translate all postings to one of the
		8859/x character sets, instead of just translating to ASCII.

Change #4:	Any computer must accept postings in one of the 8859/x
		character sets and be able to translate them to the
		character set used by each user.

Problem #1:	If the local character set is not able to represent all
		the characters in the original posting, they must be
		represented as well as possible.  For example - a 7-bit
		computer receiving a text containing accented vowels might
		be expected just to drop the accent marks.

Problem #2:	Different users - even on the same machine - have different
		capabilities to display 8-bit text.  For example, in
		Scandinavia it is common for terminals to use a 7-bit
		character set, where some of the characters (for example
		{ [ ] } |) have been replaced by non-ASCII characters.
		Other users in the same countries have fully 8-bit
		terminals (for example PCs running a terminal emulator).

		The computer must store incoming articles as they arrive
		and the news/E-mail software must be updated to display
		them according to the capabilities of each terminal, as
		indicated by an environment variable.

So - what now ?  Is there any interest in creating a "working group" to
attack the problem ?  Any of the authors of rn, nn, elm or other
news/e-mail software out there ?  We are of course willing to share our
modifications to the programs, and with a bit of work we should be able to
have 8-bit news/email running in a few months.

So - any volunteers ?

--
Fridrik Skulason      University of Iceland  |
Technical Editor of the Virus Bulletin (UK)  |  Reserved for future expansion
E-Mail: frisk@rhi.hi.is    Fax: 354-1-28801  |
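Change #4 together with Problems #1 and #2 amounts to a display-time translation step, which might look roughly like the sketch below. Several things here are assumptions rather than anything from the post: the variable name NEWS_TERM_CHARSET, the function names, and the naive mapping from a header value to a codec name, which is only known to work for the "ISO 8859/x" spellings. Dropping accents is done by Unicode decomposition, which is one possible reading of "represented as well as possible".

```python
import os
import unicodedata


def codec_for(header_value):
    """Turn a header value such as 'ISO 8859/4' into a Python codec name."""
    return header_value.strip().lower().replace(" ", "").replace("/", "_")


def to_seven_bit(text):
    """Drop accent marks; anything still outside ASCII becomes '?'."""
    decomposed = unicodedata.normalize("NFKD", text)
    bare = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return bare.encode("ascii", errors="replace")


def render(raw, charset_header, terminal_charset):
    """Translate stored article bytes for one user's terminal."""
    text = raw.decode(codec_for(charset_header))
    if terminal_charset == "ascii":
        return to_seven_bit(text)
    return text.encode(terminal_charset, errors="replace")


def terminal_charset():
    """Read the user's terminal capability from the environment."""
    # NEWS_TERM_CHARSET is a made-up variable name for this sketch.
    return os.environ.get("NEWS_TERM_CHARSET", "ascii")
```

The article is stored exactly as it arrived; `render` would be called once per reader, so two users on the same machine can see the same posting through different terminals.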
ed@braaten.doit.sub.org (Ed Braaten) (07/24/90)
scs@lokkur.dexter.mi.us (Steve Simmons) writes:

>Hey, news is ASCII-based, written in english-speaking countries for
>english-speaking readers.  The fact that it works *at all* for
>international and non-English stuff is a wonderful plus.  If regional
>newsgroups have regional needs, they should go ahead and fill them.

That's funny, Steve - 50% of the news I read is in German.  I'm living in
Germany right now.  Although many of the 60+ million Germans here can speak
English (often better than we Americans ;-), the language in this country
is German.  And the German-language news here is not limited to "regional"
consumption.  I'm aware of several sites there in the good ole USA that are
carrying the German groups also.  I'm willing to bet there is a LOT of
non-English stuff floating around out there.

So why don't we drop the provincial attitudes - let's hear it for 8-bit
news!  It won't make English any harder to read, but it would certainly
make life easier for the rest of the USENET.

>But neither side should expect interoperability.

Say what?  Interoperability and the free exchange of information is in my
opinion exactly what makes USENET so successful...

---------------------------------------------------------------------------
      Ed Braaten              |  Jesus answered, "I am the way and the
Work: ed@imuse.de.intel.com   |  truth and the life.  No one comes to the
Home: ed@braaten.doit.sub.org |  Father except through me."   John 14:6
---------------------------------------------------------------------------
VERKADE@CTSS.CO.UK (Herman Verkade) (08/02/90)
A couple of comments on 8 bit news.  It seems to me that it is not necessary
to convert the whole net to 8 bit.  The 7 bit restriction is only a problem
for specific newsgroups: newsgroups in languages other than English and
newsgroups containing binary data, such as bitmaps, .gif files, etc.  So, I
don't think **everybody** needs to upgrade to some implementation that
supports 8 bits.  Only those that wish to carry newsgroups that need it.
All we would need is a standard, not necessarily a world-wide upgrade of
software.

For example, if the Germans and the Finns decide that their local language
newsgroups will be 8 bit, then that is no business of the Americans,
British, Spanish or Russians.  All that is needed is some software that
supports 8 bits (C News seems to do that, or alternatively a few changes in
B News) and maybe some way of indicating that a particular group expects
8-bit, so that when posting to a 7 bit group another signature can be used;
one that doesn't have 8 bit characters in it.

And if people want to start a new newsgroup for .gif files in 8 bit mode
then, again, the sites that want to carry it must install an 8 bit version.
If your feed doesn't support it, get another feed for that newsgroup (a
similar situation exists in the UK, where UKC neither carries nor forwards
`alt.sex.pictures'.  Sites that want it get another feed for that group).

The next problem would be how to read an 8 bit group, both for groups that
use different character sets and for groups that carry other 8 bit data.
As someone earlier suggested, maybe an extra header should be added to the
standard that indicates what type of data it is.  For mail there are
RFC-1049 (Content-Type) and RFC-1154 (Encoding), which are extensions to
RFC-822.  The extra header fields would only need to be interpreted by the
news reader.  So, only if you want to read an 8 bit group, get an 8 bit
reader.  As long as we can agree on some standard.

My proposal would be RFC-1154-style, because it also allows one message to
contain encodings in different parts and could therefore also be used to
automatically convert different parts of a message in 7 bit groups.  For
example, a message containing a uuencoded file preceded by some explanation
in ASCII and a signature at the bottom, could have a header such as:

	Encoding: 10 text, 1045 uuencode, 5 text

A smart news reader could display the two text parts and ask whether you
want the uuencode bit to be uudecoded.  For an article containing a header
like:

	Encoding: 15 text, 637 uugif, 5 text

the reader could then automatically extract the uuencoded .gif file and
display an image instead.  Etc, etc, etc.  And only users that want such
functionality switch to a news reader that supports it.

I realise that I am discussing two separate topics here:

1) Provide 8 bit transport mechanisms so that international character sets
   can be used, but enable 8 bits only on a newsgroup by newsgroup basis
   with either a designated character set for such a group, or an Encoding
   header to indicate the character set.

2) An Encoding: header for carrying data other than text (in either 7 or
   8 bit groups).

For both I suggest providing a standard, but not forcing anybody to upgrade
to new software.  I think this proposal provides for backward compatibility
and meets the requirements of a fair number of net.people.

Herman Verkade
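How a "smart news reader" might act on such a header can be sketched briefly. The code below assumes, as in the examples in the post, that each item is a line count followed by a keyword and that the counts cover the body in order; it handles only that simple shape, and the function names are invented for the illustration. The keyword `uugif` comes from the post, not from RFC-1154.

```python
def parse_encoding(value):
    """Parse '10 text, 1045 uuencode, 5 text' into [(10, 'text'), ...]."""
    parts = []
    for item in value.split(","):
        count, kind = item.split()
        parts.append((int(count), kind.lower()))
    return parts


def split_body(lines, parts):
    """Cut the body lines into (kind, lines) sections, in header order."""
    sections, start = [], 0
    for count, kind in parts:
        sections.append((kind, lines[start:start + count]))
        start += count
    return sections
```

A reader would then show the `text` sections directly and hand any other section to a decoder chosen by its keyword, or simply display it raw if it does not recognise the keyword.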
amanda@mermaid.intercon.com (Amanda Walker) (08/03/90)
In article <900802011259.00001B1F@MARVIN.CTSS.CO.UK>, VERKADE@CTSS.CO.UK
(Herman Verkade) writes:
> The 7 bit restriction is only a problem for
> specific newsgroups: newsgroups in languages other than english and newsgroups
> containing binary data, such as bitmaps, .gif files, etc.

It's also a problem for any newsgroup that has non-English speakers posting
to it.  For example, on alt.sca, the name of one common poster comes out as
"]ke" - he's Swedish, and the "]" should really be an A with a ring over
it...

Even in groups whose traffic is conducted only in English, there is a
growing proportion of people whose names cannot be spelled with 7-bit ASCII.

--
Amanda Walker <amanda@intercon.com>
InterCon Systems Corporation
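The "]ke" effect is easy to reproduce. The sketch below assumes the usual Swedish 7-bit national variant, in which six ASCII punctuation positions carry the letters Ä Ö Å ä ö å; the table and function name are illustrative only.

```python
# Positions reused by the Swedish 7-bit national variant (assumed here):
# [ \ ] { | }  are displayed as  Ä Ö Å ä ö å  on such a terminal.
SWEDISH_7BIT = str.maketrans("[\\]{|}", "ÄÖÅäöå")


def as_swedish(text):
    """Show 7-bit text the way a Swedish national-variant terminal would."""
    return text.translate(SWEDISH_7BIT)
```

The same bytes that read as a name on the poster's terminal read as punctuation everywhere else, and a C program posted from an ASCII site suffers the reverse fate on that terminal.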
ed@braaten.doit.sub.org (Ed Braaten) (08/05/90)
VERKADE@CTSS.CO.UK (Herman Verkade) writes:

>A couple of comments on 8 bit news. It seems to me that it is not necessary to
>convert the whole net to 8 bit. The 7 bit restriction is only a problem for
>specific newsgroups: newsgroups in languages other than english and newsgroups
>containing binary data, such as bitmaps, .gif files, etc. So, I don't think
>**everybody** needs to upgrade to some implementation that supports 8 bits.
>Only those that wish to carry newsgroups that need it. All we would need
>is a standard, not necessarily a world-wide upgrade of software.

I think this is the right approach to the problem.  If it works, don't
fix it! ;-)  But give the non-English and binary people a chance.  A
standard, however, is an absolute must.

>My proposal would be RFC-1154-style, because it also allows one message to
>contain encodings in different parts and could therefore also be used to
>automatically convert different parts of a message in 7 bit groups. For
>example, a message containing a uuencoded file preceded by some explanation
>in ASCII and a signature at the bottom, could have a header such as:
>	Encoding: 10 text, 1045 uuencode, 5 text
>A smart news reader could display the two text parts and ask whether you want
>the uuencode bit to be uudecoded. For an article containing a header like:
>	Encoding: 15 text, 637 uugif, 5 text
>the reader could then automatically extract the uuencoded .gif file and
>display an image instead. Etc, etc, etc. And only users that want such
>functionality switch to a news reader that supports it.

How about it?  Could we get the author of nn sold on this?  (I'm
crossposting this article to n.s.nn to find out...)

>I realise that I am discussing two separate topics here:
>1) Provide 8 bit transport mechanisms so that international character sets
>   can be used, but enable 8 bits only on a newsgroup by newsgroup basis
>   with either a designated character set for such a group, or an Encoding
>   header to indicate the character set.
>2) An Encoding: header for carrying data other than text (in either 7 or
>   8 bit groups).

I like your suggestions, Herman.  What about the rest of the net?
Opinions?  Comments?

Greetings from Munich,

Ed

---------------------------------------------------------------------------
      Ed Braaten              |  Jesus answered, "I am the way and the
Work: ed@imuse.de.intel.com   |  truth and the life.  No one comes to the
Home: ed@braaten.doit.sub.org |  Father except through me."   John 14:6
---------------------------------------------------------------------------
mcmahon@tgv.com (John McMahon) (08/06/90)
In article <3863a@braaten.doit.sub.org>, ed@braaten.doit.sub.org (Ed Braaten) writes...
>>My proposal would be RFC-1154-style, because it also allows one message to
>>contain encodings in different parts and could therefore also be used to
>>automatically convert different parts of a message in 7 bit groups. For
>>example, a message containing a uuencoded file preceded by some explanation
>>in ASCII and a signature at the bottom, could have a header such as:
>
>>	Encoding: 10 text, 1045 uuencode, 5 text

My understanding is that an RFC is in the works for "non-textual
transmission of data via E-mail".  I suspect this could be easily expanded
to include USENET NEWS.  Watch the NIC for announcements of new RFCs...

John 'Fast-Eddie' McMahon    : MCMAHON@TGV.COM       : TTTTTTTTTTTTTTTTTTTTTTTT
TGV, Incorporated            :                       :    T   GGGGGGG  V     V
603 Mission Street           : HAVK (abha) Gur bayl  :    T  G          V   V
Santa Cruz, California 95060 : bcrengvat flfgrz gb   :    T  G    GGGG   V V
408-427-4366 or 800-TGV-3440 : or qrfgeblrq ol znvy  :    T   GGGGGGG     V