[news.software.b] comp.datasets Call For Discussion

gnu@hoptoad.uucp (John Gilmore) (01/07/89)

(Referring to a standard for sending databases over the net):
> =- Format can be ASCII or UUENCODED binary.

We have two new versions of netnews coming out within the month -- C
news and TMNN (News 3.0).  I believe both of them are 8-bit-clean, that
is, the data part of a message can have any 8-bit characters in it
(including nulls and very long lines).  Most of this support is
necessary for "un-American" :-) character set support anyway.  If this is
really true, we should consider simply sending binaries as binaries.

It might be good to add a Format: header field with standard values
(e.g. shar, tar, cpio, arc, msdos binary, vfont, ...) which could be
used by the news reading program to determine how to display and/or
extract the data.  (Actually it probably helps to be able to include
some text, with named binary "attachments".  Then the user interface can
show the text and the list of names, and offer to extract them into
the local filesystem.)  The default Format: would be ISO-Latin-1 text
(ASCII with European characters in the top 128 positions).
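A minimal sketch of how a news reader might act on such a header, assuming the field is spelled "Format:" as proposed above. The function names, the set of displayable formats, and the default value string are hypothetical and only illustrate the idea; header continuation lines are ignored for brevity.

```python
# Hypothetical sketch: decide whether to display or extract an article
# based on a proposed Format: header. None of these names come from real
# news software; the format values follow the examples in the proposal.

DEFAULT_FORMAT = "iso-latin-1 text"   # assumed spelling of the default value

# Formats a reader could put on the screen; anything else is offered
# for extraction into the local filesystem.
DISPLAYABLE = {"iso-latin-1 text"}


def parse_headers(article: bytes) -> dict:
    """Return header fields (lowercased names) found before the first blank line."""
    head, _, _ = article.partition(b"\n\n")
    headers = {}
    for line in head.decode("latin-1").split("\n"):
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return headers


def choose_action(article: bytes) -> str:
    """Return "display" for text formats and "extract" for everything else."""
    fmt = parse_headers(article).get("format", DEFAULT_FORMAT).lower()
    return "display" if fmt in DISPLAYABLE else "extract"
```

An article with no Format: header falls back to the default and is displayed; one marked "Format: tar" would be offered for extraction instead of being pushed to the screen.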

We'd have to determine the impact on old-news, NNTP, notes, Fidonet,
and other sites that netnews gateways to, but certainly we can start
with real binaries in a comp.data newsgroup (or alt.data), and folks whose
software can't handle it should simply not receive the newsgroup.  If
it works out, it can be expanded to more newsgroups or to the whole net.

We are close to the point where we can rely on there being a full 8-bit
data path.  It's time to say "folks who don't provide 8 bits must
encode 8-bit data while in transit" rather than "we'll all live with 7
bits forever and uuencode everything".  Just a leetle push now will save
us a *lot* of trouble.
-- 
John Gilmore    {sun,pacbell,uunet,pyramid,amdahl}!hoptoad!gnu    gnu@toad.com
Love your country but never trust its government.
		     -- from a hand-painted road sign in central Pennsylvania

len@netsys1.netsys.COM (Len Rose) (01/08/89)

In article <6182@hoptoad.uucp>, gnu@hoptoad.uucp (John Gilmore) writes:
# We'd have to determine the impact on old-news, NNTP, notes, Fidonet,
# and other sites that netnews gateways to, but certainly we can start
# with real binaries in a comp.data newsgroup (or alt.data), and folks whose
# software can't handle it should simply not receive the newsgroup.  If
# it works out, it can be expanded to more newsgroups or to the whole net.

 The problem is that certain modem pools, ISNs, and the like simply cannot
handle 8-bit data. All I ask is that there still be some vehicle to allow
encoding into 7-bit ASCII.

# We are close to the point where we can rely on there being a full 8-bit
# data path.  It's time to say "folks who don't provide 8 bits must
# encode 8-bit data while in transit" rather than "we'll all live with 7
# bits forever and uuencode everything".  Just a leetle push now will save
# us a *lot* of trouble.

 Thus by one stroke you knock off many sites that depend on 7-bit data
transfers. The argument that the folks who rely on 7-bit data transfer
schemes can adapt or stop receiving news is unrealistic and unfair.

 Surely we can add provisions for 7 bit encoding so that less fortunate
 sites can still be a part of the community.

gnu@hoptoad.uucp (John Gilmore) (01/09/89)

I wrote:
> #             It's time to say "folks who don't provide 8 bits must
> # encode 8-bit data while in transit" rather than "we'll all live with 7
> # bits forever and uuencode everything".

len@netsys1.netsys.COM (Len Rose) wrote:
>            The argument that the folks who rely on 7 bit data transfer
> schemes can adapt or stop receiving news is unrealistic and unfair.
>  Surely we can add provisions for 7 bit encoding so that less fortunate
>  sites can still be a part of the community.

People seem to be misreading my paragraph above, so let me try again.

=> "Folks who don't provide 8 bits must encode 8-bit data while in transit" <=

It doesn't say that "less fortunate" sites must drop off the net, it
says they must encode data!  Currently, *everyone* has to encode data
when they post it, so this can't be a substantial burden.

My impression is that there are not so many "7-bit sites" as there are
"7-bit transport media".  (I could be wrong.)  The point is that if you
send or receive netnews via a 7-bit transport medium, you have to
encode it so that all 8-bit values come through transparently.  This
saves everyone else on the net from having to encode everything to 7
bits, which wastes transmission time, disk space, and people time.
Many BITNET sites are encoding their news transmissions already, to get
around the horrible things their network does to data.  Certainly we
can provide standardized 7-bit encoding programs, like c7sendbatch and
c7unbatch.
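A rough sketch of what encoding at the edge of a 7-bit link could look like. It assumes base64 as a stand-in encoding; that is a substitution for illustration, not the scheme c7sendbatch and c7unbatch would use, and the function names are hypothetical.

```python
import base64

# Hypothetical sketch: only the site next to a 7-bit link encodes and
# decodes; every other hop carries the batch as raw 8-bit data.


def encode_for_7bit_link(batch: bytes) -> bytes:
    """Turn an arbitrary 8-bit batch into short lines of printable ASCII."""
    return base64.encodebytes(batch)


def decode_from_7bit_link(encoded: bytes) -> bytes:
    """Recover the original 8-bit batch on the far side of the link."""
    return base64.decodebytes(encoded)


def is_7bit_clean(data: bytes) -> bool:
    """True if no byte has its high bit set."""
    return all(b < 0x80 for b in data)
```

The round trip restores every byte value, including nulls, so the encoding stays invisible to sites on either side of the 7-bit hop.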

Is that an intolerable burden on sites with 7-bit links?  It does fall
under "adapt or stop receiving news", but the adaptation is pretty painless,
and initially it would only apply to a small number of newsgroups.
-- 
John Gilmore    {sun,pacbell,uunet,pyramid,amdahl}!hoptoad!gnu    gnu@toad.com
Love your country but never trust its government.
		     -- from a hand-painted road sign in central Pennsylvania

cfe+@andrew.cmu.edu (Craig F. Everhart) (01/10/89)

Consider the Content-type: header, defined in rfc1049, rather than Format:.

Consider that 7-bit (only) data paths are given via rfc822, to which rfc1036 
(the Usenet message standard) gives supremacy.

david@ms.uky.edu (David Herron -- One of the vertebrae) (01/11/89)

IBM mainframes certainly count as 7-bit sites.  I have one as a news
neighbor ... psuvm.bitnet.  For my other BITNET neighbor, a vax/vms
cluster at U of L we *do* encode (for the technical minded, it's
something like '(echo "#! bitbatch"; compress -c ${file} | btoa)'),
and things work fine.  But that IBM mainframe ... it's a whole 'nother
ball of wax altogether.

In my personal knowledge there's 4 such machines on Usenet.  There
could easily be others, our mainframe people are considering it for
instance.  There could easily be others that I don't know about too.

Beyond those considerations ... for a loooooong time the news standard
has been to be as much like mail as possible.  In fact I use that fact
rather heavily, whenever I see something I like and need on my home computer
I simply mail it there ... MMDF has a nice program to make this simple
as well, I just do "s|resend david@davids" and off it goes.

With binary gunk in news articles I'll no longer be able to do that.
Yeah, I could write a shell script to wrap that up in a nice neat
package ...

Some of us have a dream of making a "WorldNet" .. a world-wide discussion
system derived from the current Usenet.  While it would be nice to have
binary files shipped around in binary, there is a reality that we must
pay attention to.  There are *BACKWARD* operating systems in use out
there people!  And they don't all use ASCII, and even the ones that use
ASCII don't all have a way of mixing straight text with binary gunk.

If we want to have this WorldNet thing we've gotta pay attention to
them folks.  It's mighty unfriendly to sneer down our noses at them
and say they've got an ugly operating system!  There's many ugly things
about Unix as well and we got no more right to call their system
ugly than they have to say ours is.
-- 
<-- David Herron; an MMDF guy                              <david@ms.uky.edu>
<-- ska: David le casse\*'      {rutgers,uunet}!ukma!david, david@UKMA.BITNET
<-- Now I know how Zonker felt when he graduated ...
<--          Stop!  Wait!  I didn't mean to!

leonard@qiclab.UUCP (Leonard Erickson) (01/12/89)

In article <10875@s.ms.uky.edu> david@ms.uky.edu (David Herron -- One of the vertebrae) writes:
<Some of us have a dream of making a "WorldNet" .. a world-wide discussion
<system derived from the current Usenet.  While it would be nice to have
<binary files shipped around in binary, there is a reality that we must
<pay attention to.  There are *BACKWARD* operating systems in use out
<there people!  And they don't all use ASCII, and even the ones that use
<ASCII don't all have a way of mixing straight text with binary gunk.
<
<If we want to have this WorldNet thing we've gotta pay attention to
<them folks.  It's mighty unfriendly to sneer down our noses at them
<and say they've got an ugly operating system!  There's many ugly things
<about Unix as well and we got no more right to call their system
<ugly than they have to say ours is.

Do note that "worldnet" will *have* to support 8-bit characters! The 
Europeans are getting a bit tired of having their languages shoehorned
into ASCII (*American* Standard Code for Information Interchange).

Once you add 8-bit support, the only problem with binary data would be
line length and indicating end-of-file. Considering how badly things
get munged from time to time, a *byte* count AND a checksum/CRC/???
would be an asset anyway. 
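A minimal sketch of the byte count plus checksum idea, assuming a one-line ASCII header in front of the payload and CRC-32 as the check. The header layout and the function names are made up for illustration; nothing here is an existing news format.

```python
import zlib

# Hypothetical framing: "<byte count> <crc32 in hex>\n" followed by the raw
# payload. The count marks end-of-file, so the payload may contain anything,
# and the CRC catches data that was munged in transit.


def wrap(payload: bytes) -> bytes:
    """Prefix the payload with its length and CRC-32."""
    header = f"{len(payload)} {zlib.crc32(payload):08x}\n".encode("ascii")
    return header + payload


def unwrap(blob: bytes) -> bytes:
    """Return the payload, or raise ValueError if it is truncated or damaged."""
    header, sep, rest = blob.partition(b"\n")
    if not sep:
        raise ValueError("missing header line")
    count_text, crc_text = header.decode("ascii").split()
    count = int(count_text)
    payload = rest[:count]          # anything past the count is not ours
    if len(payload) != count:
        raise ValueError("truncated payload")
    if zlib.crc32(payload) != int(crc_text, 16):
        raise ValueError("checksum mismatch")
    return payload
```

Because the count says where the data ends, trailing text such as a signature appended by a gateway does not corrupt the extracted file.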

Sooner or later it will be *necessary* to enforce a "thou shalt not
muck up data in transit through thy site". I'd think that this could be
handled by the news-bundling/unbundling software. What you do with it
*locally* is up to you, just so what you pass on matches what you got!

8-bit mail is going to have to happen too, and for the same reasons. 
Get used to ISO-Latin-1 (or whatever they come up with for *the*
standard). 
-- 
Leonard Erickson		...!tektronix!reed!percival!bucket!leonard
CIS: [70465,203]
"I used to be a hacker. Now I'm a 'microcomputer specialist'.
You know... I'd rather be a hacker."

len@netsys.COM (Len Rose) (01/13/89)

If I understand correctly, then sites that currently batch 7 bits
won't be affected since they will just encode the binary files anyway.
As long as encoding mechanisms are left in place then I think it's a
wonderful idea.
-- 
len@netsys.com
{ames,att,rutgers}!netsys!len

dww@stl.stc.co.uk (David Wright) (01/19/89)

In article <1947@qiclab.UUCP> leonard@qiclab.UUCP (Leonard Erickson) writes:
#In article <10875@s.ms.uky.edu> david@ms.uky.edu (David Herron -- One of the vertebrae) writes:
##Some of us have a dream of making a "WorldNet" .. a world-wide discussion
I thought that's what we had now?   Of course not fully world-wide yet, the
Soviet bloc and most of Africa and S. America are missing, but hopefully they
will come in due time (maybe soon for the first).

#Once you add 8-bit support, the only problem with binary data would be
#line length and indicating end-of-file. 

No it wouldn't.   We can ship 8-bit data now, but remember that news articles
GET DISPLAYED.   If we had a new version of news reader that could, ODA-like,
distinguish different object types and know which to display (the note saying
what this binary data was) and which to save in a file but NOT push onto the
screen, all might be well.   But we don't, and even if you have just written
one, it will be a long time before most of us are running it.   Surely most
people on the net have experienced accidentally displaying a binary file 
(e.g.  an executable binary)?   For most terminals, the result is a terminal
in some odd mode, which has to be reset in order to display the basic (or
even extended :-)) character set.     If binary data is sent as a news article,
it ought to be coded in some way that won't do this - e.g. the present btoa
or encode schemes, or some new improved one if you like.   8-bit character sets
are not going to solve this problem either, they are still going to include
control characters and control sequences that do strange things, probably
more than now (e.g. my VT330 gets very upset if I forget to put it in 7-bit
VT100 mode before connecting over an X.25 link that sets character parity in
the 8th bit - in 8-bit mode those (7 bit + parity) characters mean something,
often something very odd).
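A small sketch of the kind of check a news reader could make before pushing an article onto the screen. It assumes an ISO 8859-style 8-bit character set, where bytes 0x80 through 0x9F are control codes; the function name and the list of tolerated controls are hypothetical.

```python
# Hypothetical check: refuse to display a body containing control codes
# that could put a terminal into an odd mode.

SAFE_CONTROLS = {0x09, 0x0A, 0x0D}   # tab, line feed, carriage return


def is_safe_to_display(body: bytes) -> bool:
    """True if the body holds no control codes other than plain whitespace."""
    for b in body:
        if b < 0x20 and b not in SAFE_CONTROLS:
            return False             # 7-bit control code, e.g. ESC
        if b == 0x7F:
            return False             # DEL
        if 0x80 <= b <= 0x9F:
            return False             # 8-bit control code range
    return True
```

Accented Latin-1 text passes, while an escape sequence in either its 7-bit or 8-bit form does not, which is the case where the reader would offer to save the data to a file rather than show it.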
-- 
Regards,       "Are you sure YOUR password won't appear in RTM's next list?"
        David Wright           STL, London Road, Harlow, Essex  CM17 9NA, UK
dww@stl.stc.co.uk <or> ...uunet!mcvax!ukc!stl!dww <or> PSI%234237100122::DWW