snoopy@sopwith.UUCP (Snoopy) (12/22/88)
In article <2708@datapg.MN.ORG> sewilco@datapg.MN.ORG (Scot E. Wilcoxon) writes:
| I much prefer to have clean data or source in the "sources" groups,
| where I can easily weed out what is not of interest, and where it will
| probably be saved in other archive sites. Useful data or source in
| discussion groups (often in comp.arch and comp.graphics) are easily lost.
It appears that the number and size of "datasets" posted to the net is
growing. Time to consider a group such as comp.datasets to hold them.
I suggest it be moderated in the manner of comp.sources.unix or c.s.misc,
to keep the discussion and such out, and to provide flow control.
Putting data in a source group isn't *toooo* bad, but it really is a
different animal.
The volume of these datasets is sometimes large, and putting them in a
separate newsgroup would allow those sites that aren't interested to
have their newsfeed turn the group off. Also, some sites may be
interested in receiving the datasets, but may not wish to automatically
archive them. If they are mixed in with the source groups it is
difficult to archive the sources without also archiving the datasets
(or vice versa).
Therefore I ask the following questions:
(1) Do we want a newsgroup for data?
(2) Do we want it to be moderated?
(3) If so, any volunteers for moderator?
(4) Can someone think of a better name than "comp.datasets"? Sounds
    like something out of an IBM shop. :-( Perhaps "comp.data"?
    _____
   /_____\    Snoopy
  /_______\
    |___|     tektronix!tekecs!sopwith!snoopy
    |___|     sun!nosun!illian!sopwith!snoopy
jpn@genrad.com (John P. Nelson) (12/27/88)
In article <76@sopwith.UUCP> snoopy@sopwith.UUCP (Snoopy) writes:
>It appears that the number and size of "datasets" posted to the net is
>growing. Time to consider a group such as comp.datasets to hold them.
>I suggest it be moderated in the manner of comp.sources.unix or c.s.misc,
Point of order: What EXACTLY is a "dataset"? There was a discussion
recently about starting a group for GIF format graphic pictures: would
this group cover these as well as star catalogs? I mean, these are
clearly data, and they are large (well, at least in collection).
The GIF group was shouted down (I don't think it ever actually came to
a vote) because people felt that shipping pictures around was a waste
of the net's bandwidth, and besides: there are better forums for such
things. Not that I necessarily AGREE with this argument - but it
applies equally well to star catalogs, font data, and graphical pictures.
     john nelson
UUCP:  {decvax,mit-eddie}!genrad!teddy!jpn
smail: jpn@teddy.genrad.com
car@pte.UUCP (Chris Rende) (12/29/88)
In article <15054@genrad.UUCP>, jpn@genrad.com (John P. Nelson) writes:
> In article <76@sopwith.UUCP> snoopy@sopwith.UUCP (Snoopy) writes:
> >It appears that the number and size of "datasets" posted to the net is
> >growing. Time to consider a group such as comp.datasets to hold them.
>
> Point of order: What EXACTLY is a "dataset"? There was a discussion
> recently about starting a group for GIF format graphic pictures: would
> this group cover these as well as star catalogs? I mean, these are
> clearly data, and they are large (well, at least in collection).
I think that a group for data postings is a good idea. I'd like to see more
data be posted: Astronomical data, medical, graphics, geographical maps,
census, etc...
What is a dataset? I can just see a month's worth of philosophical flaming
in the wings... :-) Especially if you're talking about LISP stuff... (((:-)))
It would probably be best to describe the desired contents of the group
rather than try to define data in a manner which agrees with all netters.
To start the furnace, here are some thoughts that I have about a comp.data
newsgroup:
- Used to post machine readable information that is NOT program source code.
  [I say "NOT program source code" because we have source code groups and
  because LISP code looks like LISP data.]
- Format can be ASCII or UUENCODED binary.
- Said data should NOT be binary executables. There are groups for this as well.
- Although I'm not quite sure of the posting mechanism, I think that some
  description of the structure of the data should be included with the data.
  This could be source code segments (like .h files), a column by column
  description, or just plain text. [comp.data.d?]
- Said data should be machine independent. I.e., a UUENCODED binary of an
  MSDOS directory is not generally useful.
car.
--
Christopher A. Rende                Multics,DTSS,Shortwave,Scanners,StarTrek
uunet!{umix,edsews}!rphroy!pte!car  Minix,PC/XT,TRS-80 Model I: Buy Sell Trade
Motorola VME1131 M68020 SVR2        Precise Technology & Electronics, Inc.
jbayer@ispi.UUCP (Jonathan Bayer) (01/05/89)
In article <360@pte.UUCP> car@pte.UUCP (Chris Rende) writes:
>In article <15054@genrad.UUCP>, jpn@genrad.com (John P. Nelson) writes:
>> In article <76@sopwith.UUCP> snoopy@sopwith.UUCP (Snoopy) writes:
== =It appears that the number and size of "datasets" posted to the net is
== =growing. Time to consider a group such as comp.datasets to hold them.
==
== Point of order: What EXACTLY is a "dataset"? There was a discussion
== recently about starting a group for GIF format graphic pictures: would
== this group cover these as well as star catalogs? I mean, these are
== clearly data, and they are large (well, at least in collection).
=
=I think that a group for data postings is a good idea. I'd like to see more
=data be posted: Astronomical data, medical, graphics, geographical maps,
=census, etc...
=
=What is a dataset? I can just see a month's worth of philosophical flaming
=in the wings... :-) Especially if you're talking about LISP stuff... (((:-)))
=It would probably be best to describe the desired contents of the group
=rather than try to define data in a manner which agrees with all netters.
=
=To start the furnace, here are some thoughts that I have about a comp.data
=newsgroup:
=
=- Used to post machine readable information that is NOT program source code.
=  [I say "NOT program source code" because we have source code groups and
=  because LISP code looks like LISP data.]
=- Format can be ASCII or UUENCODED binary.
=- Said data should NOT be binary executables. There are groups for this as well.
=- Although I'm not quite sure of the posting mechanism, I think that some
=  description of the structure of the data should be included with the data.
=  This could be source code segments (like .h files), a column by column
=  description, or just plain text. [comp.data.d?]
=- Said data should be machine independent. I.e., a UUENCODED binary of an
=  MSDOS directory is not generally useful.
=
This sounds like a good idea. This way, small sites with little or no
interest in most of the datasets would not get them.
However, I do have a suggestion. It makes sense to archive these datasets
at several large usenet sites, similar to the way that comp.sources.unix
(and others) is currently archived. There should be a notice published in
a different newsgroup from (suggested) comp.sources.data which would let
everyone know about the published datasets. This would let those sites
that need/want a dataset get it from a major site.
This suggestion does have a problem. If there are some small sites which
are several nodes downstream of an archive site, and each one requests a
major dataset, then the mail load on the intermediate sites will go up
tremendously. However, this might be preferable to always carrying the
datasets.
--
Jonathan Bayer                       "The time has come," the Walrus said...
Intelligent Software Products, Inc.
19 Virginia Ave.                     ...uunet!ispi!jbayer
Rockville Centre, NY 11570  (516) 766-2867    jbayer@ispi
gnu@hoptoad.uucp (John Gilmore) (01/07/89)
(Referring to a standard for sending databases over the net):
> =- Format can be ASCII or UUENCODED binary.
We have two new versions of netnews coming out within the month -- C
news and TMNN (News 3.0). I believe both of them are 8-bit-clean, that
is, the data part of a message can have any 8-bit characters in it
(including nulls and very long lines). Most of this support is
necessary for "un-American" :-) character set support anyway. If this is
really true, we should consider simply sending binaries as binaries.
It might be good to add a Format: header field with standard values
(e.g. shar, tar, cpio, arc, msdos binary, vfont, ...) which could be
used by the news reading program to determine how to display and/or
extract the data. (Actually it probably helps to be able to include
some text, with named binary "attachments". Then the user interface can
show the text and the list of names, and offer to extract them into
the local filesystem.) The default Format: would be ISO-Latin-1 text
(ASCII with European characters in the top 128 positions).
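A minimal sketch of how a news reading program might act on such a
header. Everything here is an assumption for illustration: Format: is
only the header proposed in this post, the value names are taken from
the examples above, and the split into "display" versus "extract" is a
guess at the intended behaviour. Header continuation lines are not
handled.

```python
# Sketch only: "Format:" is the header proposed in this thread, not an
# existing netnews header, and the tables below are hypothetical.
DEFAULT_FORMAT = "iso-latin-1"      # proposed default: ISO-Latin-1 text

DISPLAYABLE = {"iso-latin-1"}       # formats the reader shows on screen
EXTRACTABLE = {"shar", "tar", "cpio", "arc", "msdos binary", "vfont"}


def parse_headers(article):
    """Split raw article bytes into (lower-cased header dict, body bytes)."""
    head, _, body = article.partition(b"\n\n")
    headers = {}
    for line in head.decode("latin-1").split("\n"):
        name, sep, value = line.partition(":")
        if sep:                     # skip lines that are not "Name: value"
            headers[name.strip().lower()] = value.strip()
    return headers, body


def choose_action(article):
    """Return 'display', 'extract', or 'unknown' for one article."""
    headers, _ = parse_headers(article)
    fmt = headers.get("format", DEFAULT_FORMAT).lower()
    if fmt in DISPLAYABLE:
        return "display"
    if fmt in EXTRACTABLE:
        return "extract"
    return "unknown"                # unrecognised value: do not guess
```

An article with no Format: line falls through to the default and is
displayed as text, which keeps existing postings working unchanged.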
We'd have to determine the impact on old-news, NNTP, notes, Fidonet,
and other sites that netnews gateways to, but certainly we can start
with real binaries in a comp.data newsgroup (or alt.data), and folks whose
software can't handle it should simply not receive the newsgroup. If
it works out, it can be expanded to more newsgroups or to the whole net.
We are close to the point where we can rely on there being a full 8-bit
data path. It's time to say "folks who don't provide 8 bits must
encode 8-bit data while in transit" rather than "we'll all live with 7
bits forever and uuencode everything". Just a leetle push now will save
us a *lot* of trouble.
--
John Gilmore {sun,pacbell,uunet,pyramid,amdahl}!hoptoad!gnu gnu@toad.com
Love your country but never trust its government.
-- from a hand-painted road sign in central Pennsylvania
len@netsys1.netsys.COM (Len Rose) (01/08/89)
In article <6182@hoptoad.uucp>, gnu@hoptoad.uucp (John Gilmore) writes:
# We'd have to determine the impact on old-news, NNTP, notes, Fidonet,
# and other sites that netnews gateways to, but certainly we can start
# with real binaries in a comp.data newsgroup (or alt.data), and folks whose
# software can't handle it should simply not receive the newsgroup. If
# it works out, it can be expanded to more newsgroups or to the whole net.
The problem is that certain modem pools, ISNs, and the like simply cannot
handle 8 bit data. All I ask is that there still be some vehicle to allow
encoding into 7 bit ascii.
# We are close to the point where we can rely on there being a full 8-bit
# data path. It's time to say "folks who don't provide 8 bits must
# encode 8-bit data while in transit" rather than "we'll all live with 7
# bits forever and uuencode everything". Just a leetle push now will save
# us a *lot* of trouble.
Thus by one stroke you knock off many sites that depend on 7 bit data
transfers. The argument that the folks who rely on 7 bit data transfer
schemes can adapt or stop receiving news is unrealistic and unfair.
Surely we can add provisions for 7 bit encoding so that less fortunate
sites can still be a part of the community.
gnu@hoptoad.uucp (John Gilmore) (01/09/89)
I wrote:
> # It's time to say "folks who don't provide 8 bits must
> # encode 8-bit data while in transit" rather than "we'll all live with 7
> # bits forever and uuencode everything".
len@netsys1.netsys.COM (Len Rose) wrote:
> The argument that the folks who rely on 7 bit data transfer
> schemes can adapt or stop receiving news is unrealistic and unfair.
> Surely we can add provisions for 7 bit encoding so that less fortunate
> sites can still be a part of the community.
People seem to be misreading my paragraph above, so let me try again.
=> "Folks who don't provide 8 bits must encode 8-bit data while in transit" <=
It doesn't say that "less fortunate" sites must drop off the net, it says
they must encode data! Currently, *everyone* has to encode data when they
post it, so this can't be a substantial burden.
My impression is that there are not so many "7-bit sites" as there are
"7-bit transport media". (I could be wrong.) The point is that if you
send or receive netnews via a 7-bit transport medium, you have to encode
it so that all 8-bit values come through transparently. This saves
everyone else on the net from having to encode everything to 7 bits,
which wastes transmission time, disk space, and people time. Many BITNET
sites are encoding their news transmissions already, to get around the
horrible things their network does to data.
Certainly we can provide standardized 7-bit encoding programs, like
c7sendbatch and c7unbatch. Is that an intolerable burden on sites with
7-bit links? It does fall under "adapt or stop receiving news", but the
adaptation is pretty painless, and initially it would only apply to a
small number of newsgroups.
--
John Gilmore {sun,pacbell,uunet,pyramid,amdahl}!hoptoad!gnu gnu@toad.com
Love your country but never trust its government.
-- from a hand-painted road sign in central Pennsylvania
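The "encode only while in transit" idea can be sketched as a pair of
functions wrapped around a 7-bit link. This is an illustration, not the
c7sendbatch/c7unbatch programs named above (which are only suggested
names in this thread); it assumes uuencode-style line encoding via
Python's standard binascii module, and the function names are made up.

```python
import binascii

UU_CHUNK = 45   # uuencode packs at most 45 input bytes into one line


def encode_for_7bit_link(batch):
    """Turn arbitrary 8-bit bytes into printable 7-bit ASCII lines."""
    lines = []
    for start in range(0, len(batch), UU_CHUNK):
        # b2a_uu returns one encoded line, newline included
        lines.append(binascii.b2a_uu(batch[start:start + UU_CHUNK]))
    return b"".join(lines)


def decode_from_7bit_link(encoded):
    """Recover the original bytes on the far side of the link."""
    return b"".join(
        binascii.a2b_uu(line) for line in encoded.splitlines() if line
    )
```

Only the two sites at either end of the 7-bit hop run these; every other
site stores and forwards the original 8-bit batch untouched.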
cfe+@andrew.cmu.edu (Craig F. Everhart) (01/10/89)
Consider the Content-type: header, defined in rfc1049, rather than Format:. Consider that 7-bit (only) data paths are given via rfc822, to which rfc1036 (the Usenet message standard) gives supremacy.
david@ms.uky.edu (David Herron -- One of the vertebrae) (01/11/89)
IBM mainframes certainly count as 7-bit sites. I have one as a news
neighbor ... psuvm.bitnet. For my other BITNET neighbor, a vax/vms
cluster at U of L, we *do* encode (for the technical minded, it's
something like '(echo "#! bitbatch"; compress -c ${file} | btoa)'),
and things work fine. But that IBM mainframe ... it's a whole 'nother
ball of wax altogether.
In my personal knowledge there's 4 such machines on Usenet. There could
easily be others, our mainframe people are considering it for instance.
There could easily be others that I don't know about too.
Beyond those considerations ... for a loooooong time the news standard
has been to be as much like mail as possible. In fact I use that fact
rather heavily, whenever I see something I like and need on my home
computer I simply mail it there ... MMDF has a nice program to make this
simple as well, I just do "s|resend david@davids" and off it goes.
With binary gunk in news articles I'll no longer be able to do that.
Yeah, I could write a shell script to wrap that up in a nice neat
package ...
Some of us have a dream of making a "WorldNet" .. a world-wide discussion
system derived from the current Usenet. While it would be nice to have
binary files shipped around in binary, there is a reality that we must
pay attention to. There are *BACKWARD* operating systems in use out
there people! And they don't all use ASCII, and even the ones that use
ASCII don't all have a way of mixing straight text with binary gunk.
If we want to have this WorldNet thing we've gotta pay attention to
them folks. It's mighty unfriendly to sneer down our noses at them
and say they've got an ugly operating system! There's many ugly things
about Unix as well and we got no more right to call their system
ugly than they have to say ours is.
--
<-- David Herron; an MMDF guy <david@ms.uky.edu>
<-- ska: David le casse\*' {rutgers,uunet}!ukma!david, david@UKMA.BITNET
<-- Now I know how Zonker felt when he graduated ...
<-- Stop! Wait! I didn't mean to!
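The compress-then-encode batching described in the post above can be
sketched as follows. This is an analogy only: zlib and Ascii85 from the
Python standard library stand in for compress(1) and btoa(1), so the
output is NOT compatible with a real "bitbatch" receiver, and the
function names are invented for the example.

```python
import base64
import zlib

MAGIC = b"#! bitbatch\n"    # marker line, as in the shell pipeline above


def make_batch(data):
    """Compress, then encode as printable ASCII, behind a marker line."""
    # Stand-ins: zlib for compress(1), Ascii85 for btoa(1).
    text = base64.a85encode(zlib.compress(data), wrapcol=72)
    return MAGIC + text + b"\n"


def read_batch(batch):
    """Reverse make_batch; refuse anything without the marker line."""
    if not batch.startswith(MAGIC):
        raise ValueError("not a bitbatch-style batch")
    # a85decode skips the newlines that wrapcol inserted
    return zlib.decompress(base64.a85decode(batch[len(MAGIC):]))
```

Compressing before encoding matters: the ASCII encoding expands the
data, so doing it on the already-compressed stream keeps the batch
small.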
leonard@qiclab.UUCP (Leonard Erickson) (01/12/89)
In article <10875@s.ms.uky.edu> david@ms.uky.edu (David Herron -- One of the vertebrae) writes:
<Some of us have a dream of making a "WorldNet" .. a world-wide discussion
<system derived from the current Usenet. While it would be nice to have
<binary files shipped around in binary, there is a reality that we must
<pay attention to. There are *BACKWARD* operating systems in use out
<there people! And they don't all use ASCII, and even the ones that use
<ASCII don't all have a way of mixing straight text with binary gunk.
<
<If we want to have this WorldNet thing we've gotta pay attention to
<them folks. It's mighty unfriendly to sneer down our noses at them
<and say they've got an ugly operating system! There's many ugly things
<about Unix as well and we got no more right to call their system
<ugly as they have to say ours is.
Do note that "worldnet" will *have* to support 8-bit characters! The
Europeans are getting a bit tired of having their languages shoehorned
into ASCII (*American* Standard Code for Information Interchange).
Once you add 8-bit support, the only problem with binary data would be
line length and indicating end-of-file. Considering how badly things
get munged from time to time, a *byte* count AND a checksum/CRC/???
would be an asset anyway.
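A minimal sketch of the byte count plus checksum idea, assuming CRC-32
from Python's zlib module and a made-up one-line framing ("<count>
<crc>" before the data). The count marks end-of-file without relying on
any in-band terminator, and the CRC catches data munged in transit.

```python
import zlib


def wrap(data):
    """Prefix data with a line holding its byte count and CRC-32 (hex)."""
    header = b"%d %08x\n" % (len(data), zlib.crc32(data))
    return header + data


def unwrap(blob):
    """Return the payload, or raise ValueError if it was damaged."""
    header, _, rest = blob.partition(b"\n")
    count_text, crc_text = header.decode("ascii").split()
    count, crc = int(count_text), int(crc_text, 16)
    data = rest[:count]             # the count, not a marker, ends the data
    if len(data) != count:
        raise ValueError("truncated: expected %d bytes" % count)
    if zlib.crc32(data) != crc:
        raise ValueError("checksum mismatch")
    return data
```

Anything a relay appends after the counted bytes is simply ignored,
while a short or altered payload is rejected.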
Sooner or later it will be *necessary* to enforce a "thou shalt not
muck up data in transit through thy site" rule. I'd think that this
could be handled by the news-bundling/unbundling software. What you do
with it *locally* is up to you, just so what you pass on matches what
you got!
8-bit mail is going to have to happen too, and for the same reasons.
Get used to ISO-Latin-1 (or whatever they come up with for *the*
standard).
--
Leonard Erickson ...!tektronix!reed!percival!bucket!leonard
CIS: [70465,203]
"I used to be a hacker. Now I'm a 'microcomputer specialist'.
You know... I'd rather be a hacker."
len@netsys.COM (Len Rose) (01/13/89)
If I understand correctly, then sites that currently batch 7 bits won't be
affected since they will just encode the binary files anyway. As long as
encoding mechanisms are left in place then I think it's a wonderful idea.
--
len@netsys.com
{ames,att,rutgers}!netsys!len
rick@pavlov.bcm.tmc.edu (Richard H. Miller) (01/16/89)
The idea of making datasets available to people is a good one. However, to
actually post the dataset as news is a bad idea. Most of this data would be
of interest only to a small minority of users. A better approach would be to
handle it like comp.archives, in which articles could be posted as to the
existence of datasets along with instructions as to how to retrieve the data.
Then those users who need a particular dataset could retrieve it and we would
not have to ship the data all over.
Richard H. Miller                 Email: rick@bcm.tmc.edu
Asst. Dir. for Technical Support  Voice: (713)799-4511
Baylor College of Medicine        US Mail: One Baylor Plaza, 302H
                                           Houston, Texas 77030
dww@stl.stc.co.uk (David Wright) (01/19/89)
In article <1947@qiclab.UUCP> leonard@qiclab.UUCP (Leonard Erickson) writes:
#In article <10875@s.ms.uky.edu> david@ms.uky.edu (David Herron -- One of the vertebrae) writes:
##Some of us have a dream of making a "WorldNet" .. a world-wide discussion
I thought that's what we had now? Of course not fully world-wide yet: the
Soviet bloc and most of Africa and S. America are missing, but hopefully
they will come in due time (maybe soon for the first).
#Once you add 8-bit support, the only problem with binary data would be
#line length and indicating end-of-file.
No it wouldn't. We can ship 8-bit data now, but remember that news
articles GET DISPLAYED. If we had a new version of news reader that could,
ODA-like, distinguish different object types and know which to display
(the note saying what this binary data was) and which to save in a file
but NOT push onto the screen, all might be well. But we don't, and even
if you have just written one, it will be a long time before most of us
are running it.
Surely most people on the net have experienced accidentally displaying
a binary file (e.g. an executable binary)? For most terminals, the
result is a terminal in some odd mode, which has to be reset in order to
display the basic (or even extended :-)) character set. If binary data
is sent as a news article, it ought to be coded in some way that won't
do this - e.g. the present btoa or encode schemes, or some new improved
one if you like.
8-bit character sets are not going to solve this problem either. They are
still going to include control characters and control sequences that do
strange things, probably more than now (e.g. my VT330 gets very upset if
I forget to put it in 7-bit VT100 mode before connecting over an X.25
link that sets character parity in the 8th bit - in 8-bit mode those
(7 bit + parity) characters mean something, often something very odd).
--
Regards,       "Are you sure YOUR password won't appear in RTM's next list?"
David Wright   STL, London Road, Harlow, Essex CM17 9NA, UK
dww@stl.stc.co.uk <or> ...uunet!mcvax!ukc!stl!dww <or> PSI%234237100122::DWW
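The display problem raised in the post above can be illustrated with a
small check a news reader might run before pushing an article body to
the terminal. This is a sketch under assumptions: the function names
are invented, and "unsafe" is taken to mean the C0 controls other than
tab/LF/CR, DEL, and the C1 control range 0x80-0x9F that 8-bit terminals
interpret as control codes.

```python
SAFE_CONTROLS = {0x09, 0x0A, 0x0D}  # tab, line feed, carriage return


def unsafe_bytes(body):
    """Return (offset, byte) pairs a terminal may act on instead of print."""
    found = []
    for offset, b in enumerate(body):
        is_c0 = b < 0x20 and b not in SAFE_CONTROLS   # includes ESC (0x1B)
        is_del = b == 0x7F
        is_c1 = 0x80 <= b <= 0x9F   # control codes on 8-bit terminals
        if is_c0 or is_del or is_c1:
            found.append((offset, b))
    return found


def safe_to_display(body):
    """True if the body contains nothing from the unsafe ranges."""
    return not unsafe_bytes(body)
```

Note that printable 8-bit characters such as 0xE9 pass the check, while
the C1 range does not, which is the point made above: moving to 8-bit
character sets adds control codes rather than removing them.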