[comp.sources.d] comp.datasets Call For Discussion

snoopy@sopwith.UUCP (Snoopy) (12/22/88)

In article <2708@datapg.MN.ORG> sewilco@datapg.MN.ORG (Scot E. Wilcoxon) writes:

| I much prefer to have clean data or source in the "sources" groups,
| where I can easily weed out what is not of interest, and where it will
| probably be saved in other archive sites.  Useful data or source in
| discussion groups (often in comp.arch and comp.graphics) are easily lost.

It appears that the number and size of "datasets" posted to the net is
growing.  Time to consider a group such as comp.datasets to hold them.
I suggest it be moderated in the manner of comp.sources.unix or c.s.misc,
to keep the discussion and such out, and to provide flow control.  Putting
data in a source group isn't *toooo* bad, but it really is a different animal.
The volume of these datasets is sometimes large, and putting them in a
separate newsgroup would allow those sites that aren't interested to have
their newsfeed turn the group off.  Also, some sites may be interested in
receiving the datasets, but may not wish to automatically archive them.
If they are mixed in with the source groups it is difficult to archive the
sources without also archiving the datasets.  (or vice versa)

Therefore I ask the following questions:  (1) Do we want a newsgroup
for data?  (2) Do we want it to be moderated?  (3) If so, any volunteers
for moderator? (4) Can someone think of a better name than "comp.datasets"?
Sounds like something out of an IBM shop.  :-(   Perhaps "comp.data" ?
    _____     
   /_____\    Snoopy
  /_______\   
    |___|     tektronix!tekecs!sopwith!snoopy
    |___|     sun!nosun!illian!sopwith!snoopy

jpn@genrad.com (John P. Nelson) (12/27/88)

In article <76@sopwith.UUCP> snoopy@sopwith.UUCP (Snoopy) writes:
>It appears that the number and size of "datasets" posted to the net is
>growing.  Time to consider a group such as comp.datasets to hold them.
>I suggest it be moderated in the manner of comp.sources.unix or c.s.misc,

    Point of order:  What EXACTLY is a "dataset"?  There was a discussion
recently about starting a group for GIF format graphic pictures:  would
this group cover these as well as star catalogs?  I mean, these are
clearly data, and they are large (well, at least in collection).

    The GIF group was shouted down (I don't think it ever actually came
to a vote) because people felt that shipping pictures around was a waste
of the net's bandwidth, and besides: there are better forums for such
things.  Not that I necessarily AGREE with this argument - but it applies
equally well to star catalogs, font data, and graphical pictures.

     john nelson

UUCP:	{decvax,mit-eddie}!genrad!teddy!jpn
smail:	jpn@teddy.genrad.com

car@pte.UUCP (Chris Rende) (12/29/88)

In article <15054@genrad.UUCP>, jpn@genrad.com (John P. Nelson) writes:
> In article <76@sopwith.UUCP> snoopy@sopwith.UUCP (Snoopy) writes:
> >It appears that the number and size of "datasets" posted to the net is
> >growing.  Time to consider a group such as comp.datasets to hold them.
> 
>     Point of order:  What EXACTLY is a "dataset"?  There was a discussion
> recently about starting a group for GIF format graphic pictures:  would
> this group cover these as well as star catalogs?  I mean, these are
> clearly data, and they are large (well, at least in collection).

I think that a group for data postings is a good idea. I'd like to see more
data be posted: Astronomical data, medical, graphics, geographical maps,
census, etc...

What is a dataset? I can just see a month's worth of philosophical flaming
in the wings... :-) Especially if you're talking about LISP stuff... (((:-)))
It would probably be best to describe the desired contents of the group
rather than try to define data in a manner which agrees with all netters.

To start the furnace, here are some thoughts that I have about a comp.data
newsgroup:

- Used to post machine readable information that is NOT program source code.
  [I say "NOT program source code" because we have source code groups and
   because LISP code looks like LISP data.]
- Format can be ASCII or UUENCODED binary.
- Said data should NOT be binary executables. There are groups for this as well.
- Although I'm not quite sure of the posting mechanism, I think that some
  description of the structure of the data should be included with the data.
  This could be source code segments (like .h files), a column by column
  description, or just plain text. [comp.data.d?]
- Said data should be machine independent. I.e., a UUENCODED binary of a
  MSDOS directory is not generally useful.
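
To make the "column by column description" idea above concrete, here is a
minimal sketch. The field names, column positions, and sample line are all
invented for illustration; nothing here is a proposed standard for the group.

```python
# Hypothetical column-by-column description for a small star catalog.
# Each entry: (field name, first column, last column), 1-based inclusive.
LAYOUT = [
    ("name", 1, 10),
    ("ra_hours", 12, 17),
    ("dec_degrees", 19, 24),
    ("magnitude", 26, 30),
]

def parse_record(line, layout=LAYOUT):
    """Cut one fixed-width text line into named fields, stripping padding."""
    return {name: line[first - 1:last].strip() for name, first, last in layout}

# Invented sample line built to match LAYOUT (not real catalog data).
sample = "%-10s %6s %6s %5s" % ("Example-1", "12.345", "-45.60", "3.14")
record = parse_record(sample)
```

A description like LAYOUT could travel with the data as plain text, which
keeps the posting machine independent.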

car.
-- 
Christopher A. Rende                Multics,DTSS,Shortwave,Scanners,StarTrek
uunet!{umix,edsews}!rphroy!pte!car  Minix,PC/XT,TRS-80 Model I: Buy Sell Trade
Motorola VME1131 M68020 SVR2        Precise Technology & Electronics, Inc.

jbayer@ispi.UUCP (Jonathan Bayer) (01/05/89)

In article <360@pte.UUCP> car@pte.UUCP (Chris Rende) writes:
>In article <15054@genrad.UUCP>, jpn@genrad.com (John P. Nelson) writes:
>> In article <76@sopwith.UUCP> snoopy@sopwith.UUCP (Snoopy) writes:
== =It appears that the number and size of "datasets" posted to the net is
== =growing.  Time to consider a group such as comp.datasets to hold them.
== 
==     Point of order:  What EXACTLY is a "dataset"?  There was a discussion
== recently about starting a group for GIF format graphic pictures:  would
== this group cover these as well as star catalogs?  I mean, these are
== clearly data, and they are large (well, at least in collection).
=
=I think that a group for data postings is a good idea. I'd like to see more
=data be posted: Astronomical data, medical, graphics, geographical maps,
=census, etc...
=
=What is a dataset? I can just see a month's worth of philosophical flaming
=in the wings... :-) Especially if you're talking about LISP stuff... (((:-)))
=It would probably be best to describe the desired contents of the group
=rather than try to define data in a manner which agrees with all netters.
=
=To start the furnace, here are some thoughts that I have about a comp.data
=newsgroup:
=
=- Used to post machine readable information that is NOT program source code.
=  [I say "NOT program source code" because we have source code groups and
=   because LISP code looks like LISP data.]
=- Format can be ASCII or UUENCODED binary.
=- Said data should NOT be binary executables. There are groups for this as well.
=- Although I'm not quite sure of the posting mechanism, I think that some
=  description of the structure of the data should be included with the data.
=  This could be source code segments (like .h files), a column by column
=  description, or just plain text. [comp.data.d?]
=- Said data should be machine independent. I.e., a UUENCODED binary of a
=  MSDOS directory is not generally useful.
=

This sounds like a good idea.  This way, small sites with little or no
interest in most of the datasets would not get them.
However, I do have a suggestion.  It makes sense to archive these
datasets at several large usenet sites similar to the way that
comp.sources.unix (and others) is currently archived.  There should be a
notice published in a different newsgroup from (suggested) comp.sources.data 
which would let everyone know about the published datasets.  This would
let those sites that need/want it to get it from a major site.

This suggestion does have a problem.  If there are some small sites
which are several nodes downstream of an archive site, and each one
requests a major dataset, then the mail load on the intermediate sites
will go up tremendously.  However, this might be preferable to always
carrying the datasets.
-- 
Jonathan Bayer				"The time has come," the Walrus said...
Intelligent Software Products, Inc.	
19 Virginia Ave.				...uunet!ispi!jbayer
Rockville Centre, NY   11570	(516) 766-2867	jbayer@ispi

gnu@hoptoad.uucp (John Gilmore) (01/07/89)

(Referring to a standard for sending databases over the net):
> =- Format can be ASCII or UUENCODED binary.

We have two new versions of netnews coming out within the month -- C
news and TMNN (News 3.0).  I believe both of them are 8-bit-clean, that
is, the data part of a message can have any 8-bit characters in it
(including nulls and very long lines).  Most of this support is
necessary for "un-American" :-) character set support anyway.  If this is
really true, we should consider simply sending binaries as binaries.

It might be good to add a Format: header field with standard values
(e.g. shar, tar, cpio, arc, msdos binary, vfont, ...) which could be
used by the news reading program to determine how to display and/or
extract the data.  (Actually it probably helps to be able to include
some text, with named binary "attachments".  Then the user interface can
show the text and the list of names, and offer to extract them into
the local filesystem.)  The default Format: would be ISO-Latin-1 text
(ASCII with European characters in the top 128 positions).
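
A rough sketch of how a news reader might act on such a field. The
Format: values, the action strings, and both function names are
assumptions made up for illustration; neither C news nor TMNN is known
to define any of them.

```python
def split_article(text):
    """Split an article into a header dict and a body at the first blank line."""
    head, _, body = text.partition("\n\n")
    headers = {}
    for line in head.splitlines():
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return headers, body

# Hypothetical mapping from Format: values to what a reader would do.
ACTIONS = {
    "text": "display",
    "shar": "offer to unpack",
    "tar": "offer to extract",
    "vfont": "offer to save",
}

def choose_action(headers):
    """Pick an action from the Format: field; a missing field means plain text."""
    fmt = headers.get("format", "text").lower()
    # Unknown formats are saved rather than pushed onto the screen.
    return ACTIONS.get(fmt, "offer to save")

article = "Newsgroups: comp.data\nFormat: tar\n\n(body bytes would follow)"
headers, body = split_article(article)
```

The point of the default is that an article with no Format: field is
treated exactly as articles are treated today.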

We'd have to determine the impact on old-news, NNTP, notes, Fidonet,
and other sites that netnews gateways to, but certainly we can start
with real binaries in a comp.data newsgroup (or alt.data), and folks whose
software can't handle it should simply not receive the newsgroup.  If
it works out, it can be expanded to more newsgroups or to the whole net.

We are close to the point where we can rely on there being a full 8-bit
data path.  It's time to say "folks who don't provide 8 bits must
encode 8-bit data while in transit" rather than "we'll all live with 7
bits forever and uuencode everything".  Just a leetle push now will save
us a *lot* of trouble.
-- 
John Gilmore    {sun,pacbell,uunet,pyramid,amdahl}!hoptoad!gnu    gnu@toad.com
Love your country but never trust its government.
		     -- from a hand-painted road sign in central Pennsylvania

len@netsys1.netsys.COM (Len Rose) (01/08/89)

In article <6182@hoptoad.uucp>, gnu@hoptoad.uucp (John Gilmore) writes:
# We'd have to determine the impact on old-news, NNTP, notes, Fidonet,
# and other sites that netnews gateways to, but certainly we can start
# with real binaries in a comp.data newsgroup (or alt.data), and folks whose
# software can't handle it should simply not receive the newsgroup.  If
# it works out, it can be expanded to more newsgroups or to the whole net.

 The problem is that certain modem pools, ISNs, and the like simply cannot
handle 8-bit data. All I ask is that there still be some vehicle to allow
encoding into 7-bit ASCII.

# We are close to the point where we can rely on there being a full 8-bit
# data path.  It's time to say "folks who don't provide 8 bits must
# encode 8-bit data while in transit" rather than "we'll all live with 7
# bits forever and uuencode everything".  Just a leetle push now will save
# us a *lot* of trouble.

 Thus by one stroke you knock off many sites that depend on 7-bit data
transfers. The argument that the folks who rely on 7-bit data transfer
schemes can adapt or stop receiving news is unrealistic and unfair.

 Surely we can add provisions for 7 bit encoding so that less fortunate
 sites can still be a part of the community.

gnu@hoptoad.uucp (John Gilmore) (01/09/89)

I wrote:
> #             It's time to say "folks who don't provide 8 bits must
> # encode 8-bit data while in transit" rather than "we'll all live with 7
> # bits forever and uuencode everything".

len@netsys1.netsys.COM (Len Rose) wrote:
>            The argument that the folks who rely on 7-bit data transfer
> schemes can adapt or stop receiving news is unrealistic and unfair.
>  Surely we can add provisions for 7 bit encoding so that less fortunate
>  sites can still be a part of the community.

People seem to be misreading my paragraph above, so let me try again.

=> "Folks who don't provide 8 bits must encode 8-bit data while in transit" <=

It doesn't say that "less fortunate" sites must drop off the net, it
says they must encode data!  Currently, *everyone* has to encode data
when they post it, so this can't be a substantial burden.

My impression is that there are not so many "7-bit sites" as there are
"7-bit transport media".  (I could be wrong.)  The point is that if you
send or receive netnews via a 7-bit transport medium, you have to
encode it so that all 8-bit values come through transparently.  This
saves everyone else on the net from having to encode everything to 7
bits, which wastes transmission time, disk space, and people time.
Many BITNET sites are encoding their news transmissions already, to get
around the horrible things their network does to data.  Certainly we
can provide standardized 7-bit encoding programs, like c7sendbatch and
c7unbatch.
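
A minimal sketch of what encoding only across a 7-bit link could look
like. The function names are hypothetical, and uuencode-style lines from
the standard binascii module stand in for whatever encoding programs
such as c7sendbatch and c7unbatch would actually use.

```python
import binascii

def encode_for_7bit_link(data):
    """Turn arbitrary bytes into printable ASCII lines (uuencode-style)."""
    lines = []
    for i in range(0, len(data), 45):   # b2a_uu accepts at most 45 bytes per line
        lines.append(binascii.b2a_uu(data[i:i + 45]))
    return b"".join(lines)

def decode_from_7bit_link(encoded):
    """Reverse encode_for_7bit_link on the far side of the link."""
    return b"".join(binascii.a2b_uu(line) for line in encoded.splitlines())

batch = bytes(range(256))               # every 8-bit value, including NUL
wire = encode_for_7bit_link(batch)
restored = decode_from_7bit_link(wire)
```

Only the two ends of the 7-bit link run this; everyone else stores and
forwards the original 8-bit batch untouched.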

Is that an intolerable burden on sites with 7-bit links?  It does fall
under "adapt or stop receiving news", but the adaptation is pretty painless,
and initially it would only apply to a small number of newsgroups.
-- 
John Gilmore    {sun,pacbell,uunet,pyramid,amdahl}!hoptoad!gnu    gnu@toad.com
Love your country but never trust its government.
		     -- from a hand-painted road sign in central Pennsylvania

cfe+@andrew.cmu.edu (Craig F. Everhart) (01/10/89)

Consider the Content-type: header, defined in RFC 1049, rather than Format:.

Consider that RFC 822 provides for 7-bit (only) data paths, and RFC 1036
(the Usenet message standard) defers to RFC 822.

david@ms.uky.edu (David Herron -- One of the vertebrae) (01/11/89)

IBM mainframes certainly count as 7-bit sites.  I have one as a news
neighbor ... psuvm.bitnet.  For my other BITNET neighbor, a VAX/VMS
cluster at U of L, we *do* encode (for the technically minded, it's
something like '(echo "#! bitbatch"; compress -c ${file} | btoa)'),
and things work fine.  But that IBM mainframe ... it's a whole 'nother
ball of wax altogether.

To my personal knowledge there are 4 such machines on Usenet.  There
could easily be others; our mainframe people are considering it, for
instance.  There could easily be others that I don't know about too.




Beyond that consideration ... for a loooooong time the news standard
has been to be as much like mail as possible.  In fact I use that fact
rather heavily, whenever I see something I like and need on my home computer
I simply mail it there ... MMDF has a nice program to make this simple
as well, I just do "s|resend david@davids" and off it goes.

With binary gunk in news articles I'll no longer be able to do that.
Yeah, I could write a shell script to wrap that up in a nice neat
package ...




Some of us have a dream of making a "WorldNet" .. a world-wide discussion
system derived from the current Usenet.  While it would be nice to have
binary files shipped around in binary, there is a reality that we must
pay attention to.  There are *BACKWARD* operating systems in use out
there people!  And they don't all use ASCII, and even the ones that use
ASCII don't all have a way of mixing straight text with binary gunk.

If we want to have this WorldNet thing we've gotta pay attention to
them folks.  It's mighty unfriendly to sneer down our noses at them
and say they've got an ugly operating system!  There's many ugly things
about Unix as well and we got no more right to call their system
ugly than they have to say ours is.
-- 
<-- David Herron; an MMDF guy                              <david@ms.uky.edu>
<-- ska: David le casse\*'      {rutgers,uunet}!ukma!david, david@UKMA.BITNET
<-- Now I know how Zonker felt when he graduated ...
<--          Stop!  Wait!  I didn't mean to!

leonard@qiclab.UUCP (Leonard Erickson) (01/12/89)

In article <10875@s.ms.uky.edu> david@ms.uky.edu (David Herron -- One of the vertebrae) writes:
<Some of us have a dream of making a "WorldNet" .. a world-wide discussion
<system derived from the current Usenet.  While it would be nice to have
<binary files shipped around in binary, there is a reality that we must
<pay attention to.  There are *BACKWARD* operating systems in use out
<there people!  And they don't all use ASCII, and even the ones that use
<ASCII don't all have a way of mixing straight text with binary gunk.
<
<If we want to have this WorldNet thing we've gotta pay attention to
<them folks.  It's mighty unfriendly to sneer down our noses at them
<and say they've got an ugly operating system!  There's many ugly things
<about Unix as well and we got no more right to call their system
<ugly than they have to say ours is.

Do note that "worldnet" will *have* to support 8-bit characters! The 
Europeans are getting a bit tired of having their languages shoehorned
into ASCII (*American* Standard Code for Information Interchange).

Once you add 8-bit support, the only problem with binary data would be
line length and indicating end-of-file. Considering how badly things
get munged from time to time, a *byte* count AND a checksum/CRC/???
would be an asset anyway. 
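
A small sketch of the byte count plus checksum idea. CRC-32 from the
standard zlib module is an assumption chosen for convenience, and the
function names are made up; no particular checksum is being proposed.

```python
import zlib

def seal(payload):
    """Return (byte_count, crc32) as the poster would compute them."""
    return len(payload), zlib.crc32(payload) & 0xFFFFFFFF

def arrived_intact(payload, byte_count, crc):
    """True only if both the length and the CRC match the poster's values."""
    same_length = len(payload) == byte_count
    same_crc = (zlib.crc32(payload) & 0xFFFFFFFF) == crc
    return same_length and same_crc

original = b"\x00\x01binary\xffdata\n"
count, crc = seal(original)
munged = original.replace(b"\xff", b"\x7f")   # a site that stripped the 8th bit
truncated = original[:-1]                     # a site that lost the last byte
```

The byte count catches truncation cheaply, and the CRC catches a site
that altered bytes without changing the length.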

Sooner or later it will be *necessary* to enforce a "thou shalt not
muck up data in transit through thy site". I'd think that this could be
handled by the news-bundling/unbundling software. What you do with it
*locally* is up to you, just so what you pass on matches what you got!

8-bit mail is going to have to happen too, and for the same reasons. 
Get used to ISO-Latin-1 (or whatever they come up with for *the*
standard). 
-- 
Leonard Erickson		...!tektronix!reed!percival!bucket!leonard
CIS: [70465,203]
"I used to be a hacker. Now I'm a 'microcomputer specialist'.
You know... I'd rather be a hacker."

len@netsys.COM (Len Rose) (01/13/89)

If I understand correctly, then sites that currently batch 7 bits
won't be affected since they will just encode the binary files anyway.
As long as encoding mechanisms are left in place then I think it's a
wonderful idea.
-- 
len@netsys.com
{ames,att,rutgers}!netsys!len

rick@pavlov.bcm.tmc.edu (Richard H. Miller) (01/16/89)

The idea of making datasets available to people is a good one. However, to
actually post the dataset as news is a bad idea. Most of this data would be of
interest only to a small minority of users. A better approach would be to
handle it like comp.archives, in which articles could be posted announcing the
existence of datasets along with instructions on how to retrieve the data.
Then those users who need a particular dataset could retrieve it, and we would
not have to ship the data all over.



Richard H. Miller                 Email: rick@bcm.tmc.edu
Asst. Dir. for Technical Support  Voice: (713)799-4511
Baylor College of Medicine        US Mail: One Baylor Plaza, 302H
                                           Houston, Texas 77030

dww@stl.stc.co.uk (David Wright) (01/19/89)

In article <1947@qiclab.UUCP> leonard@qiclab.UUCP (Leonard Erickson) writes:
#In article <10875@s.ms.uky.edu> david@ms.uky.edu (David Herron -- One of the vertebrae) writes:
##Some of us have a dream of making a "WorldNet" .. a world-wide discussion
I thought that's what we had now?   Of course it is not fully world-wide yet: the
Soviet bloc and most of Africa and S. America are missing, but hopefully they
will come in due time (maybe soon for the first).

#Once you add 8-bit support, the only problem with binary data would be
#line length and indicating end-of-file. 

No it wouldn't.   We can ship 8-bit data now, but remember that news articles
GET DISPLAYED.   If we had a new version of news reader that could, ODA-like,
distinguish different object types and know which to display (the note saying
what this binary data was) and which to save in a file but NOT push onto the
screen, all might be well.   But we don't, and even if you have just written
one, it will be a long time before most of us are running it.   Surely most
people on the net have experienced accidentally displaying a binary file 
(e.g.  an executable binary)?   For most terminals, the result is a terminal
in some odd mode, which has to be reset in order to display the basic (or
even extended :-)) character set.     If binary data is sent as a news article,
it ought to be coded in some way that won't do this - e.g. the present btoa
or encode schemes, or some new improved one if you like.   8-bit character sets
are not going to solve this problem either; they are still going to include
control characters and control sequences that do strange things, probably
more than now (e.g. my VT330 gets very upset if I forget to put it in 7-bit
VT100 mode before connecting over an X.25 link that sets character parity in
the 8th bit - in 8-bit mode those (7 bit + parity) characters mean something,
often something very odd).
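
As a sketch of the check a news reader could make before pushing an
article onto the screen: the function name and the choice of which
control codes to allow are assumptions for illustration, and real
terminals differ in what upsets them.

```python
SAFE_CONTROLS = {0x09, 0x0A, 0x0D}   # tab, newline, carriage return

def safe_to_display(body):
    """True if the bytes contain no control codes likely to upset a terminal."""
    for b in body:
        if b < 0x20 and b not in SAFE_CONTROLS:
            return False              # C0 controls such as ESC or NUL
        if b == 0x7F or 0x80 <= b <= 0x9F:
            return False              # DEL and the 8-bit C1 control range
    return True

plain = b"An ordinary article body.\n"
latin1 = "caf\u00e9\n".encode("latin-1")   # 8-bit text, no control codes
escape = b"\x1b[2J"                        # an escape sequence
```

Note that 8-bit text passes while an escape sequence does not, which is
the distinction being drawn above: the trouble is control codes, not
the 8th bit as such.
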
-- 
Regards,       "Are you sure YOUR password won't appear in RTM's next list?"
        David Wright           STL, London Road, Harlow, Essex  CM17 9NA, UK
dww@stl.stc.co.uk <or> ...uunet!mcvax!ukc!stl!dww <or> PSI%234237100122::DWW