[comp.sources.bugs] sending source code

randy@umn-cs.UUCP (Randy Orrison) (11/02/87)

In article <332@uvicctr.UUCP> sbanner1@uvicctr.UUCP (S. John Banner) writes:
>                              I think that if we have to set some sort
>of a standard for this (and I think that we will have to deal with this
>problem sooner or later), the combination of zoo, and uuencode is
>probably the way to go.

How about a modification to zoo:  Have it produce btoa'd output, and recognize
that transparently as input?  I haven't gotten it running on this Encore yet,
but it doesn't seem like a text header giving the format (8 bit or btoa'd)
would be unreasonable.  Anyone want to work on this?  (I would, but like I said
I haven't even had time to find the bug that dumps core here...)

(btoa: like uuencode, but adds only 20% to size)

>                      S. John Banner
>
>...!uw-beaver!uvicctr!sol!sbanner1
>...!ubc-vision!uvicctr!sol!sbanner1
>ccsjb@uvvm
>sbanner1@sol.UVIC.CDN

	-randy
-- 
Randy Orrison, University of Minnesota School of Mathematics
UUCP:	{ihnp4, seismo!rutgers!umnd-cs, sun}!umn-cs!randy
ARPA:	randy@ux.acss.umn.edu		 (Yes, these are three
BITNET:	randy@umnacca			 different machines)

dan@srs.UUCP (Dan Kegel) (11/05/87)

Roughly paraphrased:
> It is desirable to use a compressing archiver (like zoo or arc)
> to package groups of files for transmission.
> However, the resulting archives must be uuencoded (or btoa'd) before
> transmission on Usenet.
> Why not just make the compressing archiver output in a format
> suitable for direct transmission on Usenet?

Sounds like a good idea to me.  I've always disliked having to go
thru three steps (paste, uudecode, unarchive) to decode these postings;
it's work I'd just as soon have a program do for me.

I think that more thought needs to be given to transmitting large
binary files over Usenet.  People often distribute documentation and
source in this format, so the old objection "But executables don't belong 
on Usenet" no longer applies.

- Dan Kegel

kent@xanth.UUCP (Kent Paul Dolan) (11/07/87)

In article <467@srs.UUCP> dan@srs.UUCP (Dan Kegel) writes:
>Roughly paraphrased:
>> It is desirable to use a compressing archiver (like zoo or arc)
>> to package groups of files for transmission.
>> However, the resulting archives must be uuencoded (or btoa'd) before
>> transmission on Usenet.
>> Why not just make the compressing archiver output in a format
>> suitable for direct transmission on Usenet?
>
>Sounds like a good idea to me.  I've always disliked having to go
>thru three steps (paste, uudecode, unarchive) to decode these postings;
>it's work I'd just as soon have a program do for me.
>
>I think that more thought needs to be given to transmitting large
>binary files over Usenet.  People often distribute documentation and
>source in this format, so the old objection "But executables don't belong 
>on Usenet" no longer applies.
>
>- Dan Kegel


A couple of months ago, I took the algorithm from the June '87 CACM
article "Arithmetic Coding for Data Compression", and recoded it under
contract into FORTRAN 77.

Due to stupidities on both sides (mostly mine, I'm afraid), I didn't
get paid, the software didn't get used, and all copies were destroyed!
However, I ended up owner of the neat algorithm additions I invented.

I had made it work, efficiently, and with one nice wrinkle.  The
original CACM algorithm encoded output bits into 8 bit bytes for
compressed data storage.  I needed to send the resulting file across a
smart communications line; i.e., the transmitted data had to be
printable.  The kermit escape encoding looked too expensive, so I made
a little switch.  It turns out that 95*95 (pairs of printable ASCII
characters) is just a bit more than 2^13 -- 9025 versus 8192 -- so by
doing the obvious mod, multiply and shift I could pack 13 bits into each
pair of output bytes (6.5 bits per byte) and compress straight into
printable ASCII.  The result was about a wash for
executables; they were about as big transmitted (compressed into
printable ASCII) as unencoded and uncompressed, which is probably at
least as good as the present situation.  Various kinds of text behaved
quite a bit better.

So, would it be worthwhile for me to rewrite this stuff in C, or would
someone else like to go ahead and do this, given these hints?  I, too,
think that a one step process to do this would be better.  I can do
the compression and decompression routines, and put them on the net,
if someone who does Unix systems stuff could go on from there and make
them into a uuencode-/uudecode-like utility, which is probably beyond
my skills.  I assume that the CACM algorithm is publicly usable.

Comments?

Kent, the (totally weird) man from xanth.

Running for president on a pound of caffeine, an ounce of sense, and a
program of increased exploration and exploitation of space.  Support
your (probably non-existent - get busy!) local branch of the Birthright
Party:  "The birthright of mankind is the stars!"

Hey, it's better than dwelling on your stock portfolio; at least here
you've got a chance for a laugh or two. ;-)  Yum!  Eat them plastic
chickens, brethren!  Call me when I'm elected; 'til then, I'm going to
take a nap.

paddock@48color.UUCP (Steve Paddock) (11/07/87)

In article <467@srs.UUCP> dan@srs.UUCP (Dan Kegel) writes:

>I think that more thought needs to be given to transmitting large
>binary files over Usenet.  People often distribute documentation and
>source in this format, so the old objection "But executables don't belong 
>on Usenet" no longer applies.

I've given it some thought.  The thought is that every time I see one
I hit the n(ext) key.  

Take a look at the news documentation.  Most unix sites send compressed 
news batches.  This means that the savings I infer you want are 
already transparently being achieved.  In fact, a uuencoded binary 
compresses very little compared to source.  

Finally,  I challenge, but will not debate, your logic regarding the
posting of binaries.  See the archives for a full discussion.
-- 
Steve Paddock (ut-ngp!auscso!mybest!paddock) 512-477-9736
Best Printing Co,  Austin, Texas  78767

dhesi@bsu-cs.UUCP (Rahul Dhesi) (11/07/87)

In article <467@srs.UUCP> dan@srs.UUCP (Dan Kegel) paraphrases somebody:
     It is desirable to use a compressing archiver (like zoo or arc)
     to package groups of files for transmission.  However, the
     resulting archives must be uuencoded (or btoa'd) before
     transmission on Usenet.  Why not just make the compressing
     archiver output in a format suitable for direct transmission on
     Usenet?

In August 1986, when I wrote the first version of zoo, one of the
options I considered was to allow printable output similar to
uuencoding.  But at that time I did not have access to Usenet, and the
utility of doing so did not seem too great.

Things changed later, but by then the design of the archive format had
more or less stabilized.  However, I decided to make printable output a viable
option and figured out how to do it so the archiver would automatically
recognize it and treat it like any other archive.  Thus, if this is
ever implemented (say around 1990??), you will be able to take a
printable zoo archive that is part of an article and feed it to zoo and
it will recognize the beginning of the archive in the article and
extract it.

To this end, the current version (zoo 1.71) does all file I/O through
an extra layer of I/O functions, allowing any special I/O translations
(potentially ASCII/EBCDIC, CR/LF translation, or uu{en,de}coding) to be
done in the low-level routines without changing the rest of the program
at all.  (Also called "object-oriented I/O".)  Whether I will ever have
time to implement this is another question--the list of things to do is
long enough already.
-- 
Rahul Dhesi         UUCP:  <backbones>!{iuvax,pur-ee,uunet}!bsu-cs!dhesi

woods@hao.UCAR.EDU (Greg Woods) (11/11/87)

In article <49@48color.UUCP> paddock@48color.UUCP (pri=-0-Steve Paddock) writes:
>In article <467@srs.UUCP> dan@srs.UUCP (Dan Kegel) writes:
>
>>I think that more thought needs to be given to transmitting large
>>binary files over Usenet. 

  I think more thought needs to be given to the cost effectiveness of doing
this. Last night, our spool directory ran out of space because someone
was mailing over a megabyte of chess game sources and records of chess
games through us. Yes, we run that close to the edge. So do most other
major sites, I expect. We can't afford to let multi-megabytes of disk
space sit idle most of the time just to handle the occasional large mailing.
I suggest that it is more reliable and more cost efficient to send mag
tapes via traditional shipping methods. That way, those who benefit
from the transmission pay the cost of it instead of us. 
  In my view, there are two legitimate uses of the generosity of sites
like ours that transmit things freely for everyone: things that a large
number of people can benefit from (news articles), or SHORT private 
messages that won't cause our phone bills to go through the roof or
our spool directories to overflow. Please THINK about what you are
doing (or thinking of doing). It is a BAD IDEA to encourage the mailing
of large sources. I wish uuencode had never been written.

PLEASE DON'T MAIL SOURCES OVER THE NET!!!

Thanks,
--Greg
--
UUCP: {husc6, gatech, oddjob, ames, noao, rutgers}!hao!woods
CSNET: woods@ncar.csnet  INTERNET: woods@hao.ucar.edu

dricej@drilex.UUCP (Craig Jackson) (11/13/87)

Several articles in this chain have addressed the problems of sending
the output of ARC, etc through Usenet.  Most of them have indicated that the
apparent problem is that the output of these programs is binary, and therefore
the overhead of uuencode or btoa must be incurred.  Some have suggested that
these programs output printable ASCII directly.

Unfortunately, it's not the unprintability that matters for these forms, it's the
*compression*.  Most Usenet links (except for dedicated links such as the
Arpanet) already use state-of-the-art compression technology to transmit
news.  There's only so much compression that can be achieved on any piece
of information, especially if everybody uses the same algorithm.  In fact,
double compression generally results in expansion.

So the moral is: if you have to send binary, ARC it all you want, but TURN OFF
THE COMPRESSION.

However, I will second Greg's plea: If you have large sources to send, the
regular mails probably have a higher baud-rate.  (In addition, you don't
impose on other systems' hospitality.)
-- 
Craig Jackson
UUCP: {harvard!axiom,linus!axiom,ll-xn}!drilex!dricej
BIX:  cjackson