[net.sources.d] Posting compressed info

heiby@cuae2.UUCP (Ron Heiby) (08/26/86)

This article is in response to an article in net.sources which is just
one example of a misguided trend.  That is, the use of compression and
uuencoding by posters of "large" text information.  I'm sure that the
posters who do this have the best of intentions, and I don't blame them
for not understanding how the net works.

A recent posting of the adventure source was compressed on the poster's
machine, then uuencoded into ASCII.  This actually reduced the number
of characters being posted by a fair amount.  I presume that the same
is the case with the above referenced article.  There are two (at least)
factors working against doing this.  The first is that many sites don't
have the uudecode program or the compress program, so can't read the
posting at all.  This means that they are paying to transmit "less" (see
below) bytes, but none of them are usable by that site.

The other factor against the scheme is that it doesn't actually save any
money.  I took the bodies of the adventure source articles and put them
together in a directory.  I recorded the total size of the files, then
ran them through uudecode (which I had a heck of a time getting, see above),
and then through uncompress.  I noted that the files were now a fair amount
larger.  (Looks pretty good, huh?)  The rub is that many/most news feeds
that are concerned with phone costs or dialer time are already using
compress on the batches of news being sent, so what really needs to be
compared is the size of the original text (compressed) and the compressed
uuencoded files (compressed).  When I checked the adventure source, the
original files, when compressed individually, totalled just under 300
blocks of disk space.  The files as actually posted (original | compress |
uuencode), when compressed individually, totalled about 375 blocks, an
*increase* of 25% on the very links that are concerned enough about
costs to use compress on their news feeds.
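
The comparison above can be tried on any text with a short Python
sketch.  Assumptions: zlib stands in for compress(1), since the Python
standard library has no LZW compress, so the exact ratios will differ
from the block counts quoted above; sample_text() is a hypothetical
generator used only so the sketch runs without input files.

```python
import binascii
import random
import zlib


def uuencode_body(data: bytes) -> bytes:
    """Return the uuencoded payload lines (no begin/end header)."""
    # b2a_uu accepts at most 45 input bytes per call and returns one line.
    return b"".join(
        binascii.b2a_uu(data[i:i + 45]) for i in range(0, len(data), 45)
    )


def compare_sizes(text: bytes) -> dict:
    """Byte counts for clear text and for a compress-then-uuencode posting."""
    packed = zlib.compress(text, 9)      # stand-in for the poster's compress
    posted = uuencode_body(packed)       # what actually goes out as an article
    return {
        "clear": len(text),
        "clear_on_compressed_feed": len(packed),
        "posted": len(posted),
        "posted_on_compressed_feed": len(zlib.compress(posted, 9)),
    }


def sample_text(n_words: int = 20000, seed: int = 1986) -> bytes:
    """Hypothetical stand-in for a large text posting."""
    words = ["news", "feed", "batch", "compress", "uuencode", "site",
             "phone", "cost", "posting", "source", "article", "block",
             "text", "modem", "dialer", "link"]
    rng = random.Random(seed)
    return " ".join(rng.choice(words) for _ in range(n_words)).encode("ascii")


for label, size in compare_sizes(sample_text()).items():
    print(f"{size:8d}  {label}")
```

The two "on_compressed_feed" rows are the ones to compare: they
approximate what a link that already compresses its batches would carry
in each case.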

The "bottom line" is, "Post CLEAR text.  Let the transmission mechanism
worry about compression."  Thanks.
-- 
Ron Heiby {NAC|ihnp4}!cuae2!heiby   Moderator: mod.newprod & mod.os.unix
AT&T-IS, /app/eng, Lisle, IL	(312) 810-6109
"'Cause there's lots of things in this world that need to BE turned around."

campbell@maynard.UUCP (Larry Campbell) (08/28/86)

In article <2315@cuae2.UUCP> heiby@cuae2.UUCP (-Ron Heiby) writes:
>This article is in response to an article in net.sources which is just
>one example of a misguided trend.  That is, the use of compression and
>uuencoding by posters of "large" text information.  ...
>                                                   ... many sites don't
>have the uudecode program or the compress program, so can't read the
>posting at all.  This means that they are paying to transmit "less" (see
>below) bytes, but none of them are usable by that site.

And some sites (like mine) are 16-bit machines that can't uncompress
files compressed with "-b 13" or greater (the S2575 text was posted with
"-b 14").  I would be most grateful if someone could mail me the clear
text of S2575, or the text compressed with "-b 12" (AFTER first offering
and getting a response from me, of course -- I don't want to get fifteen
copies!).
-- 
Larry Campbell                             The Boston Software Works, Inc.
ARPA: campbell%maynard.uucp@harvard.ARPA   120 Fulton Street, Boston MA 02109
UUCP: {alliant,wjh12}!maynard!campbell     (617) 367-6846

jose@utcs.uucp (08/28/86)

In article <2315@cuae2.UUCP> heiby@cuae2.UUCP (-Ron Heiby) writes:
>This article is in response to an article in net.sources which is just
>one example of a misguided trend.  That is, the use of compression and
>uuencoding by posters of "large" text information.  I'm sure that the
>posters who do this have the best of intentions, and I don't blame them
>for not understanding how the net works.
>
I second the motion!  Just imagine that 5 people at 1 site decide that they
want to look at these files, never mind trying to use them.  Each and every
user has to:

1) save the posted file somewhere (probably in a NEW directory)
2) uudecode/de-compress/whatever to get the REAL files
3) look at them and decide
 
While each person is doing this his process is using cpu cycles and LOTS of
disk space.  And all for nothing if the files are not appropriate!  Please
folks: let's not waste our time trying to "improve" the system by out-thinking
what some system administrator has spent hours working on!  Send files as they
are and let the software worry about it!
-- 

Jose A. Dias			       University of Toronto Computing Services
-------------------------------------------------------------------------------
The  above  ascii characters are  not, have not ever been, or will ever be, the
opinion  of  anybody,  being,  or  super-intelligent shade of the  colour blue.
They  were just a fluke.  They were put together by randomly selecting phrases
from Vogon poetry...
-------------------------------------------------------------------------------
uucp:          {decvax,ihnp4,utcsri,{allegra,linus}!utzoo}!utcs!jose
bitnet:        JOSE@UTORONTO
300/1200:      (416)535-5360			      (As the crow flies... :-)

bogstad@brl-smoke.ARPA (William Bogstad) (08/28/86)

[I apologize in advance to the people on the ARPANET who will get this
in UNIX-SOURCES.  Maybe there should be a UNIX-SOURCES-D mailing list?]

	I decided to take a look at the effect of compress as suggested.
I used as a sample the full text of the Aug 12th Senate bill 2575.
(Note this was never posted to the net, but is a result of the two
separate postings made by myself and Glenn Tenney.)  I used the
following notation for the file names:

	Roots				Suffix
	=====				======

 	whole - the original text	.Z - compressed (b16 -default on vax?)
 	part? - one of a # of parts	.uu - uuencoded
	total - sum of all parts	(read from left to right with each
	(not always the same as whole)	conversion being done in turn.) 

[A list of files  and their sizes is at the end of this posting.]

	The first thing to note is that because the text as a whole is
larger than 64K it should not be posted uncompressed as a single message.
Too many sites truncate messages at that limit to make this a viable option.

	The figures also indicate that if you are going to post a
uuencoded compressed message it is better to compress the whole rather
than breaking it into parts.  The figures to compare are for whole.Z.uu
and total.Z.uu.

	You now have to compare a single compressed posting with two
uncompressed postings.  For sites that do not use compression on their
newsfeeds the gain is large (total - whole.Z ~= 40K).  Sites that
do compress their news feeds have a small loss (whole.Z.uu.Z - total.Z
~= 5.5K).  Using the figures from mod.newslists of 800 baud throughput
and $.15 a minute cost, this loss per such feed is ~= 18 cents.
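
The arithmetic behind these figures can be checked with a small Python
sketch.  The sizes come from the file list at the end of this posting.
One assumption is not stated in the text: 10 bits on the wire per byte
(8 data bits plus start and stop bits), which makes 800 baud roughly 80
bytes per second.

```python
# Sizes in bytes, taken from the file list at the end of the posting.
SIZES = {
    "total": 70490,
    "total.Z": 30647,
    "whole.Z": 28524,
    "whole.Z.uu": 39320,
    "whole.Z.uu.Z": 35934,
}


def transmission_cost_cents(n_bytes, baud=800, bits_per_byte=10,
                            cents_per_minute=15.0):
    """Phone cost of sending n_bytes; bits_per_byte=10 is an assumption."""
    seconds = n_bytes * bits_per_byte / baud
    return seconds / 60 * cents_per_minute


# Feeds that do not compress: clear text versus the compressed file.
gain_uncompressed_feed = SIZES["total"] - SIZES["whole.Z"]

# The same comparison against the uuencoded form, which is what an
# article would actually carry.
gain_as_posted = SIZES["total"] - SIZES["whole.Z.uu"]

# Feeds that compress their batches: posted form versus clear text,
# both after the feed's own compression.
loss_compressed_feed = SIZES["whole.Z.uu.Z"] - SIZES["total.Z"]

print("gain on uncompressed feed:", gain_uncompressed_feed, "bytes")
print("gain counting uuencode   :", gain_as_posted, "bytes")
print("loss on compressed feed  :", loss_compressed_feed, "bytes")
print("cost of that loss        : %.1f cents"
      % transmission_cost_cents(loss_compressed_feed))
```

With the rounded 5.5K figure (5632 bytes) the same function gives 17.6
cents, in line with the ~18 cents quoted above.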

Thus far we have looked at the effects of compression on the cost of
transmission.  In addition, we have assumed that any sites that "pay"
for their calls will use compression on their newsfeeds and that line
charges are the only cost.  Let's add some "reasonable?" figures for
disk storage.  I will use the following figures:  $10,000 for a 400M
drive, 3 year life span, 2 week expiration times.  This translates to $1
for 40K for 3 years (no, I didn't fudge the figures).  2 weeks is 1/78 of
that period which gives a cost differential (for EVERY site) of 1.2
cents.  Many sites use longer expiration times on sources so the average
could easily be higher.
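
The storage estimate works out as follows in a short Python sketch.
Assumptions: "400M" is read as 400,000K, and the differential is taken
as the ~40K gain quoted earlier; both are readings of the text rather
than stated figures.

```python
DRIVE_COST_DOLLARS = 10_000
DRIVE_KBYTES = 400 * 1000      # "400M" read as 400,000K (assumption)
LIFESPAN_WEEKS = 3 * 52        # 3 year life span
EXPIRE_WEEKS = 2               # 2 week expiration time


def storage_cost_cents(kbytes, weeks=EXPIRE_WEEKS):
    """Cost in cents of holding `kbytes` on disk for `weeks`."""
    dollars_per_kbyte_lifetime = DRIVE_COST_DOLLARS / DRIVE_KBYTES
    fraction_of_lifetime = weeks / LIFESPAN_WEEKS    # 2/156 = 1/78
    return kbytes * dollars_per_kbyte_lifetime * fraction_of_lifetime * 100


print("cents per site for 40K held 2 weeks: %.2f" % storage_cost_cents(40))
```

Passing a longer `weeks` value shows how sites with longer expiration
times on sources end up with a higher per-site figure.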

	I'd like to come to a conclusion here, but I'm afraid I still
can't do so.  I don't know what the ratio of long distance (LD) feeds to
sites is, and I don't know what the ratio of compressed LD to non-compressed
LD sites is.  In addition, there is the cost for the CPU time used for
the per site compression/uncompression and actual transmission of the
feed.  Many sites, however, can "hide" these costs and only have to
justify their LD charges.

	Perhaps the whole thing should hinge on the ease of use.  Some
people apparently do not have access to compress and uuencode or have
machines that cannot handle the larger -b values which are the default
with compress on many machines.  For myself, I will probably post
straight text in the future in order to avoid having to mail copies to
people who can't use the original.  I do think, however, that before a
net-wide rule (suggestion?) is made these other factors should be
considered.  If you do use compress be sure to use the -b 12 option so
anyone with compress can read it.  It really doesn't save much
additional space to use the larger bit values.
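
To get a feel for how much the code-size limit matters, here is a
rough Python sketch of an LZW size estimator.  It is not compress(1):
it only counts output bits, starts at 9-bit codes, grows to a cap of
max_bits, and then freezes the table with no reset.  Treat the numbers
it gives as indicative only.

```python
def lzw_output_bits(data: bytes, max_bits: int) -> int:
    """Estimate the output size in bits of a simplified LZW coder."""
    if not 9 <= max_bits <= 16:
        raise ValueError("max_bits must be between 9 and 16")
    table = {bytes([i]): i for i in range(256)}
    next_code = 257            # code 256 is left unused here (assumption)
    width = 9                  # current code width in bits
    limit = 1 << max_bits      # no codes are assigned at or above this
    total_bits = 0
    current = b""
    for value in data:
        candidate = current + bytes([value])
        if candidate in table:
            current = candidate
            continue
        total_bits += width                # emit the code for `current`
        if next_code < limit:              # room left: learn a new string
            table[candidate] = next_code
            next_code += 1
            if next_code >= (1 << width) and width < max_bits:
                width += 1                 # later codes need one more bit
        current = bytes([value])
    if current:
        total_bits += width                # flush the final pending code
    return total_bits


def compare_bit_limits(data: bytes, limits=(12, 13, 14, 16)) -> dict:
    """Estimated output size in bytes for each code-size limit."""
    return {b: (lzw_output_bits(data, b) + 7) // 8 for b in limits}
```

Calling compare_bit_limits(open("whole", "rb").read()) on a text of
your own ("whole" is just the file name used in this posting) prints
one estimated size per limit, which can be set against the table sizes
a small machine is able to hold.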

70490	whole
28524	whole.Z
39320	whole.Z.uu
35934	whole.Z.uu.Z

38669	part1		70490	total
17013	part1.Z		30647	total.Z
23466	part1.Z.uu	42276	total.Z.uu
21748	part1.Z.uu.Z	39449	total.Z.uu.Z
31821	part2
13634	part2.Z
18810	part2.Z.uu
17701	part2.Z.uu.Z

				Bill Bogstad
				bogstad@hopkins-eecs-bravo.arpa
				bogstad@brl-smoke.arpa

tenney@well.UUCP (Glenn S. Tenney) (08/29/86)

As the offending poster of S.2575 I want to say I'm sorry.
I had never posted anything that large before and was concerned about
sending out something that large.  I had asked the question of a knowledgeable
net user, but the response came back the day after my posting.  Boy, I'll
never do that again.  Now I'm faced with the 4 or 5 people that need the
clear text of S2575.  I'll wait a couple of days to see if there are any
responses, but should I: (1) repost it clear; or (2) mail to those that
need it clear?

With egg on my posting...
-- Glenn

woods@hao.UUCP (Greg Woods) (08/30/86)

  Sites that are really concerned about cost will compress their news.
Those that do not compress obviously have local feeds, money to burn,
or more money for phone bills than spare CPU cycles. At any rate, from
all this discussion, it seems as though uuencode/compress penalizes
the very sites that have expressed enough concern about phone
costs to do something about it. Please don't do it.

--Greg

bobmon@iuvax.UUCP (Robert Montante) (08/31/86)

As one of those who has contributed in a small way to this "misguided trend", I
want to offer a couple of excuses for my belief that user-compression was
useful.  In my case I was thinking more of binary files than of text files, so
that readability wasn't such an issue ('So what are they doing in NET.SOURCES?'
you ask.  Well, uh... oops?)  Since I wasn't aware that many mailers DO
compress things, I never took that conflict into consideration.

Again in my case, a much more significant point is that I want to retransmit
many of the more interesting (read: LARGE) files to my home computer.  I do
this via kermit on a 7-bit phone line, and kermit in binary mode is a pig-dog.
So I find it desirable to compress the file somehow for the same reason the
mailers do, and then I want to uuencode it for kermit's sake.  If the
originator of the posting uploaded it from a personal computer, then it may
have gone through compression/uuencoding in the first place.  AND, since the
popular compression programs for personal comp's aren't all compatible with
compression programs on the mainframes, it would be a real problem to compress,
upload, decompress, and post, followed by copy, compress, download, decompress
at home.

The preceding paragraph is at best an argument for compression of files that
are specific to personal computers.  In general, I have to concede the argument
that the inter-USENET mailing programs should manage compression/transmission
(I said these were excuses, not defenses :-).

As a final follow-up to all that, I would like someone in the know to explain
'shar' postings to me.  As far as I can see they merely repackage text files
into a slightly different text format, and they seem to cost a percent or so
in size.  What advantage am I unaware of that makes them worth the effort?

Thanx to all for the education...
	...Bob...

*-=-*-=-*-=-*-=-*-=-*-=-*-=-*-=-*-=-*-=-*-=-*-=-*-=-*-=-*-=-*-=-*-=-*-=-*-=-*
Opinion + Enforcement => Fact ;  'believe it or else!'

RAMontante
Computer Science				"Have you hugged ME today?"
Indiana University

campbell@maynard.UUCP (Larry Campbell) (09/01/86)

In article <1280@iuvax.UUCP> bobmon@iuvax.UUCP (Robert Montante) writes:
>As a final follow-up to all that, I would like someone in the know to explain
>'shar' postings to me.  As far as I can see they merely repackage text files
>into a slightly different text format, and they seem to cost a percent or so
>in size.  What advantage am I unaware of that makes them worth the effort?

"shar" postings are shell scripts that can be fed directly to the
shell.  Thus, with one simple command, you can get your news-reading
program to automatically split the posting up into files and
directories.  If it's just a single file, it's true that there's not
much point (except uniformity).  But when the posting consists of
three or four directories containing fifteen or twenty files, it sure
is nice to be able to say "s | sh" and have the files pop into
existence.  The alternative, picking the files apart by hand with an
editor, would be a royal pain.

In addition, some flavors of shar postings contain consistency checks
that help detect whether the files have been munged somehow in transit.
Shar postings also can set permission bits so that, for example, shell
scripts are made executable, saving you the trouble of doing it by hand.

And finally, most versions of shar prepend an 'X<tab>' to the start of
each line.  This helps prevent mungage by certain brain-damaged mail
software that truncates messages at any line containing only a single
period.  Remember that Usenet postings must often traverse some pretty
convoluted and unreliable paths before they reach certain readers.

These are the reasons that shar postings are, and ought to be, the
de facto standard for posting files in net.sources.
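
As an illustration of the idea, here is a minimal Python sketch that
builds a shar-style archive.  It is not any particular shar program:
the "SHAR_EOF" marker, the bare "X" prefix and the make_shar name are
choices made for this example, and it leaves out directories,
permission bits and consistency checks.

```python
import shlex

MARKER = "SHAR_EOF"   # here-document terminator chosen for this sketch


def make_shar(files):
    """Build a shell archive from a {filename: text} mapping."""
    out = ["#!/bin/sh", "# Shell archive: unpack with  sh <this-file>"]
    for name, body in files.items():
        quoted = shlex.quote(name)
        out.append(f"echo x - {quoted}")
        # sed strips the leading X again when the archive is unpacked.
        out.append(f"sed 's/^X//' > {quoted} << '{MARKER}'")
        # Every payload line gets an X, so a lone "." never appears.
        out.extend("X" + line for line in body.splitlines())
        out.append(MARKER)
    out.append("exit 0")
    return "\n".join(out) + "\n"


print(make_shar({"notes.txt": "first line\n.\nlast line\n"}), end="")
```

Because every payload line starts with X, a line consisting of a single
period goes out as "X.", and a payload line that happens to equal the
marker cannot end the here-document early.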
-- 
Larry Campbell                             The Boston Software Works, Inc.
ARPA: campbell%maynard.uucp@harvard.ARPA   120 Fulton Street, Boston MA 02109
UUCP: {alliant,wjh12}!maynard!campbell     (617) 367-6846

WDMCU@CUNYVM.BITNET (09/11/86)

I would appreciate a plain-text posting of S.2575 or a copy sent to me as I am
on a VM system and the very UNIX things you are talking about are unavailable.
     
Thanks.
/*--------------------------------------------------------------------*/
/* Bill Michtom - work: (212) 903-3685 home: (718) 788-5946           */
/*                                                                    */
/*      WDMCU@CUNYVM (Bitnet)        Timelessness is transient        */
/*      BILL@BITNIC  (Bitnet)                                         */
/*                                                                    */
/*        Never blame on malice that which can be adequately          */
/*                 explained by stupidity.                            */
/*    A conclusion is the place where you got tired of thinking.      */
/*--------------------------------------------------------------------*/