rhg@cpsolv.UUCP (Richard H. Gumpertz) (10/27/89)
This is so obvious that it must have been asked before, but nobody I asked
seems to have an answer.  Please bear with me and answer it one more time.

My totally unscientific eye tells me that typical news articles are about
1-2000 bytes long (with many smaller and many larger).  Anyway, this would
seem to indicate that compression might reduce the typical article from 2
disk blocks (at 1K) to 1.  Bigger articles might do even better; small
articles would stay at one block.

Why aren't news articles compressed (and decompressed when read or
forwarded)?  Does C news maybe compress them?  It seems like an effective
halving of disk usage would easily pay for the cycles needed to
compress/uncompress.
--
==========================================================================
| Richard H. Gumpertz  rhg@cpsolv.uu.NET -or- ...!uunet!amgraf!cpsolv!rhg |
| Computer Problem Solving, 8905 Mohawk Lane, Leawood, Kansas 66206-1749  |
==========================================================================
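[The back-of-the-envelope claim above -- a ~2 KB article dropping from 2 blocks to 1 at 1K granularity -- can be sketched with a quick experiment.  A hedged illustration, not part of any news software: zlib stands in for the compress(1) LZW coder of the era, and the sample article text and its repetition factor are invented, so real news traffic would compress less dramatically.]

```python
import zlib

BLOCK = 1024  # 1 KB filesystem blocks, as in the post


def blocks(nbytes, block=BLOCK):
    """Disk blocks consumed, rounding up to block granularity."""
    return -(-nbytes // block)  # ceiling division


# A made-up ~2 KB article: short headers plus repetitive English text.
# The repetition exaggerates the compression ratio; real articles vary.
article = (b"From: rhg@cpsolv.UUCP\nSubject: why not compress?\n\n"
           + b"My totally unscientific eye tells me that typical news "
             b"articles are about 1-2000 bytes long. " * 20)

squeezed = zlib.compress(article)
print(blocks(len(article)), "blocks raw ->",
      blocks(len(squeezed)), "block(s) compressed")
```

[For this contrived input the article occupies 2 blocks raw and 1 block compressed, which is exactly the halving argued for -- but only because repetitive text compresses unusually well.]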
henry@utzoo.uucp (Henry Spencer) (10/27/89)
In article <431@cpsolv.UUCP> rhg@cpsolv.uucp (Richard H. Gumpertz) writes:
>Why aren't news articles compressed (and decompressed when read or forwarded)?
>Does C news maybe compress them? It seems like an effective halving of disk
>usage would easily pay for the cycles needed to compress/uncompress.

No, we don't compress them.  In general, we didn't change the way news is
stored; we thought about a whole bunch of possible schemes and concluded
that none of them had enough advantages to be worthwhile.

Compressing lots and lots of small files is very expensive, and the degree
of compression is not that impressive.  Admittedly, the quantized allocation
of disk space tends to magnify the effect for such small files, but it's
still a lot of work for limited gain.  It means that an article has to be
decompressed every time it is read, batched for transmission to another
site, or processed in any other way.  The performance impact, on a busy
machine, would be horrendous.  Our perception was that shortening expiry
times is generally a more cost-effective way of economizing on disk.

There is also a pragmatic issue in that it means modifying *all* the news
readers.  There are lots of those, many more than you'd think.
--
A bit of tolerance is worth a  |  Henry Spencer at U of Toronto Zoology
megabyte of flaming.           |  uunet!attcan!utzoo!henry henry@zoo.toronto.edu
rhg@cpsolv.UUCP (Richard H. Gumpertz) (10/29/89)
In article <1989Oct27.161920.5169@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:
>Our perception was that shortening expiry
>times is generally a more cost-effective way of economizing on disk.

Why not make it an option that each site could choose to enable or disable
depending on the relative cost of disk sectors and CPU cycles at that site?

>There is also a pragmatic issue in that it means modifying *all* the news
>readers.  There are lots of those, many more than you'd think.

Readers would just be modified to look for either nnn or nnn.Z.  Not all
that major.  A given site would switch on compression only after the local
readers had all been fixed.  For small sites, where compression is most
likely to be valuable, there are probably few readers to fix.
--
==========================================================================
| Richard H. Gumpertz  rhg@cpsolv.uu.NET -or- ...!uunet!amgraf!cpsolv!rhg |
| Computer Problem Solving, 8905 Mohawk Lane, Leawood, Kansas 66206-1749  |
==========================================================================
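[The "nnn or nnn.Z" reader-side fallback described above might look something like the sketch below.  This is hypothetical code, not from any actual news reader, and gzip stands in for compress(1)'s .Z format, since Python's standard library has no LZW decoder; a real .Z file would need uncompress itself.]

```python
import gzip
import os


def open_article(path):
    """Open spool article `path`, falling back to a compressed copy.

    The reader change proposed in the post: try the plain file first,
    then `path + ".Z"`.  gzip stands in here for the historical
    compress(1) LZW format.
    """
    if os.path.exists(path):
        return open(path, "rb")
    return gzip.open(path + ".Z", "rb")


# Usage sketch: an article stored only in compressed form.
import tempfile
spool = tempfile.mkdtemp()
name = os.path.join(spool, "431")
with gzip.open(name + ".Z", "wb") as f:
    f.write(b"Why aren't news articles compressed?\n")
with open_article(name) as f:
    print(f.read().decode(), end="")
```

[Because the plain name is tried first, a site really could turn compression on gradually, as the post argues: uncompressed articles keep working while newly-arrived ones are stored as nnn.Z.]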
brad@looking.on.ca (Brad Templeton) (10/30/89)
While compression could be good, it might not be as good as you think.
First of all, the average usenet article is 2K.  On a site with 2K blocks,
compression might do nothing for many of the files.  Of course, on a more
typical 1K block site, it could do very well.

But the biggest gain would come from big files.  For source postings, that
would be great.  For binaries (about 12% of the volume of the net) it might
not do as well, as many are already compressed, although expanded out a bit
with uuencoding.

More would be gained by keeping the articles in indexable archives of some
sort, possibly compressed as well.  Depending on block size, 20-30% of the
disk space in your spool is wasted due to block granularity.
--
Brad Templeton, ClariNet Communications Corp. -- Waterloo, Ontario 519/884-7473
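[Brad's 20-30% granularity figure is easy to check for any particular spool: round each file up to a whole number of blocks and compare allocated space against bytes actually used.  A small sketch; the sample sizes are invented for illustration.]

```python
def wasted_fraction(sizes, block=1024):
    """Fraction of allocated spool space lost to block rounding."""
    used = sum(sizes)
    allocated = sum(-(-s // block) * block for s in sizes)
    return (allocated - used) / allocated


# Invented sample: a handful of "typical" article sizes in bytes.
sample = [400, 1100, 1900, 2500, 800, 1500]
print("wasted: %.0f%%" % (100 * wasted_fraction(sample)))
```

[For this invented sample the waste comes out around 27%, inside the 20-30% range claimed above; a real spool would need a walk over the actual files.]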
rick@uunet.UU.NET (Rick Adams) (10/30/89)
I just ran this article size breakdown today (includes headers):

	Kbytes   Count      %        Kbytes   Count      %
	     1   14915   29.4%           11      62    0.1%
	     2   22971   45.3%           12      57    0.1%
	     3    6779   13.4%           13      41    0.1%
	     4    2636    5.2%           14      29    0.1%
	     5    1216    2.4%           15      47    0.1%
	     6     620    1.2%           16      21    0.0%
	     7     355    0.7%           17      23    0.0%
	     8     220    0.4%           18      25    0.0%
	     9     138    0.3%           19      22    0.0%
	    10     100    0.2%        >= 20     438    0.8%
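[Plugging Rick's histogram into Gumpertz's proposal gives a rough upper bound on the saving.  The sketch below assumes 1 KB blocks, a flat 2:1 compression ratio for every article, and treats the open-ended ">= 20" bucket as 25 KB -- all three are assumptions, not measurements.]

```python
# (size in KB, article count) pairs from the breakdown above;
# the ">= 20" bucket is guessed at 25 KB.
histogram = [(1, 14915), (2, 22971), (3, 6779), (4, 2636), (5, 1216),
             (6, 620), (7, 355), (8, 220), (9, 138), (10, 100),
             (11, 62), (12, 57), (13, 41), (14, 29), (15, 47),
             (16, 21), (17, 23), (18, 25), (19, 22), (25, 438)]

# A k-KB article fills k 1-KB blocks; assume it compresses 2:1,
# still rounded up to whole blocks.
raw = sum(kb * n for kb, n in histogram)
squeezed = sum(-(-kb // 2) * n for kb, n in histogram)
print("raw: %d KB  compressed: %d KB  saving: %.0f%%"
      % (raw, squeezed, 100 * (raw - squeezed) / raw))
```

[Under these assumptions the saving comes out near 40% -- short of the "effective halving" guessed at earlier, mainly because the 1 KB articles that make up 29% of the count cannot shrink below one block, and before counting any CPU cost.]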
henry@utzoo.uucp (Henry Spencer) (10/30/89)
In article <432@cpsolv.UUCP> rhg@cpsolv.uucp (Richard H. Gumpertz) writes:
>>Our perception was that shortening expiry
>>times is generally a more cost-effective way of economizing on disk.
>
>Why not make it an option that each site could choose to enable or disable
>depending on the relative cost of disk sectors and CPU cycles at that site?

Basically because we didn't have time to do everything, and we perceived
this one as having insufficient payoff to make up for the impact on
performance, compatibility, and complexity.  Almost any feature has some
chance of being useful to *someone*, but when one is not trying to solve
all the world's problems (which C News is not -- we'll settle for 90%),
including "just one more feature" is always a judgement call.
--
A bit of tolerance is worth a  |  Henry Spencer at U of Toronto Zoology
megabyte of flaming.           |  uunet!attcan!utzoo!henry henry@zoo.toronto.edu
amanda@intercon.com (Amanda Walker) (11/10/89)
I got it backwards... Sigh.  It happens to all of us every now and then, I
guess :-).  All I can plead as an excuse is that I haven't sat in front of
a UNIX box to do anything more programming-related than fix sendmail.cf
for about a year now...  I've gotten used to putting smarts into routine
libraries instead of executables that you can pipe together (just try
piping stuff around the Macintosh Programmer's Workshop :-P).  Mea culpa :-).
--
Amanda Walker <amanda@intercon.com>