rst@tardis.UUCP (Robert Thau) (11/06/85)

[Does anybody still have this bug?]

Much of the "volume reduction" discussion seems to center on questions of which newsgroups to eliminate. jj wants to cut out net.flame. Phil Ngai wants to get rid of all.mac.all. Everybody thinks net.bizarre should go. And so forth. Pervading all of this is the apparent assumption that getting rid of individual, high-volume newsgroups is a potentially effective way of reducing total net volume, thus reducing backbone phone bills. This assumption deserves to be examined.

The most obvious way to determine what proportion of USENET load is contributed by a given group is to take the total traffic over two weeks ... that is, to use the total number of bytes in /usr/spool/news/net/foo as an index of the size of net.foo (with the obvious cautions about expiration dates and taking the sizes of ordinary files only). In this way, it's fairly easy to obtain statistics like the following:

The USENET top fifteen: [and boy, was I surprised at number one]

    newsgroup            bytes   % of net
    net.news.group:     575403   7.19
    net.music:          392347   4.91
    net.sources:        319899   4.00
    net.sources.mac:    298630   3.73
    net.flame:          291171   3.64
    net.politics:       254618   3.18
    net.micro.mac:      241075   3.01
    net.philosophy:     232925   2.91
    net.sf-lovers:      231507   2.89
    net.news:           226419   2.83
    net.micro.amiga:    211688   2.65
    net.religion:       210681   2.63
    net.movies:         207410   2.59
    net.unix-wizards:   188439   2.36
    net.women:          168242   2.10

As everywhere else in this article, the volumes are given as total bytes in six days' worth of postings (which happens to be the magic number around here), and percentage of total network volume. (For those of you who see the problem with totals, I did this right. See below.)

Notice that deleting all the top 15 newsgroups would appear to reduce net volume by half. This may seem like a lot, but half of $100,000 is still $50,000, which is still a lot to be shelling out every month. (It's a straw man, but what the hell.)
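
For anyone who wants to repeat the raw measurement, here is a rough Python sketch of the byte count described above. It is not the script used for the figures in this article. The function names are made up for illustration, and it assumes the spool maps net.foo to net/foo under the spool root and stores a cross-posted article as a hard link in each group's directory. Counting each inode once for the grand total is one guess at how to get the totals right, which is why raw percentages can add up to more than 100.

```python
import os
import stat


def raw_group_sizes(spool_root):
    """Map newsgroup name -> bytes of ordinary files directly in its directory."""
    sizes = {}
    for dirpath, _dirnames, filenames in os.walk(spool_root):
        if dirpath == spool_root:
            continue  # files in the spool root belong to no group
        total = 0
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if stat.S_ISREG(st.st_mode):  # ordinary files only
                total += st.st_size
        if total:
            group = os.path.relpath(dirpath, spool_root).replace(os.sep, ".")
            sizes[group] = total
    return sizes


def unique_total_bytes(spool_root):
    """Total bytes with each file counted once (assumes cross-posts are hard links)."""
    seen = set()
    total = 0
    for dirpath, _dirnames, filenames in os.walk(spool_root):
        if dirpath == spool_root:
            continue
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            key = (st.st_dev, st.st_ino)
            if stat.S_ISREG(st.st_mode) and key not in seen:
                seen.add(key)
                total += st.st_size
    return total


def raw_percentages(spool_root):
    """Each group's raw byte count as a percentage of the de-duplicated total."""
    total = unique_total_bytes(spool_root)
    if total == 0:
        return {}
    return {g: 100.0 * n / total for g, n in raw_group_sizes(spool_root).items()}
```

Something like raw_percentages("/usr/spool/news") would then give the second column of a table like the one above, subject to the same cautions about expiration dates.
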

However, while the above is an obvious way of estimating the savings, it is simply wrong, due to spill-over effects, cross-postings, and a bunch of other phenomena. For example, one does not expect that the political flames on net.flame would all go away if net.flame were deleted --- a lot of them would still exist on net.politics. It's obviously difficult to get a quantitative handle on the size of such effects, but that doesn't mean it's not worth trying. I hereby present my modified metric for newsgroup size:

    groupsize = sum over articles of [ (article size) / (number of groups) ]

That is, we divide the size of a cross-posted article by the number of groups in which it appears, before adding to the total for a given group. Thus net.politics and net.flame each get "charged" for half of one of Don Black's hate letters. Personally, I think that this is unfair to flame as he'd post to politics regardless, but I'll stick to the numbers, which are:

The USENET top 15, cross-post compensated.

    newsgroup            bytes   % of net
    net.music:          370451   4.63
    net.news.group:     341738   4.27
    net.sources:        289265   3.62
    net.sources.mac:    283740   3.55
    net.philosophy:     203215   2.54
    net.movies:         187437   2.34
    net.politics:       185621   2.32
    net.sf-lovers:      183882   2.30
    net.micro.mac:      181101   2.26
    net.micro.amiga:    180024   2.25
    net.women:          163623   2.05
    net.flame:          163516   2.04
    net.unix-wizards:   145966   1.82
    net.religion:       141192   1.77
    net.jokes:          138531   1.73

To give a point of comparison, in the last six days, Rich Rosen posted 146193 bytes, amounting to a little under two percent of USENET volume. He generated over 55K of text in net.philosophy alone (cross-post compensated), which is over one quarter of the traffic in that newsgroup. If Rich were a newsgroup himself, he'd rank thirteenth. I am not making this up --- I wouldn't dare fabricate anything this ridiculous. The power of the point-by-point reply is awesome.
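
The cross-post compensated metric defined above is easy to state in code. What follows is a minimal Python sketch, not the program behind the table. The function names and the sample sizes are invented for illustration, and it assumes each article's groups come from a comma-separated Newsgroups: header line.

```python
from collections import defaultdict


def parse_newsgroups(header_value):
    """Split a Newsgroups: header value such as 'net.politics,net.flame'."""
    return [g.strip() for g in header_value.split(",") if g.strip()]


def compensated_sizes(articles):
    """articles: iterable of (size_in_bytes, list_of_groups) pairs.

    Each group is charged size / (number of groups) for every article
    that appears in it.
    """
    totals = defaultdict(float)
    for size, groups in articles:
        if not groups:
            continue  # no groups listed, nothing to charge
        share = size / len(groups)
        for group in groups:
            totals[group] += share
    return dict(totals)


def percentages(totals):
    """Express each group's compensated size as a percentage of the sum."""
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {g: 100.0 * v / grand_total for g, v in totals.items()}


# Invented sample data: one cross-posted article and one ordinary one.
sample = [
    (1000, parse_newsgroups("net.politics,net.flame")),
    (500, parse_newsgroups("net.flame")),
]
sizes = compensated_sizes(sample)
# net.politics is charged 500 bytes; net.flame is charged 500 + 500 = 1000.
```

One convenient property of this metric is that the per-group figures add up to the total bytes posted, so the percentages sum to 100 without any further correction.
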

These groups still amount to roughly forty percent of the total net content, but notice that most of the popular candidates for removal have moved down. Net.flame, in particular, has dropped dramatically. Notice also that most of these groups are "content" groups, that one third are technical or net.administrative, and that net.music, which no one gripes about, has an incredibly high volume, much of which turns out to be extensive lists of people's favorite guitarists. (Raw data not posted; check for yourself.)

One further statistic of interest is that well over 60% of net traffic takes place in newsgroups which individually take up less than 2% of the volume (cross-post compensated). Remember, that's less than Rich Rosen generates alone.

What's the bottom line? Elimination of all the heaviest newsgroups, under the most favorable assumptions (all traffic associated with any deleted newsgroup is gone and forgotten), is still only worth a factor of two on the backbone sites' phone bills. No one is going to delete *all* of those newsgroups (including this one). Below this, one quickly gets to newsgroups which individually form a relatively small amount of the traffic, which means an extremely large number of these groups would have to be deleted to do any real good. The same caveats apply to moderation.

In short, rmgroup does not seem to be a fantastically productive way of reducing the total volume. Protests from those whose groups were deleted (which *must* be behind the ascendancy of net.news.group) and the general antisocial nature of the process don't make it any more attractive.

Lastly, if the stats on Rosen mean anything at all, they mean that reining in overeager posters would be at least as big a win. The recent idea of removing the "followup" command completely seems a good start.

-- rst@tardis