[net.news.group] rmgroup considered

rst@tardis.UUCP (Robert Thau) (11/06/85)

[Does anybody still have this bug?]

	Much of the "volume reduction" discussion seems to center on
questions of which newsgroups to eliminate.  jj wants to cut out net.flame.
Phil Ngai wants to get rid of all.mac.all.  Everybody thinks net.bizarre
should go.  And so forth.  Pervading all of this is the apparent assumption
that getting rid of individual, high-volume newsgroups is a potentially
effective way of reducing total net volume, thus reducing backbone phone
bills.  This assumption deserves to be examined.
	The most obvious way to determine what proportion of USENET load
is contributed by a given group is to take the total traffic over a fixed
period ... that is, to use the total number of bytes in /usr/spool/news/net/foo
as an index of the size of net.foo (with the obvious cautions about
expiration dates and taking the sizes of ordinary files only).  In this way,
it's fairly easy to obtain statistics like the following:

The USENET top fifteen:   [and boy, was I surprised at number one]

net.news.group:        575403  7.19
net.music:             392347  4.91
net.sources:           319899  4.00
net.sources.mac:       298630  3.73
net.flame:             291171  3.64
net.politics:          254618  3.18
net.micro.mac:         241075  3.01
net.philosophy:        232925  2.91
net.sf-lovers:         231507  2.89
net.news:              226419  2.83
net.micro.amiga:       211688  2.65
net.religion:          210681  2.63
net.movies:            207410  2.59
net.unix-wizards:      188439  2.36
net.women:             168242  2.10

As everywhere else in this article, the volumes are given as total bytes
in six days' worth of postings (which happens to be the magic number around
here), and as a percentage of total network volume.  (For those of you who see
the problem with totals, I did this right.  See below).  Notice that deleting
all the top 15 newsgroups would appear to reduce net volume by half.  This may
seem like a lot, but half of $100,000 is still $50,000, which is still a
lot to be shelling out every month.  (It's a straw man, but what the hell).
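
For anyone who wants to repeat the raw tally at their own site, here is a
minimal sketch in Python.  It rests on assumptions that are not stated in
this article: one spool directory per group (net.foo lives in net/foo under
the spool root), and cross-posted articles stored as hard links, so that the
network total has to count each inode only once.  The function name is
hypothetical, and the sketch does nothing about expiration dates.

```python
import os
import stat

def tally(spool_root):
    """Return (per-group byte totals, network total) for a news spool.

    Assumes one directory per group (net.foo -> <spool_root>/net/foo) and
    that cross-posted articles are hard links, so one inode may appear in
    several group directories.
    """
    per_group = {}
    seen = set()          # (device, inode) pairs already counted in the total
    network_total = 0
    for dirpath, _dirnames, filenames in os.walk(spool_root):
        rel = os.path.relpath(dirpath, spool_root)
        if rel == os.curdir:
            continue      # files in the spool root belong to no group
        size = 0
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if not stat.S_ISREG(st.st_mode):
                continue  # ordinary files only
            size += st.st_size
            key = (st.st_dev, st.st_ino)
            if key not in seen:   # count each article once network-wide
                seen.add(key)
                network_total += st.st_size
        if size:
            per_group[rel.replace(os.sep, ".")] = size
    return per_group, network_total
```

Calling tally("/usr/spool/news") gives the per-group byte counts, and
100 * per_group[g] / network_total gives the percentage column.  Because a
cross-posted article is counted in every group it appears in, the per-group
figures can add up to more than the network total.
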
	However, while the above is an obvious way of estimating the savings,
it is simply wrong, due to spill-over effects, cross-postings, and a bunch
of other phenomena.  For example, one does not expect that the political flames
on net.flame would all go away if net.flame were deleted --- a lot of them
would still exist on net.politics.  It's obviously difficult to get a
quantitative handle on the size of such effects, but that doesn't mean it's
not worth trying.  I hereby present my modified metric for newsgroup size:

	groupsize = sum over articles in the group of (article size / number of groups)

That is, we divide the size of a cross-posted article by the number of
groups in which it appears, before adding to the total for a given group.
Thus net.politics and net.flame each get "charged" for half of one of Don
Black's hate letters.  Personally, I think that this is unfair to flame
as he'd post to politics regardless, but I'll stick to the numbers, which
are:

The USENET top 15, cross-post compensated.

net.music:             370451  4.63
net.news.group:        341738  4.27
net.sources:           289265  3.62
net.sources.mac:       283740  3.55
net.philosophy:        203215  2.54
net.movies:            187437  2.34
net.politics:          185621  2.32
net.sf-lovers:         183882  2.30
net.micro.mac:         181101  2.26
net.micro.amiga:       180024  2.25
net.women:             163623  2.05
net.flame:             163516  2.04
net.unix-wizards:      145966  1.82
net.religion:          141192  1.77
net.jokes:             138531  1.73
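
The cross-post compensated metric is easy to state in code.  This is a
minimal sketch in Python, assuming each article has already been reduced to
its size in bytes plus the list of groups from its Newsgroups line; the
function name and input shape are illustrative only.

```python
from collections import defaultdict

def compensated_sizes(articles):
    """articles: iterable of (size_in_bytes, list_of_newsgroups) pairs.

    Each article's size is split evenly among the groups it was posted to.
    """
    totals = defaultdict(float)
    for size, groups in articles:
        groups = list(dict.fromkeys(groups))   # drop repeated group names
        if not groups:
            continue                           # nothing to charge
        share = size / len(groups)
        for group in groups:
            totals[group] += share
    return dict(totals)

# A 3000-byte article cross-posted to two groups, plus a 1000-byte
# article posted to net.flame only.
example = [
    (3000, ["net.politics", "net.flame"]),
    (1000, ["net.flame"]),
]
sizes = compensated_sizes(example)
```

In the example net.politics is charged 1500 bytes and net.flame 2500.  The
charges add up to the 4000 bytes actually posted, so with this metric the
per-group figures sum to the true network total.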

To give a point of comparison, in the last six days, Rich Rosen posted
146193 bytes, amounting to a little under two percent of USENET volume.
He generated over 55K of text in net.philosophy alone (cross-post compensated),
which is over one quarter of the traffic in that newsgroup.  If Rich were a
newsgroup himself, he'd rank thirteenth.  I am not making this up --- I
wouldn't dare fabricate anything this ridiculous.  The power of the point-
by-point reply is awesome.

These groups still amount to roughly forty percent of the total net content,
but notice that most of the popular candidates for removal have moved down.
Net.flame, in particular, has dropped dramatically.  Notice also
that while most of these groups are "content" groups, one third are technical
or net.administrative, and that net.music, which no one gripes about,
has an incredibly high volume, much of which turns out to be
extensive lists of people's favorite guitarists.  (Raw data not posted;
check for yourself).

One further statistic of interest is that well over 60% of net traffic
takes place in newsgroups which individually take up less than 2% of the
volume (cross-post compensated).  Remember, Rich Rosen alone generates
nearly two percent.
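
The round numbers quoted in this article can be checked against the two
tables.  A small sketch in Python, using only the rows listed above (the
variable names are illustrative, and the groups not listed are taken to be
smaller than net.jokes):

```python
# Percent column of the raw top-15 table.
raw_pct = [7.19, 4.91, 4.00, 3.73, 3.64, 3.18, 3.01, 2.91,
           2.89, 2.83, 2.65, 2.63, 2.59, 2.36, 2.10]

# (group, bytes, percent) rows of the cross-post compensated table.
compensated = [
    ("net.music", 370451, 4.63), ("net.news.group", 341738, 4.27),
    ("net.sources", 289265, 3.62), ("net.sources.mac", 283740, 3.55),
    ("net.philosophy", 203215, 2.54), ("net.movies", 187437, 2.34),
    ("net.politics", 185621, 2.32), ("net.sf-lovers", 183882, 2.30),
    ("net.micro.mac", 181101, 2.26), ("net.micro.amiga", 180024, 2.25),
    ("net.women", 163623, 2.05), ("net.flame", 163516, 2.04),
    ("net.unix-wizards", 145966, 1.82), ("net.religion", 141192, 1.77),
    ("net.jokes", 138531, 1.73),
]
ROSEN_BYTES = 146193

raw_share = sum(raw_pct)                              # about 50.6
comp_share = sum(pct for _g, _b, pct in compensated)  # about 39.5
# Share of traffic in groups at or above 2%, and the remainder below it.
big_share = sum(pct for _g, _b, pct in compensated if pct >= 2.0)
small_share = 100.0 - big_share                       # about 65.8
# Where Rosen would land if he were a row in the compensated table.
rosen_rank = 1 + sum(1 for _g, b, _p in compensated if b > ROSEN_BYTES)

print(round(raw_share, 2), round(comp_share, 2),
      round(small_share, 2), rosen_rank)
```

The raw top fifteen come to about half the net, the compensated top fifteen
to about forty percent, and twelve groups reach two percent, leaving roughly
two thirds of the traffic in groups below that mark.  Rosen's 146193 bytes
put him just ahead of net.unix-wizards, in thirteenth place.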

What's the bottom line?  Elimination of all the heaviest newsgroups, under
the most favorable assumptions (all traffic associated with any deleted
newsgroup is gone and forgotten), is still only worth a factor of two on
the backbone sites' phone bills.  No one is going to delete *all* of
those newsgroups (including this one).  Below this, one quickly gets to
newsgroups which individually form a relatively small amount of the traffic,
which means an extremely large number of these groups would have to be deleted
to do any real good.  The same caveats apply to moderation.

In short, rmgroup does not seem to be a fantastically
productive way of reducing the total volume.  Protests from those whose groups
were deleted (which *must* be behind the ascendancy of net.news.group) and
the general antisocial nature of the process don't make it any more attractive.
Lastly, if the stats on Rosen mean anything at all, they mean that reining
in overeager posters would be at least as big a win.  The recent idea of
removing the "followup" command completely seems a good start.
--
rst@tardis