[net.news] Top posters' favorite newsgroups

david@ukma.UUCP (David Herron, NPR Lover) (12/23/85)

While looking over the Top 25 Posters lists this time, I had an inspiration
(if it isn't a good one, blame it on the wine I had earlier :-)).  Anyway...

Which newsgroups attract the heavy posters?

I was thinking about the debate a while ago about Rich Rosen (oh, he's
sneaking back up, at #7 this time) and how he'd be a huge newsgroup
all by himself.  Then I started noticing that some of the other posters
on the list tended to be from similar newsgroups (net.politics,
net.religion, etc.).  So a little work produced this list:


6557 net.sources
2125 net.philosophy
1880 net.sources.mac
1787 net.religion
1699 net.politics
1632 net.sources.games
1347 net.politics.theory
675 net.jokes
647 net.religion.christian
450 net.micro.amiga
232 net.comics
230 net.games.pbm
228 net.abortion
132 net.sf-lovers
110 net.music
85 net.games.rogue
82 net.micro.pc
72 net.games.frp
60 net.religion.jewish
60 net.nlang.africa
50 net.invest
50 net.consumers
44 net.misc
44 net.astro
42 net.movies
42 net.games.board
40 net.tv
36 net.unix-wizards
35 net.arch
30 net.micro
25 net.kids
24 net.jokes.d
24 net.auto
20 net.origins


The numbers are generated directly from the latest Top 25 Posters list.
For each poster, each newsgroup is given points based on the poster's
percentage of the network and that newsgroup's percentage of that
poster's postings.  Specifically, the newsgroup percentage is taken as
is, and the poster's percentage is multiplied by 10.
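
As a rough sketch of that weighting (in Python here, not the awk used
for the real numbers): each poster's percentage of the net is multiplied
by 10, then by each newsgroup's percentage of that poster's postings, and
the products are summed per newsgroup.  The name score_newsgroups and the
second poster's figures are made up for illustration; only the first
entry (about .6% of the net, all of it in net.jokes) comes from the
data discussed in this article.

```python
from collections import defaultdict


def score_newsgroups(posters):
    """Sum weighted points per newsgroup.

    posters is a list of (percent_of_net, groups) pairs, where groups maps
    a newsgroup name to its percentage of that poster's postings.
    """
    totals = defaultdict(float)
    for percent_of_net, groups in posters:
        # The poster's percentage of the net is multiplied by 10.
        weight = percent_of_net * 10
        for newsgroup, percent_of_postings in groups.items():
            # The newsgroup percentage is taken as is.
            totals[newsgroup] += weight * percent_of_postings
    return dict(totals)


# Illustrative input only; the second poster is hypothetical.
posters = [
    (0.6, {"net.jokes": 100}),
    (1.0, {"net.politics": 50, "net.religion": 50}),
]

scores = score_newsgroups(posters)
for newsgroup, points in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(round(points), newsgroup)
```

Under these assumptions the single net.jokes poster alone contributes
6 x 100 = 600 points, which shows how one poster who sticks to one group
can carry that group up the list.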

I've appended the awk script used to the end of this article...

Now the results aren't terribly unexpected.  The top 10 are dominated
by the sources groups and the talk-talk groups.  But net.jokes is a
bit of a surprise.  In looking at the original data I see that
one poster (about .6% of the net) posted solely to net.jokes.  I don't
know enough about statistics to know if this is really an anomaly, and
if it is how to correct it.  The first pass at the problem was simply
the number of percentages per newsgroup, but after that it seemed obvious
that it needed to be weighted some.


What do people think????


-----------------------> Cut here <-----------------------
# USAGE: awk -f a.awk <datafile | sort -n -r >reportfile
#
# Input format is, for each poster, one line giving
#
# <Portion-of-net>
#
# followed by one line per newsgroup for that poster:
#
# <number-of-points> <newsgroup-name>
#

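# A one-field line is a poster's portion of the net; remember it for
# the newsgroup lines that follow.
NF == 1 {
	size = $1
	}

# A two-field line is <points> <newsgroup>; weight the points by the
# current poster's portion and add them to that newsgroup's total.
NF == 2	{
	count[$2] += ($1 * size)
	}

# Print every total; the order is unspecified, hence the sort in USAGE.
END	{
	for (i in count)
		print count[i], i
	}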

-- 
David Herron,  cbosgd!ukma!david, david@UKMA.BITNET.

Experience is something you don't get until just after you need it.