david@ukma.UUCP (David Herron, NPR Lover) (12/23/85)
Seeing the top 25 posters lists this time, I had an inspiration (if it
isn't a good one, blame it on the wine I had earlier :-)). Anyway...

Which newsgroups attract large postings?

I was thinking about the debate a while ago about Rich Rosen (oh, he's
sneaking back up, at #7 this time) and how he'd be a huge newsgroup all
by himself. Then I started noticing some of the other posters on the
list, and that they tended to be from similar newsgroups (net.politics,
net.religion, etc.). So a little work produced this list:

	6557 net.sources
	2125 net.philosophy
	1880 net.sources.mac
	1787 net.religion
	1699 net.politics
	1632 net.sources.games
	1347 net.politics.theory
	 675 net.jokes
	 647 net.religion.christian
	 450 net.micro.amiga
	 232 net.comics
	 230 net.games.pbm
	 228 net.abortion
	 132 net.sf-lovers
	 110 net.music
	  85 net.games.rogue
	  82 net.micro.pc
	  72 net.games.frp
	  60 net.religion.jewish
	  60 net.nlang.africa
	  50 net.invest
	  50 net.consumers
	  44 net.misc
	  44 net.astro
	  42 net.movies
	  42 net.games.board
	  40 net.tv
	  36 net.unix-wizards
	  35 net.arch
	  30 net.micro
	  25 net.kids
	  24 net.jokes.d
	  24 net.auto
	  20 net.origins

The numbers are generated directly from the latest Top 25 Posters list.
Each newsgroup is given points based on each poster's percentage of the
network and that newsgroup's percentage of that poster's postings.
Specifically, the newsgroup percentage is taken as is, and the poster's
percentage is multiplied by 10. I've appended the awk script used to
the end of this article...

Now the results aren't terribly unexpected. The top 10 are dominated by
the sources groups and the talk-talk groups. But net.jokes is a bit of
a surprise. In looking at the original data I see that one poster
(about .6% of the net) posted solely to net.jokes. I don't know enough
about statistics to know if this is really an anomaly, and if it is,
how to correct it.
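To make the weighting concrete, here is a minimal sketch that runs the same three awk rules as the script at the end of this article on a tiny data file. It rests on assumptions: the script only multiplies the two numbers it is given, so the data file presumably already carries each poster's share multiplied by 10 (0.6% becomes 6). The first poster mirrors the net.jokes case described above; the second poster and its net.misc figures are invented purely for illustration, and the function name `score` is not from the original.

```shell
#!/bin/sh
# Sketch only: same rules as the posted a.awk, restated inline.
# A one-field line sets the current poster's weight; a two-field line
# adds (newsgroup percentage * weight) to that newsgroup's total.
score() {
    awk '
        NF == 1 { size = $1 }
        NF == 2 { count[$2] += $1 * size }
        END     { for (g in count) print count[g], g }
    ' | sort -n -r
}

# Poster 1: about .6% of the net (assumed entered as 6), 100% to net.jokes.
# Poster 2: HYPOTHETICAL, .2% of the net (entered as 2), split 25/75.
score <<'EOF'
6
100 net.jokes
2
25 net.jokes
75 net.misc
EOF
```

Under these assumptions the single-group poster alone contributes 6 * 100 = 600 points to net.jokes, which would be most of the 675 points in the list and may explain why the group ranks so high.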
The first pass at the problem was simply the unweighted percentages per
newsgroup, but after that it seemed obvious that it needed to be
weighted some.

What do people think????

-----------------------> Cut here <-----------------------
# USAGE: awk -f a.awk <datafile | sort -n -r >reportfile
#
# Input format is
#
#	<number-of-points> <newsgroup-name>
# or
#	<Portion-of-net>
#

# A one-field line sets the weight of the poster whose lines follow.
NF == 1 { size = $1 }

# A two-field line adds the weighted points to that newsgroup's total.
NF == 2 { count[$2] += ($1 * size) }

# Print "total newsgroup" pairs; sort(1) puts them in order.
END {
	for (i in count)
		print count[i], i
}
-- 
David Herron, cbosgd!ukma!david, david@UKMA.BITNET.
Experience is something you don't get until just after you need it.