[news.groups] Newsgroup similarity list

packer@amarna.gsfc.nasa.gov (Charles Packer) (02/23/90)

The latter part of this message consists of a list of the 120 most
popular newsgroups, ordered so that similar newsgroups are
adjacent to each other. Similarity here means the likelihood
that they have the same people posting to them. This list was
created as follows: 

1. Define popularity of each newsgroup to be the number of
individuals posting to it during some interval (in this case, a
two-week period last December). Sort a list of all newsgroups
descending on this count. Select the top 120 newsgroups. 

2. Make a list of all individuals posting to the selected
newsgroups. Sort this list descending on number of newsgroups to
which each posted. Select the top 1000 "most widely posting"
individuals.

3. Build a matrix in which columns are newsgroups, rows are
individuals. A one in cell (j,k) means that individual j
has posted to newsgroup k. A zero means no posting occurred.
The matrix will have 120 columns, 1000 rows. 

4. Reorder the columns and rows to bring as many of the non-zero cells
close to the diagonal as possible. This has the effect of bringing
similar columns close to each other, and similar rows close to
each other.
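
The four steps above can be sketched in Python roughly as follows. This
is only an illustration, not the program actually used: the input format
(a sequence of (person, newsgroup) pairs) and the function names
build_matrix and reorder are assumptions, and since the post does not say
how the reordering in step 4 was done, the sketch substitutes a simple
barycenter heuristic (repeatedly sort rows by the average position of
their ones, then columns likewise).

```python
from collections import defaultdict


def build_matrix(postings, n_groups=120, n_people=1000):
    """Steps 1-3. `postings` is an iterable of (person, newsgroup) pairs
    (an assumed input format)."""
    posters = defaultdict(set)              # newsgroup -> distinct posters
    for person, group in postings:
        posters[group].add(person)

    # Step 1: popularity = number of distinct posters; keep the top groups.
    # Ties are broken by name so the result is deterministic.
    groups = sorted(posters, key=lambda g: (-len(posters[g]), g))[:n_groups]

    # Step 2: rank people by how many of the selected groups they posted to.
    breadth = defaultdict(set)              # person -> selected groups
    for g in groups:
        for p in posters[g]:
            breadth[p].add(g)
    people = sorted(breadth, key=lambda p: (-len(breadth[p]), p))[:n_people]

    # Step 3: cell (j,k) is 1 if individual j posted to newsgroup k.
    matrix = [[1 if g in breadth[p] else 0 for g in groups] for p in people]
    return people, groups, matrix


def reorder(matrix, sweeps=20):
    """Step 4, using a barycenter heuristic (an assumption, see above).
    Returns the new row order and column order as lists of old indices."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    rows, cols = list(range(n_rows)), list(range(n_cols))
    for _ in range(sweeps):
        col_pos = {c: i for i, c in enumerate(cols)}

        def row_key(r):
            ones = [col_pos[c] for c in range(n_cols) if matrix[r][c]]
            return sum(ones) / len(ones) if ones else n_cols  # empty rows last

        rows.sort(key=row_key)              # stable: ties keep their order
        row_pos = {r: i for i, r in enumerate(rows)}

        def col_key(c):
            ones = [row_pos[r] for r in range(n_rows) if matrix[r][c]]
            return sum(ones) / len(ones) if ones else n_rows

        cols.sort(key=col_key)
    return rows, cols
```

Given the result, the reordered matrix is
[[matrix[r][c] for c in cols] for r in rows], and the newsgroup in column
position i+1 of the list is groups[cols[i]]. A heuristic like this only
finds a reasonable ordering, not necessarily the best one.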

The list follows. The number to the left of a newsgroup is its
column position. The number to the right is how many of the 1000
individuals posted to it. The matrix itself, aggregated into a
120 column print file, 100 lines long, is available from me by

    1 soc.culture.korean     18           61 alt.activism           40
    2 soc.culture.china      22           62 alt.drugs              39
    3 soc.culture.japan      28           63 sci.environment        46
    4 soc.culture.indian     16           64 rec.backcountry        25
    5 soc.culture.taiwan     12           65 rec.games.frp          29
    6 rec.music.gdead        16           66 rec.video              49
    7 rec.sport.baseball     31           67 misc.legal             94
    8 rec.sport.basketbal    39           68 rec.motorcycles        29
    9 rec.sport.football     70           69 rec.autos             125
   10 rec.sport.misc         26           70 misc.consumers        102
   11 rec.sport.hockey       18           71 misc.misc              25
   12 rec.bicycles           17           72 rec.autos.tech         56
   13 rec.gambling           15           73 sci.med                36
   14 rec.skiing             25           74 alt.folklore.comput   123
   15 talk.politics.midea    21           75 misc.consumers.hous    24
   16 rec.music.bluenote     18           76 news.groups           100
   17 rec.arts.tv.soaps      27           77 alt.callahans          23
   18 rec.puzzles            19           78 rec.audio              69
   19 alt.rock-n-roll.met    15           79 rec.photo              23
   20 rec.music.folk         19           80 misc.invest            26
   21 rec.arts.tv.uk         26           81 rec.aviation           34
   22 rec.arts.drwho         31           82 sci.math               18
   23 rec.games.misc         26           83 sci.electronics        40
   24 alt.rock-n-roll        36           84 rec.ham-radio          53
   25 rec.music.misc         72           85 rec.radio.shortwave    28
   26 rec.music.cd           49           86 sci.space.shuttle      27
   27 rec.arts.anime         25           87 sci.space              43
   28 rec.arts.tv           113           88 sci.physics            33
   29 alt.cult-movies        26           89 sci.astro              24
   30 rec.arts.movies       129           90 news.admin             42
   31 rec.arts.comics        57           91 news.misc              28
   32 alt.romance            24           92 comp.protocols.tcp-    29
   33 alt.peeves             27           93 comp.misc              51
   34 rec.arts.sf-lovers    115           94 comp.sys.amiga         79
   35 rec.arts.books         45           95 comp.sys.mac          147
   36 talk.abortion          14           96 comp.sys.ibm.pc       148
   37 rec.music.classical    34           97 misc.wanted            28
   38 rec.games.video        36           98 comp.sys.next          39
   39 rec.arts.startrek      91           99 rec.music.makers       29
   40 alt.sex.bondage        23          100 alt.religion.comput    34
   41 rec.humor             112          101 comp.sys.mac.progra    28
   42 alt.sex               108          102 misc.forsale           50
   43 soc.motss              52          103 comp.sys.mac.hardwa    35
   44 misc.kids              29          104 rec.music.synth        28
   45 rec.pets               38          105 gnu.misc.discuss       31
   46 soc.singles            58          106 comp.arch              24
   47 rec.food.cooking       43          107 comp.sys.atari.st      31
   48 soc.men                61          108 comp.lang.c            63
   49 soc.women              84          109 comp.unix.questions    56
   50 rec.food.veg           36          110 comp.unix.wizards      47
   51 talk.religion.newag    25          111 comp.binaries.ibm.p    24
   52 talk.religion.misc     39          112 comp.sources.wanted    47
   53 alt.flame              51          113 comp.os.vms            19
   54 talk.bizarre           56          114 comp.unix.xenix        34
   55 sci.skeptic            30          115 comp.unix.i386         44
   56 talk.politics.theor    23          116 comp.windows.x         34
   57 rec.travel             36          117 comp.text              22
   58 talk.politics.misc    136          118 comp.sys.amiga.tech    16
   59 misc.headlines         98          119 comp.sys.hp            20
   60 talk.politics.guns     50          120 comp.sys.apple         13
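
The aggregated print file mentioned before the list (120 columns, 100
lines) is consistent with collapsing every 10 consecutive rows of the
1000-row matrix into one print line. The post does not say how the
aggregation was done, so the following is only a guess at one way to do
it: the function name aggregate, the block size of 10, and the choice of
printing a digit (or "." for zero) per cell are all assumptions.

```python
def aggregate(matrix, block=10):
    """Collapse each run of `block` consecutive rows into one text line.
    Each character is the number of ones in that column within the run,
    capped at 9, with "." standing for zero."""
    lines = []
    for start in range(0, len(matrix), block):
        chunk = matrix[start:start + block]
        counts = [sum(col) for col in zip(*chunk)]   # per-column totals
        lines.append("".join("." if n == 0 else str(min(n, 9)) for n in counts))
    return lines
```

With a 1000 x 120 matrix and block=10 this yields 100 lines of 120
characters each, in which a well-ordered matrix shows up as a band of
digits running from the top left to the bottom right.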