[news.software.b] Monitoring group volume and popularity

brad@looking.on.ca (Brad Templeton) (09/07/90)

In article <1990Sep7.101916.1849@bhpcpd.kembla.oz.au> bernd@bhpcpd.kembla.oz.au (Bernd Wechner) writes:
>Specifically it should produce a log (perhaps daily or weekly) of 
>each newsgroup and how much disk space it is using (preferably sorted by
>size), and how many people are known to be reading that group. This
>information would be very helpful in deciding the expire times or whether
>we should stop receiving a group altogether (if it is very large and
>unpopular, say).

Well, for the first one, "du | sort -n" with a few options does the
trick.
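
For example, something along these lines (a rough sketch; it assumes
your spool lives in /usr/spool/news) prints cumulative blocks per
group directory, largest first:

    cd /usr/spool/news
    # du prints cumulative blocks for every directory; sort them
    # largest-first, then turn "rec/arts/movies" into "rec.arts.movies"
    du | sort -rn | sed -e 's,\./,,' -e 's,/,.,g'

Bear in mind that parent directories like "rec" show the total for
everything beneath them.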

For the latter, the arbitron program can produce a simple report; for
a more detailed report, use the "arbit" program.  The source for arbit
is part of the dynafeed package in uunet:~/ClariNet/dynafeed.tar.Z

Arbit will tell you counts of who is reading what, who is subscribing to
what, how long it's been since the group last got a message, and how many
users & sites you feed are subscribing to a group.   Arbit also produces
arbitron-style output, or output suitable for sending to your feed site
to control your subscription group by group.
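
If you're curious how arbitron-style reader counting works, the core
idea (my own sketch here, not code from either package) is just to
scan the users' .newsrc files and tally the subscribed groups:

    # Subscribed groups end the group name with ":" in a .newsrc,
    # unsubscribed ones with "!".  Adjust the home-directory glob
    # for your site.
    cat /usr/users/*/.newsrc 2>/dev/null |
        sed -n 's/^\([^:! ]*\):.*/\1/p' |
        sort | uniq -c | sort -rn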

You can mail comments and bug reports to me.  I will send the system
to comp.sources.misc soon.
-- 
Brad Templeton, ClariNet Communications Corp. -- Waterloo, Ontario 519/884-7473

bernd@bhpcpd.kembla.oz.au (Bernd Wechner) (09/07/90)

Considering the volume of news we have to deal with at the moment and
the shortage of disk space on which to store it, I would like to know if
there exists any software which I can run on a regular basis to monitor
the volume of individual newsgroups and/or how popular they are.

Specifically it should produce a log (perhaps daily or weekly) of 
each newsgroup and how much disk space it is using (preferably sorted by
size), and how many people are known to be reading that group. This
information would be very helpful in deciding the expire times or whether
we should stop receiving a group altogether (if it is very large and
unpopular, say).

I have written a shell script which will do part of the job for me, but
it's not as nice as it might be, and I can see no point in improving it
if such software already exists.

-- 
Bernd Wechner, Research Officer                     (bernd@bhpcpd.kembla.oz.au)
BHP Coated Products Division, Research and Technology Centre
Port Kembla, New South Wales, Australia.

chip@chinacat.Unicom.COM (Chip Rosenthal) (09/07/90)

In article <1990Sep7.101916.1849@bhpcpd.kembla.oz.au>
	bernd@bhpcpd.kembla.oz.au (Bernd Wechner) writes:
>Specifically it should produce a log (perhaps daily or weekly) of 
>each newsgroup and how much disk space it is using (preferably sorted by
>size), and how many people are known to be reading that group.

Nightly, I run an "ngsizes" report which gives me not only this information,
but also a breakdown of usage in the newsgroup by age.  Here's a sample:

  +---------------------------------------------------------------------------
  | newsgroup              read  0days  1days  3days  5days  7days 15days
  | rec.arts.movies           1   1070    550      4      4      4      4
  | rec.arts.sf-lovers        1    846    544     30      0      0      0
  | news.groups               1    834    198      0      0      0      0
  | news.lists                1    732     58      0      0      0      0
  | news.announce.newusers    3    700      0      0      0      0      0
  +---------------------------------------------------------------------------

i.e. "rec.arts.movies" has one reader and uses 1070 disk blocks.  Of that
1070, 550 are from articles >=1 day old, and 4 are from articles >=15 days.

The "ngsizes" script uses a "du" reimplimentation I wrote.  I originally
wrote it to add some features to du, such as the breakdown by age and
the ability to not accumulate subdirectory usage (i.e. don't count
alt/sources/d's usage in alt/sources).
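
If you only need the age breakdown occasionally, you can approximate
it with stock tools.  A slow but simple sketch (substitute your own
group directory, and note it misfires on an empty group):

    # blocks used in one group by articles more than 7 days old
    find /usr/spool/news/rec/arts/movies -type f -mtime +7 -print |
        xargs ls -s |
        awk '{blocks += $1} END {print blocks, "blocks over 7 days old"}'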

An unexpected result of this is that my "du" is not only significantly
faster than a couple of standard du's I looked at, it also avoids some
bugs I found in them.  In particular, I looked at SCO XENIX 2.3 and ISC
UNIX 2.0.2.  Both of these reported wrong results for directories with
very large files.
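
A quick way to check a du for that class of bug (again, only a sketch;
pick a directory that holds some large files) is to compare its total
against a per-file sum.  The two should roughly agree, give or take
the directory blocks themselves:

    du -s /usr/spool/news/comp/sources
    find /usr/spool/news/comp/sources -type f -print |
        xargs ls -s | awk '{n += $1} END {print n}'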

Drop me a line if you are interested in this stuff.  If there are enough
requests, I'll post.  (Note - this "du" runs in a SysVish environment.
For example, you must have a statfs(2).)

-- 
Chip Rosenthal  <chip@chinacat.Unicom.COM>
Unicom Systems Development, 512-482-8260 
Our motto is:  We never say, "But it works with DOS."

chip@chinacat.Unicom.COM (Chip Rosenthal) (09/11/90)

In article <1559@chinacat.Unicom.COM> I wrote:
>Nightly, I run an "ngsizes" report which gives me not only this information,
>but also a breakdown of usage in the newsgroup by age. [...]
>Drop me a line if you are interested in this stuff.

I've received a lot of requests for this.  I have sent it in to
comp.sources.misc, so watch there.  I tried to ACK all the messages I
received, but a few bounced due to munged headers.  Apologies if you
mailed and didn't receive a reply from me.

-- 
Chip Rosenthal  <chip@chinacat.Unicom.COM>
Unicom Systems Development, 512-482-8260 
Our motto is:  We never say, "But it works with DOS."