[news.admin] Daily news volume disagrees w/uunet stats

gaf@uucs1.UUCP (gaf) (07/11/89)

We're a bit new to Usenet (just sent out the map entry last week), but
our daily average volume of news seems much higher than I would expect.
I've been reading news.lists from another site for some months, and
the daily average at uunet always seems to be 4 - 5 Mb per day.

We're deliberately not receiving the soc, talk, and sci groups, which
together should account for about 1/3 of the total volume.  By my figuring
we should be getting 3 - 4 Mb a day.  Instead we're getting around 6 Mb!
That's even higher than the uunet average for all groups!

The cross-posted articles do seem to be linked to one another, so it's not
that.  To expire articles, I run "find ... -exec rm ..." to make sure I
get everything, regardless of posted expiration dates.

Is the standard deviation just so large that the average doesn't mean
much?  Certainly a posting like MGR skews things temporarily, but last
week was lighter than normal anyway (being a 3 day week many places). 
Is something out of whack in my news software?  Do your averages look
more like mine or uunet's?

-- 
Guy Finney					It's that feeling of deja-vu
UUCS inc.   Phoenix, Az				all over again.
ncar!noao!asuvax!hrc!uucs1!gaf	sun!sunburn!gtx!uucs1!gaf

grady@semprini.fx.com (Steven Grady) (07/11/89)

In article <123@uucs1.UUCP> gaf@uucs1.UUCP () writes:
>I've been reading news.lists from another site for some months, and
>the daily average at uunet always seems to be 4 - 5 Mb per day.
>
>By my figuring
>we should be getting 3 - 4 Mb a day.  Instead we're getting around 6 Mb!
>That's even higher than the uunet average for all groups!
>
>Do your averages look more like mine or uunet's?

Ours look like yours.  I just did a du on /usr/spool/news, and the
total was about 50 megabytes.  We receive all the standard groups
(including alt, gnu, etc), but no weird ones (biz, bionet, etc).
We expire all groups at 7 days.  So that comes out to about 7 megs/day.

I just switched to C news a week ago, but I did a du before switching,
and the average was about 7 megs/day for B news as well.  I currently run
superkludge, so it's not comp.mail.maps hosing things (its du total is
about 2.5 megs).  Local newsgroups (ba.*, ca.*, etc) total about 1.5
megs.

I watched the output of du, and almost uniformly each top-level
directory (comp, rec, sci, soc, talk, etc) had about 3/4 the amount
of news that uunet had, for half as much time.  This comes out to
50% more space than expected from uunet's lists.

Anyone know what would cause this?  Problems with uunet?
Longer Path lines?  LLIF?

	Steven
	...!ucbvax!grady
	grady@postgres.berkeley.edu

"It's hard to get a refund when the salesman is sniffing
your crotch and baying at the moon..."

pmb@donk.UUCP (pmb) (07/11/89)

In article <1989Jul10.223958.28264@fxgrp.fx.com> grady@fxgrp.fx.com (Steven Grady) writes:
>I watched the output of du, and almost uniformly each top-level
>directory (comp, rec, sci, soc, talk, etc) had about 3/4 the amount
>of news that uunet had, for half as much time.  This comes out to
>50% more space than expected from uunet's lists.

This is speculation, but uunet may well be measuring the volume of news by
actually counting bytes, and not by doing a du.  (Du counts disk blocks; a
block with one byte on it will be counted exactly the same as one that's
full.  And since most blocks aren't full, this number will always be higher
than the true amount of news that's there.)  
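pmb's rounding point can be sketched numerically.  This is only a model of
the block rounding, not uunet's actual accounting, and the article sizes
below are hypothetical:

```python
# Model how whole-block allocation inflates an on-disk total
# relative to the true byte count of the articles.
def du_blocks(size, block=1024):
    """Bytes actually occupied when a file is stored in whole blocks."""
    return -(-size // block) * block  # ceiling division

article_sizes = [700, 1500, 2300, 400]   # hypothetical article byte counts
true_total = sum(article_sizes)
disk_total = sum(du_blocks(s) for s in article_sizes)
print(true_total, disk_total)  # → 4900 7168
```

Every partially filled final block pushes the du figure above the byte
count, so the gap grows as the average article shrinks.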

mday@ohs.UUCP (Matthew T. Day) (07/12/89)

From article <123@uucs1.UUCP>, by gaf@uucs1.UUCP (gaf):
> .....  To expire articles, I run "find ... -exec rm ..." to make sure I
> get everything, regardless of posted expiration dates.

Try using "expire -I -e .." to override the "Expires:" field in some articles.
That way you keep an up-to-date history database.
-- 
+----------------------------------------------+-------------------------+
| Matthew T. Day, Orem High School, Orem, Utah | "He who laughs, lasts." |
| ..!uunet!iconsys!ohs!mday (mday@ohs.UUCP)    |    day++, dollar++;     |
+----------------------------------------------+-------------------------+

rfarris@serene.UUCP (Rick Farris) (07/12/89)

In article <1989Jul10.223958.28264@fxgrp.fx.com> grady@fxgrp.fx.com (Steven Grady) writes:
> In article <123@uucs1.UUCP> gaf@uucs1.UUCP () writes:
> >I've been reading news.lists from another site for some months, and
> >the daily average at uunet always seems to be 4 - 5 Mb per day.

> This comes out to 50% more space than expected from uunet's lists.

Ok, guys, time for the thinking hats! :-)

Do you suppose that the figures that uunet reports are for
*compressed* news?


Rick Farris   RF Engineering  POB M  Del Mar, CA  92014   voice (619) 259-6793
rfarris@serene.uu.net      ...!uunet!serene!rfarris       serene.UUCP 259-7757

henry@utzoo.uucp (Henry Spencer) (07/12/89)

In article <370@ohs.UUCP> mday@ohs.UUCP (Matthew T. Day) writes:
>Try using "expire -I -e .." to override the "Expires:" field in some articles.
>That way you keep an up-to-date history database.

Or you can run C News expire (which works with B News with relatively little
work), which lets you set bounds on expiry dates.
-- 
$10 million equals 18 PM       |     Henry Spencer at U of Toronto Zoology
(Pentagon-Minutes). -Tom Neff  | uunet!attcan!utzoo!henry henry@zoo.toronto.edu

bill@twwells.com (T. William Wells) (07/14/89)

In article <1989Jul10.223958.28264@fxgrp.fx.com> grady@fxgrp.fx.com (Steven Grady) writes:
: I watched the output of du, and almost uniformly each top-level
: directory (comp, rec, sci, soc, talk, etc) had about 3/4 the amount
: of news that uunet had, for half as much time.  This comes out to
: 50% more space than expected from uunet's lists.
:
: Anyone know what would cause this?  Problems with uunet?
: Longer Path lines?  LLIF?

I ran some stats on my spool partition. I took the articles and
figured out the ratio of file size in the spool partition to the size
of the incoming data. This depends on the size of the allocation unit
for your disk. Here is the result:

	block size      percent over incoming
	512             14
	1024            27
	2048            51

---
Bill                    { uunet | novavax | ankh | sunvice } !twwells!bill
bill@twwells.com

rick@uunet.UU.NET (Rick Adams) (07/14/89)

The uunet stats only count the body of the article, not the headers.

Whether this is the correct way or not, it's what was finally
decided after much argument several years ago. No one could
agree on what was 'right'.


I think of it as the amount of information being transferred rather
than the amount of disk space.

gaf@uucs1.UUCP (gaf) (07/15/89)

In article <780@serene.UUCP> rfarris@serene.UUCP (Rick Farris) writes:
>
>Do you suppose that the figures that uunet reports are for
>*compressed* news?

How about it, uunet?  Any light you can shed would be welcome.
-- 
Guy Finney					It's that feeling of deja-vu
UUCS inc.   Phoenix, Az				all over again.
ncar!noao!asuvax!hrc!uucs1!gaf	sun!sunburn!gtx!uucs1!gaf

gaf@uucs1.UUCP (gaf) (07/15/89)

In article <60408@uunet.UU.NET> rick@uunet.UU.NET (Rick Adams) writes:

>The uunet stats only count the body of the article, not the headers.

Looks like the "problem", then, is partially filled blocks, since the
headers don't (do they? naw.) account for half of all the text in the
average article.  Thanks to all who contributed wisdom toward this.
-- 
Guy Finney					It's that feeling of deja-vu
UUCS inc.   Phoenix, Az				all over again.
ncar!noao!asuvax!hrc!uucs1!gaf	sun!sunburn!gtx!uucs1!gaf

john@frog.UUCP (John Woods) (07/16/89)

In article <60408@uunet.UU.NET>, rick@uunet.UU.NET (Rick Adams) writes:
> The uunet stats only count the body of the article, not the headers.
> I think of it as the amount of information being transferred rather
> than the amount of disk space.

In that case, you're high by several megabytes per day...

-- 
John Woods, Charles River Data Systems, Framingham MA, (508) 626-1101
...!decvax!frog!john, john@frog.UUCP, ...!mit-eddie!jfw, jfw@eddie.mit.edu
    People...How you gonna FIGURE 'em?
    Don't bother, S.L.--Just stand back and enjoy the EVOLUTIONARY PROCESS...

rick@uunet.UU.NET (Rick Adams) (07/17/89)

Counting partially filled blocks is useless since there are
so many different block sizes. However, here are the
numbers including headers.

---rick

35031 articles, totalling 63.433438 Mbytes (77.903232 including headers),
were submitted from 4325 different Usenet sites by 10461 different
users to 553 different newsgroups for an average of 4.530960 Mbytes
(5.564517 including headers) per day.

				  Article	  	  Total
	Category	Count	  Mbytes	Percent	  Mbytes
	comp		  10034	 24.611183	 38%	 28.753679
	rec		  12349	 16.914505	 26%	 21.995065
	soc		   5186	  8.785193	 13%	 11.078640
	talk		   3509	  6.955401	 10%	  8.567632
	misc		   2547	  3.734556	  5%	  4.816018
	alt		   2203	  3.528812	  5%	  4.486699
	sci		   1721	  3.361460	  5%	  4.098219
	news		    802	  1.374312	  2%	  1.729883
	gnu		    472	  1.202586	  1%	  1.388820
	unix-pc		    128	  0.322350	  0%	  0.379225
	bionet		    119	  0.242983	  0%	  0.295535
	pubnet		     10	  0.041490	  0%	  0.045668
	u3b		      4	  0.007387	  0%	  0.008814
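
A quick check of the arithmetic in the summary above.  The per-day
averages imply a 14-day sample (that figure is inferred from the totals,
not stated in the report):

```python
# Verify the daily averages and measure how much the headers add.
body = 63.433438   # Mbytes, bodies only (what uunet counts)
total = 77.903232  # Mbytes, including headers
days = 14          # inferred: 63.433438 / 4.530960 == 14
assert abs(body / days - 4.530960) < 1e-5
assert abs(total / days - 5.564517) < 1e-5
header_overhead = (total - body) / body * 100
print(round(header_overhead, 1))  # → 22.8
```

So headers add roughly 23% on top of the body count; combined with
part-block wastage, that plausibly covers the gap between uunet's
figures and a du of the spool.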

henigan@quando.UUCP (Kevin Henigan) (07/28/89)

In article <3330@titan.camcon.co.uk> igp@camcon.co.uk (Ian Phillipps) writes:
>
>tit%67 wc -c * | tail -1
>  163166 total
>tit%68 du
>  190     .
>
>Maybe this has something to do with it. (contents of .../news/admin)
>Most news items are quite short, so the part-block wastage (+directory blocks)
>will add to the "du" figures.
>-- 
>UUCP:  igp@camcon.co.uk   | Cambridge Consultants Ltd  |  Ian Phillipps
>or:    igp@camcon.uucp    | Science Park, Milton Road  |-----------------
>Phone: +44 223 420024     | Cambridge CB4 4DW, England |

In the att man page for du it says :-

	  Du gives the number of kilobytes contained in all files and
	  ...
	  Du is normally silent about directories that cannot be read,
	  files that cannot be opened, etc.
	  ...
	  A file with two or more links is only counted once.

So providing it is run as the news superuser and can read all the
directories, there are no problems.  But at the bottom of the man page..

     BUGS
	  If there are too many distinct linked files, du will count
	  the excess files more than once.

With the amount of cross-posting that goes on in news, du will ALWAYS
give a wrong count, ALWAYS more than what is there..

--
 Kevin Henigan. UUCP:     {backbone}!unido!quando!henigan OR henigan@quando.uucp
  Quantum GmbH  Bitnet:   UNIDO!quando!henigan OR henigan%quando@UNIDO(.bitnet)
    Dortmund    internet: henigan%quando%mcvax.UUCP@cwi.nl
    Germany     internet: henigan%quando%UNIDO.bitnet@mitvma.mit.edu