[net.news] Quantitative Gloom and Doom

fair@ucbvax.ARPA (Erik E. Fair) (09/06/85)

I hate reading predictions of gloom, doom, and ultimate disaster for
the USENET. Lauren Weinstein and Chuq Von Rospach have been doomsaying
quite a bit lately (at least, this is how I perceive their articles of
late), so I decided to get some hard numbers to refute them. I like
numbers because, among other things, you can't argue with them.
Unfortunately, the numbers did not show what I had expected...

Disk Space Usage Summary in /usr/spool/news
on ucbvax by newsgroup (1024-byte blocks):

International Newsgroups:
38939	net
1575	mod
791	fa
193	control
13	junk
-----
41511

National/State/Regional Newsgroups:
415	ucb
430	ba
13	uc
26	ca
54	na
---
938

Grand Total: 42449 blocks used for netnews article storage

Now an important number for context: ucbvax's netnews system keeps
articles online for 28 days (4*WEEK). Given that, we can get:

average number of blocks arriving daily: 1516	(~ 1.5 Megabytes)

number of articles in the history file in net,fa,mod: 15896

subtotal of disk blocks in net,fa,mod: 41305

average number of bytes per article in net,fa,mod: 2660

average number of articles arriving daily: 567
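
(For the record, these are straight divisions over the 28-day window:
42449 blocks / 28 days is about 1516 blocks/day; 15896 articles / 28
days is about 567 articles/day; and 41305 blocks * 1024 bytes / 15896
articles is about 2660 bytes/article.)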

Now some history. I did some similar counting in June 1984 on another
system (dual), and got two numbers that I thought were interesting:

		10,000 articles per month
		1600 bytes per article
therefore	16 Megabytes/month

Note: the netnews system on dual kept netnews for only two weeks,
so my sample then was smaller. However, from these numbers we get:

66% increase in the average size of articles
58% increase in the number of articles per month

This, in a period of 14 months.
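
(The ratios: 2660 / 1600 is about 1.66, and 15896 articles in the 28-day
window against 10,000 per month is about 1.59; hence the 66% and 58%
figures, give or take rounding in the 1984 sample.)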

What can we do to save ourselves from imminent information overload?
An answer (mine, of course) will be forthcoming in a series of articles
in this newsgroup.

	keeper of the network news for ucbvax,
		and guardian of the gateway,

	Erik E. Fair	ucbvax!fair	fair@ucbarpa.BERKELEY.EDU

jerry@oliveb.UUCP (Jerry Aguirre) (09/21/85)

> Disk Space Usage Summary in /usr/spool/news
> on ucbvax by newsgroup (1024-byte blocks):
> 
> International Newsgroups:
> 38939	net
> 1575	mod
> 791	fa
> 193	control
> 13	junk
> -----
> 41511
> 	Erik E. Fair	ucbvax!fair	fair@ucbarpa.BERKELEY.EDU

Be careful of how you make the disk usage measurements.  The "du"
program lies.  I have had it report greater disk usage than the size of
the disk.  There are two sources of error: "sparse" files and links.

If a file has large areas that were seeked over but never written, then
the blocks that would contain only zeros are not allocated; they are
marked in a way that later reads can recognize, and the read simulates
reading a block of zeros.  An example of this is the DBM history
database used by news.  Du calculates blocks by dividing the (logical)
file size by the block size, which is not always accurate!
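
To make the sparse-file effect concrete, here is a minimal sketch,
assuming a system whose stat() reports allocated blocks (4.2BSD and
later); the file name is made up:

/* sparse.c: create a sparse file and compare its logical size with
 * the blocks actually allocated.  Sketch only; assumes stat() reports
 * st_blocks (4.2BSD and later).  The file name is arbitrary. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>

int main(void)
{
    struct stat sb;
    int fd = open("sparse.demo", O_CREAT | O_WRONLY | O_TRUNC, 0644);

    if (fd < 0) {
        perror("open");
        return 1;
    }
    /* Seek a megabyte into the file and write one byte; the region
     * seeked over is never written, so no blocks are allocated for it. */
    lseek(fd, 1024L * 1024L, SEEK_SET);
    write(fd, "x", 1);
    close(fd);

    if (stat("sparse.demo", &sb) == 0) {
        /* A du that divides the logical size by the block size charges
         * this file about a megabyte; the kernel has really allocated
         * only a block or two. */
        printf("logical size: %ld bytes\n", (long) sb.st_size);
        printf("allocated: %ld 512-byte blocks\n", (long) sb.st_blocks);
    }
    return 0;
}

The DBM history file mentioned above looks much like this to an old du.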

The second source of error, which is more relevant here, is the counting
of links.  The storage of news makes use of links to cross-post an
article to more than one newsgroup.  An article may appear in both
net.flame and net.sources, but only one copy is stored.  The du program
will count each link separately.  This usually doubles du's estimate of
usage, though the exact amount depends on the quantity and size of
cross-postings.
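
For what it's worth, the fix amounts to the kind of bookkeeping sketched
below: count each (device, inode) pair once, no matter how many
directory entries point at it.  This is an illustration of the
principle, not the actual du source, and it assumes a stat() that
reports st_blocks:

/* linkcount.c: sketch of counting disk usage without charging a
 * hard-linked (cross-posted) article more than once: remember each
 * (device, inode) pair the first time it is seen and skip any later
 * names for it.  Illustration only, not the actual du source. */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/stat.h>

static struct seen { dev_t dev; ino_t ino; } *table = NULL;
static int nseen = 0;

/* Return 1 if this inode has already been counted under another name. */
static int already_seen(dev_t dev, ino_t ino)
{
    int i;

    for (i = 0; i < nseen; i++)
        if (table[i].dev == dev && table[i].ino == ino)
            return 1;
    table = realloc(table, (nseen + 1) * sizeof(struct seen));
    if (table == NULL)
        exit(1);
    table[nseen].dev = dev;
    table[nseen].ino = ino;
    nseen++;
    return 0;
}

int main(int argc, char **argv)
{
    struct stat sb;
    long blocks = 0;
    int i;

    /* Each argument is an article file; a cross-posted article shows
     * up under several names but shares one inode. */
    for (i = 1; i < argc; i++) {
        if (stat(argv[i], &sb) < 0)
            continue;
        if (sb.st_nlink > 1 && already_seen(sb.st_dev, sb.st_ino))
            continue;       /* already counted under an earlier name */
        blocks += (long) sb.st_blocks;
    }
    printf("%ld 512-byte blocks\n", blocks);
    return 0;
}

Fed both directory entries of a cross-posted article, it charges the
blocks only once.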

Both these problems seem to have been fixed in the 4.2BSD version, so
whether they affect you depends on what version of Unix you run.  The
link problem can still bite you if you manually add the sizes.

My news directory runs about 13Mb with a two-week expiration, so your
size seems a little high.  Either you receive lots of articles that I
don't, you are getting lots of duplicate articles, or your expire is not
deleting everything it should.  For 4 weeks you should have under 26Mb,
not 41Mb.

The current news history code is a crock and has many loopholes that
allow an article to be received without being entered into the database.
This means that such articles don't get deleted and that duplicates can
be received.  I had this bite me while testing out a batched version of
the ihave/sendme protocol.  The target system requested several thousand
articles that were not in its history.  They were, however, already in
the news directory.  Also, unless I regularly (weekly) run an "expire -r",
unexpired articles gradually accumulate.

					Jerry Aguirre @ Olivetti ATC
{hplabs|fortune|idi|ihnp4|tolerant|allegra|tymix|olhqma}!oliveb!jerry

usenet@ucbvax.ARPA (USENET News Administration) (09/24/85)

Of course, ucbvax runs 4.3BSD, with (as you noted, Jerry) a du that
knows how to count links correctly, so that is not a source of error in
my numbers. Also, there are no sparse files in the netnews spool area,
since everything in there is a contiguous text file.

As to your comments about expire, I have had no problems here. I have a
shell script that looks for netnews articles that are more than 35 days
old, which I then check by hand for an Expires: field. To date,
expire has missed on the order of 30 articles in a period of three
months, which is an acceptable error rate for me (since I don't really
want to hack up expire.c).
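
The check is easy to reproduce; here is a minimal sketch of the idea,
written in C rather than as a shell script (Erik's actual script is not
shown here), with the spool path and the 35-day cutoff taken from the
text above:

/* oldnews.c: walk the news spool and print regular files whose
 * modification time is more than 35 days old, i.e. articles that
 * expire should already have removed.  A sketch of the check
 * described above, not the actual shell script. */
#include <stdio.h>
#include <string.h>
#include <time.h>
#include <dirent.h>
#include <sys/stat.h>

#define CUTOFF (35L * 24L * 60L * 60L)  /* 35 days, in seconds */

static void walk(const char *dir, time_t now)
{
    char path[1024];
    struct dirent *dp;
    struct stat sb;
    DIR *d = opendir(dir);

    if (d == NULL)
        return;
    while ((dp = readdir(d)) != NULL) {
        if (strcmp(dp->d_name, ".") == 0 || strcmp(dp->d_name, "..") == 0)
            continue;
        snprintf(path, sizeof(path), "%s/%s", dir, dp->d_name);
        if (stat(path, &sb) < 0)
            continue;
        if (S_ISDIR(sb.st_mode))
            walk(path, now);            /* recurse into newsgroup dirs */
        else if (S_ISREG(sb.st_mode) && now - sb.st_mtime > CUTOFF)
            printf("%s\n", path);       /* candidate for a hand check */
    }
    closedir(d);
}

int main(void)
{
    walk("/usr/spool/news", time(NULL));
    return 0;
}

Anything it prints can then be checked by hand for an Expires: field,
as described above.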

Don't ask me why we seem to have twice as much netnews online as you
expected. I was merely counting and reporting the results, because I
think they indicate a trend toward both increasing traffic and
increasing size of the individual articles.

	Erik E. Fair	ucbvax!fair	fair@ucbarpa.BERKELEY.EDU

henry@utzoo.UUCP (Henry Spencer) (09/26/85)

> The second source of error, which is more relevant here, is the counting
> of links.  The storage of news makes use of links to cross-post an
> article to more than one newsgroup.  An article may appear in both
> net.flame and net.sources, but only one copy is stored.  The du program
> will count each link separately...

Any du that does this is broken.  Please note that if it is, it's because
somebody at Berkeley or AT&T or XYZ Vaporboxes Inc. broke it.  The original
Bell Labs du on V7 copes with links properly.
-- 
				Henry Spencer @ U of Toronto Zoology
				{allegra,ihnp4,linus,decvax}!utzoo!henry

mikel@codas.UUCP (Mikel Manitius) (10/03/85)

> > The second source of error, which is more relevant here, is the counting
> > of links.  The storage of news makes use of links to cross-post an
> > article to more than one newsgroup.  An article may appear in both
> > net.flame and net.sources, but only one copy is stored.  The du program
> > will count each link separately...
> 
> Any du that does this is broken.  Please note that if it is, it's because
> somebody at Berkeley or AT&T or XYZ Vaporboxes Inc. broke it.  The original
> Bell Labs du on V7 copes with links properly.
> -- 
> 				Henry Spencer @ U of Toronto Zoology
> 				{allegra,ihnp4,linus,decvax}!utzoo!henry

Really? I remember well that on my old 4.1bsd system, du kept a table of
inodes it had seen before and did not count files it saw again. I can
confirm this with a test I just did on our System V (5.2): I created a
file of size X and linked it several times, and du reported size X. I
made a subdirectory and linked it in there too, and du reported X+1, the
extra block coming from the allocation for the directory. I see no
problem with du.
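
The mechanism behind that result is that both names resolve to a single
inode; a minimal sketch (the file names here are made up):

/* samefile.c: two directory entries created with link() share one
 * inode, which is how a correct du recognizes the duplicate.  Sketch
 * only; the file names are made up. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/stat.h>

int main(void)
{
    struct stat a, b;
    int fd = open("demo.orig", O_CREAT | O_WRONLY | O_TRUNC, 0644);

    if (fd < 0)
        return 1;
    write(fd, "hello\n", 6);
    close(fd);

    unlink("demo.link");                /* ignore error if absent */
    if (link("demo.orig", "demo.link") < 0)
        return 1;

    stat("demo.orig", &a);
    stat("demo.link", &b);
    /* Same device and inode, link count 2: one file, two names. */
    printf("inodes %ld and %ld, link count %ld\n",
           (long) a.st_ino, (long) b.st_ino, (long) a.st_nlink);
    return 0;
}

A du that remembers the (device, inode) pair charges those blocks only
once.
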
-- 
                                        =======
     Mikel Manitius                   ==----=====    AT&T
     ...!{ihnp4!}codas!mikel         ==------=====   Information Systems 
     (305) 869-2462                  ===----======   SDSS Regional Support
     AT&T-IS ETN: 755                 ===========    Altamonte Springs, FL
     My opinions are my own.            =======

rick@seismo.CSS.GOV (Rick Adams) (10/10/85)

Why doesn't anyone look at the code for du before spouting off on what
it does or doesn't do?

On 4.[12]bsd, du keeps track of the first 1000 inodes it finds with
multiple links.  If it finds more than that, it will not notice that
they are duplicates and will count them as if they were distinct files.

My system has over 7000 articles online right now. About 1100 of them
are multiple links, so about 100 articles are being miscounted and du is
wrong in this case.

If you have 15000 articles (about 1 month's worth), then du will probably
miss at least 1000 of them and be WAY off.
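
To make the failure mode concrete, here is a sketch in the spirit of the
behavior described above (the 1000-entry limit and the 1100-article
figure are taken from the text; the code itself is illustrative, not the
4.[12]bsd du source):

/* linktab.c: sketch of a fixed-size table of multiply-linked inodes,
 * in the spirit of the 4.[12]bsd du behavior described above.  Once
 * the table is full, further hard-linked files are no longer
 * recognized as duplicates and get counted again.  Illustrative only,
 * not the actual du source. */
#include <stdio.h>
#include <sys/types.h>

#define NLINKS 1000                     /* the limit mentioned above */

static struct { dev_t dev; ino_t ino; } linktab[NLINKS];
static int nlinks = 0;

/* Return 1 if this inode was seen before; otherwise remember it if
 * there is still room and return 0.  When the table is full, later
 * duplicates slip through and get counted again. */
static int seen_before(dev_t dev, ino_t ino)
{
    int i;

    for (i = 0; i < nlinks; i++)
        if (linktab[i].dev == dev && linktab[i].ino == ino)
            return 1;
    if (nlinks < NLINKS) {
        linktab[nlinks].dev = dev;
        linktab[nlinks].ino = ino;
        nlinks++;
    }
    return 0;
}

int main(void)
{
    long counted = 0;
    ino_t ino;

    /* Simulate 1100 cross-posted articles, each appearing under two
     * names (two links to one inode).  A correct count is 1100; the
     * bounded table overcounts the 100 that did not fit. */
    for (ino = 1; ino <= 1100; ino++) {
        if (!seen_before(0, ino))
            counted++;                  /* first name */
        if (!seen_before(0, ino))
            counted++;                  /* second name: should be skipped */
    }
    printf("counted %ld, should be 1100\n", counted);
    return 0;
}

The 100 simulated articles that do not fit in the table are counted
under both names, which is the overcount described above.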

---rick