[news.admin] Usenet volume

sl@van-bc.UUCP (pri=-10 Stuart Lynne) (12/06/88)

I was talking to a new news administrator today about the current news
volumes and rate of increase.

My gut feeling was that the news volume is doubling about every 16 months.

Anybody have any statistics they can work with to give us an accurate idea
of the rate of increase of volume?

-- 
Stuart.Lynne@wimsey.bc.ca {ubc-cs,uunet}!van-bc!sl     Vancouver,BC,604-937-7532

karl@triceratops.cis.ohio-state.edu (Karl Kleinpaste) (12/06/88)

sl@van-bc.UUCP (Stuart Lynne) writes:
   I was talking to a new news administrator today about the current news
   volumes and rate of increase.
   My gut feeling was that the news volume is doubling about every 16 months.
   Anybody have any statistics they can work with to give us an accurate idea
   of the rate of increase of volume?

From Spaf's notes from the talk he gave to the IETF folks a couple of
months ago, slide 7 has exactly this info:

			Traffic

Based on figures from R. Adams, H. Spencer, M. Horton,
S. Bellovin, and B. Reid:

o 1979: 3 sites, 2 articles per day
o 1980: 15 sites, 10 articles per day
o 1981: about 150 sites, 20 articles per day
o 1982: about 400 sites, 35 articles per day
o 1983: over 600 sites, 120 articles per day
o 1984: over 900 sites, 225 articles per day
o 1985: over 1300 sites, 375 articles per day, 1Mb+ per day
o 1986: over 2500 sites, 500 articles per day, 2Mb+ per day
o 1987: over 5000 sites, 1000 articles per day, 2.4Mb+ per day

No figures for 1988 on the slide, but you get the idea.
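
A quick back-of-the-envelope check on those numbers, assuming smooth
exponential growth between the sample years (the figures are the ones
from the slide; everything else is just arithmetic):

# Doubling time implied by the articles-per-day figures above.
import math

articles_per_day = {1979: 2, 1980: 10, 1981: 20, 1982: 35, 1983: 120,
                    1984: 225, 1985: 375, 1986: 500, 1987: 1000}

def doubling_months(y0, y1):
    growth = articles_per_day[y1] / articles_per_day[y0]
    return 12 * (y1 - y0) / math.log2(growth)

print(round(doubling_months(1979, 1987), 1))   # ~10.7 months, 1979-1987
print(round(doubling_months(1985, 1987), 1))   # ~17 months, 1985-1987

So Stuart's 16-month gut feeling is about right for the last couple of
years, even though the net as a whole has doubled faster than that since
1979.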

Awesome.  Truly awesome.

--Karl

lmb@vsi1.UUCP (Larry Blair) (12/07/88)

In article <1995@van-bc.UUCP> sl@van-bc.UUCP (pri=-10 Stuart Lynne) writes:
=I was talking to a new news administrator today about the current news
=volumes and rate of increase.
=
=My gut feeling was that the news volume is doubling about every 16 months.
=
=Anybody have any statistics they can work with to give us an accurate idea
=of the rate of increase of volume?

Many sites in the Bay Area are running weekly statistics.  Last January, the
average number of postings per week was in the 7000-8000 range.  With the
exception of Thanksgiving week, the last few weeks have all been over 16,000.
Last week saw > 17,000.  I ran partial statistics last week, as my inodes
rapidly dwindled, and found that we received over 13,000 postings from 9am
Tuesday through 4pm Thursday.

This presents a few problems:

It is almost impossible to take a full feed at 1200 baud.  A particularly
heavy day can take over 24 hours to receive.  At this rate, 2400 won't work
by 1990.
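
The arithmetic behind that, roughly; the article rate, the ~2.3KB average
article size, and the ~100 character/second effective uucp throughput at
1200 baud are all assumptions, not measurements:

# Rough transfer-time estimate for a full feed over a 1200 baud link.
ARTICLES_PER_DAY = 17000 / 7   # assumed, from ~17,000 postings per week
AVG_BYTES = 2300               # assumed average article size
EFFECTIVE_CPS = 100            # assumed throughput after protocol overhead

hours = ARTICLES_PER_DAY * AVG_BYTES / EFFECTIVE_CPS / 3600
print(round(hours, 1))         # ~15.5 hours/day, uncompressed
print(round(hours / 2, 1))     # ~7.8 hours/day if compress roughly halves it

Even with generous assumptions, an average uncompressed day eats most of
the clock, so a heavy day plus retransmissions runs right past 24 hours.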

Large sites that feed 20 or 30 other sites will have no choice but
to cut back.

As the number of postings grows, it becomes more and more difficult to
follow discussion lines.  One posting will quickly draw 50 responses.
Next year, it will be 100.

Inodes are going to become a big issue.  Many systems, like our Sun, are
particularly stingy with creating inodes.  It appears to me that the amount
of crossposting has increased, which, of course, uses more inodes.

I had an interesting conversation with a new site the other day.  We were
going to give him a full feed.  I started asking questions.  How much disk
space do you have allocated for news?  "55 MB."  Err... you may survive
for a little while with short cycle expires.  How many inodes are free?
"Oh, I've got tons.  18,000."  Better shorten that cycle some more.  Btw,
how much free space is in the filesystem with /usr/spool/uucp?  "1.8 MB."
Uh oh, we'll have to play it by ear.
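
Running the same kind of numbers for that site (the daily article count
and the ~3KB on-disk cost per article, average size rounded up to
filesystem fragments, are guesses):

# How long 55 MB of spool and 18,000 free inodes last under a full feed.
ARTICLES_PER_DAY = 17000 / 7   # assumed
ON_DISK_KB = 3                 # assumed: ~2.3 KB average, rounded up
SPOOL_MB = 55
FREE_INODES = 18000            # one inode per stored article

print(round(SPOOL_MB * 1024 / (ARTICLES_PER_DAY * ON_DISK_KB), 1))  # ~7.7 days of space
print(round(FREE_INODES / ARTICLES_PER_DAY, 1))                     # ~7.4 days of inodes

Either way the spool fills in about a week, which is why the expire
cycle has to be so short.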

I expect that there will be 50 responses to this posting, many with
statements like, "I run with 20MB and 5,000 inodes."  Of course it is
possible, but not with standard software and default expires.
-- 
Larry Blair   ames!vsi1!lmb   lmb%vsi1.uucp@ames.arc.nasa.gov

jbuck@epimass.EPI.COM (Joe Buck) (12/07/88)

In article <1275@vsi1.UUCP> lmb@vsi1.UUCP (Larry Blair) writes:
>Many sites in the Bay Area are running weekly statistics.  Last January, the
>average number of postings per week was in the 7000-8000 range.  With the
>exception of Thanksgiving week, the last few weeks have all been over 16,000.
>Last week saw > 17,000. 

Fortunately, average article size has been steadily decreasing as the
numbers have been growing (we are experiencing exponential growth, but
the exponent is smaller than article numbers would indicate).  In the
Old Days, people treated Usenet articles as a published medium.  Now
people just chat, posting lots of messages with no more than five or
six lines of original text.

>It is almost impossible to take a full feed at 1200 baud.  A particularly
>heavy day can take over 24 hours to receive.  At this rate, 2400 won't work
>by 1990.

Some have said that the technological innovations (compress, 2400 baud
modems, Trailblazers, NNTP) have allowed us to cope with the growth.
Unfortunately, it seems to me that they have also been a big contributor
to the growth.

The state of flux that mail is in these days also contributes to the
problem; lots of users are posting messages intended for one person
because they aren't willing to take the time to figure out how to get
mail through.

>Inodes are going to become a big issue.  Many systems, like our Sun, are
>particularly stingy with creating inodes.  It appears to me that the amount
>of crossposting has increased, which, of course, uses more inodes.

Umm... there is only one inode per file.  Crossposting simply adds
more directory entries; every directory entry points to the same inode.
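
A tiny demonstration, if you want to convince yourself; the paths and
article numbers are made up, but the link(2) behavior is exactly what
the news software relies on:

# A crossposted article is stored once and hard-linked into each group's
# directory, so it uses one inode no matter how many groups carry it.
import os

os.makedirs("spool/comp/misc", exist_ok=True)
os.makedirs("spool/news/admin", exist_ok=True)

first = "spool/news/admin/123"          # hypothetical article numbers
with open(first, "w") as f:
    f.write("Newsgroups: news.admin,comp.misc\n\n(article body)\n")

os.link(first, "spool/comp/misc/456")   # the second "copy" is a hard link

a, b = os.stat(first), os.stat("spool/comp/misc/456")
print(a.st_ino == b.st_ino)             # True: same inode
print(a.st_nlink)                       # 2: two directory entries, one file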

>I expect that there will be 50 responses to this posting, many with
>statements like, "I run with 20MB and 5,000 inodes."  Of course it is
>possible, but not with standard software and default expires.

Count off: one.... I have 50 Mb and 30K inodes, and do three expire
runs -- groups are kept for 6, 4, or 2 days depending on arbitrary
criteria that only I understand :-).  We run standard news 2.11.14B.
I doubt if most folks carry stuff for two-three weeks anymore.  It's
not terribly useful to keep that much anyway.
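
The policy amounts to something like the sketch below.  This is not the
expire program itself (real expire also has to maintain the history
file); the hierarchy-to-days choices here are made up, since the real
criteria are the arbitrary ones mentioned above:

# Per-hierarchy retention: keep some hierarchies 6 days, others 4 or 2.
import os, time

SPOOL = "/usr/spool/news"              # assumed spool location
DAYS = {"comp": 6, "news": 6, "sci": 4, "rec": 2, "talk": 2}   # made up
DEFAULT_DAYS = 4
now = time.time()

for dirpath, dirnames, filenames in os.walk(SPOOL):
    top = os.path.relpath(dirpath, SPOOL).split(os.sep)[0]
    keep = DAYS.get(top, DEFAULT_DAYS) * 86400
    for name in filenames:
        path = os.path.join(dirpath, name)
        if name.isdigit() and now - os.path.getmtime(path) > keep:
            os.unlink(path)            # expire the article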



-- 
- Joe Buck	jbuck@epimass.epi.com, or uunet!epimass.epi.com!jbuck,
		or jbuck%epimass.epi.com@uunet.uu.net for old Arpa sites
I am of the opinion that my life belongs to the whole community, and as long
as I live it is my privilege to do for it whatever I can.  -- G. B. Shaw

mike@turing.unm.edu (Michael I. Bushnell) (12/07/88)

Every once in a while someone points out that Usenet traffic has been
growing exponentially since its inception.  Then lots of people remark
that it won't ever get so bad that sites actually *stop* getting news,
and others announce the pending "crash" of the net as *everyone*
(especially the major hubs) drops news.

Sigh.

Ever heard of equilibrium, folks?  It will come to a point where new
sites joining roughly balance old sites dropping off.  I've never spoken
to anyone who didn't like news connectivity, so I don't see fragmentation
as a possibility as long as we have at least as many sites as we do now.
Sure, people will drop news due to the large volume, but don't forget
that the cause of this is all the new sites.  It balances out in a nice
equilibrium.

--
Michael I. Bushnell         \  	  This above all; to thine own self be true
HASA - "A" division   GIG!   \    And it must follow, as the night the day,
mike@turing.unm.edu   	     /\	  Thou canst not be false to any man.
Numquam  Gloria Deo   	    /  \  Farewell:  my blessing season this in thee!

pete@octopus.UUCP (Pete Holzmann) (12/07/88)

Our stats: running vanilla 2.14 news, 30MB spool partition.  Two daily
expires: one at 2 days, one at 2 weeks for the stuff of intense interest
to local news junkies.  Occasional full-partition trouble requiring
file deletions by hand tells me we'll be going to 40-50MB Real Soon Now.

A thought: many moons ago, for a somewhat different reason, I suggested
that we need to consider more regionalization of the net. I wonder if
it might behoove us to figure out some such solution sooner (i.e. the
next few months) rather than later (i.e. after the net has crushed itself
under its own weight). To wit (a strawman proposal, to give a flavor of
what might be possible):

	- Improved separation of newsgroup namespace from distribution
		namespace, so that various regions could all have their
		own rec.taxes or comp.music.rock-n-roll discussions, without
		having lots of regional newsgroup names.

        - The news software (active file? sys file? a new file?) would have
                a specific default distribution area for each group.
                It would be best to have a mapping to make installation
                easy (e.g. the file would have distributions like 'city',
                'province', 'region', 'continent', 'world' rather than
                'ba', 'ca', 'uswest', 'na', 'world').  The installer would
                define the locally applicable mapping; a rough sketch of
                such a mapping follows this list.

	- To begin with, all groups could default as they do today.

	- Once in place, appropriate groups (high volume, lots of "experts"
		available everywhere, and/or little value to worldwide
		discussion) could be limited to regional discussion.

	- Presumably, a moderator would be available to promote articles
		to a wider audience whenever necessary. 
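
A rough sketch of what such a mapping might look like, purely to make the
strawman concrete; the group list, the level names, and the site-local
values are all invented for illustration:

# Each group carries an abstract default distribution level; each site
# maps the abstract levels onto its own local distribution names.
DEFAULT_LEVEL = {            # would live in the active/sys/new file
    "rec.taxes":    "region",
    "comp.unix":    "world",
    "news.admin":   "world",
    "misc.forsale": "city",
}

SITE_MAPPING = {             # defined once by the installer at each site
    "city":      "ba",
    "province":  "ca",
    "region":    "uswest",
    "continent": "na",
    "world":     "world",
}

def default_distribution(group):
    # Distribution to stamp on a posting that gives no Distribution: header.
    return SITE_MAPPING[DEFAULT_LEVEL.get(group, "world")]

print(default_distribution("rec.taxes"))   # uswest
print(default_distribution("comp.unix"))   # world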

Well, hopefully that's enough to shoot some shotgun shells at. Seems to
me that something like this is a practical way to limit traffic without
losing the true value of the net. Sure, it's nice to see what people all
over the world think about my latest question, but a *few* responses is
plenty. With the net as big as it is, every question in many groups gets
*TOO MUCH* feedback. We need to limit the audience! My criteria for
audience-limiting algorithms include:

	1) Discussion threads must make sense and be contiguous to all
		participants. In particular, this means that randomly
		dropping articles would be a Bad Thing.

	2) Low volume valuable discussions need and deserve widespread
		distribution. Other categories may not need widespread
		distribution. Conclusion: flexibility must be possible.

	3) I'm interested in lots of different things. Don't cut me off
		completely from all discussion in any newsgroup that I
                want to receive and/or post to.

	4) Moderation, in moderation, is a Good Thing. There isn't enough
		Quality Volunteer Time to moderate the whole net. And I
		don't have the stomach to turn Usenet into a set of
		Compu$erve Forums (with paid moderators, etc).

	5) I *liked* the Good Old Days. All we need is a way to give 
		*everybody* the Good Old Days.

Enough already.
-- 
  OOO   __| ___      Peter Holzmann, Octopus Enterprises
 OOOOOOO___/ _______ USPS: 19611 La Mar Court, Cupertino, CA 95014
  OOOOO \___/        UUCP: {hpda,pyramid}!octopus!pete
___| \_____          Phone: 408/996-7746

lmb@vsi1.UUCP (Larry Blair) (12/08/88)

In article <2707@epimass.EPI.COM> jbuck@epimass.EPI.COM (Joe Buck) writes:
=In article <1275@vsi1.UUCP> lmb@vsi1.UUCP (Larry Blair) writes:
=>Inodes are going to become a big issue.  Many systems, like our Sun, are
=>particularly stingy with creating inodes.  It appears to me that the amount
=>of crossposting has increased, which, of course, uses more inodes.
=
=Umm... there is only one inode per file.  Crossposting simply adds
=more directory entries; every directory entry points to the same inode.

Touche', Joe.

Credit this to not using my mental resources.  Actually Karl Kleinpaste
noticed it first (about 20 minutes after I posted).  It always amazes me
how fast the news propagates, but how slow the cancels are.

Where was that paragraph on reviewing what I write before posting :-)?

-- 
Larry Blair   ames!vsi1!lmb   lmb%vsi1.uucp@ames.arc.nasa.gov

jfh@rpp386.Dallas.TX.US (The Beach Bum) (12/08/88)

In article <2707@epimass.EPI.COM> jbuck@epimass.EPI.COM (Joe Buck) writes:
>Fortunately, average article size has been steadily decreasing as the
>numbers have been growing (we are experiencing exponential growth, but
>the exponent is smaller than article numbers would indicate).  In the
>Old Days, people treated Usenet articles as a published medium.  Now
>people just chat, posting lots of messages with no more than five or
>six lines of original text.

There seem to be groups where this is true, and others which do not
experience this phenomenon.  Aside from periodic Henry-worship,
Unix-Wizards is a good example.

The worst problem facing USENET, IMHO, is the repeat business: the
periodic reposting of the same threads.  Perhaps it is time to extend
the news.announce.newusers idea into every such newsgroup -- regular
monthly introductory postings with the commonly asked questions for
that group.

>>It is almost impossible to take a full feed at 1200 baud.  A particularly
>>heavy day can take over 24 hours to receive.  At this rate, 2400 won't work
>>by 1990.
>
>Some have said that the technological innovations (compress, 2400 baud
>modems, Trailblazers, NNTP) have allowed us to cope with the growth.
>Unfortunately, it seems to me that they have also been a big contributor
>to the growth.

Agreed.  I would like to add that uunet, portal and PC Pursuit are also
contributors to the problem.  USENET is not as ``elite'' as it once was.
In all fairness, sites such as mine are also to blame - I feed 10 partials.
Who knows what those sites do.

>>I expect that there will be 50 responses to this posting, many with
>>statements like, "I run with 20MB and 5,000 inodes."  Of course it is
>>possible, but not with standard software and default expires.
>
>Count off: one.... I have 50 Mb and 30K inodes, and do three expire
>runs -- groups are kept for 6, 4, or 2 days depending on arbitrary
>criteria that only I understand :-).  We run standard news 2.11.14B.
>I doubt if most folks carry stuff for two-three weeks anymore.  It's
>not terribly useful to keep that much anyway.

Well, having professed my guilt earlier ...

I run 2.11.8 in 24MB.  I take alt, comp, misc, news, rec, sci, unix-pc,
pubnet, bionet, tx and dfw.  And then redistribute and expire it all.
Immediate expires for things I don't use, four days for everything else.
Get it in, unbatch, batch, and then expire.  Most of my sys lines are
fairly convoluted; here is an example [ notice the One True Indenting
Style ;-) ]:

void:world,na,usa,tx,dfw,\
	alt,\
	comp.databases,comp.lang.c,comp.laser,\
		comp.mail.uucp,\
	comp.sources,\
		!comp.sources.x,!comp.sources.atari.st,\
		comp.sources.misc,comp.sys.tandy,\
	comp.terminals,comp.unix,\
	rec.arts.startrek,rec.food,rec.humor,rec.humor.funny,\
	to.void:F:

I don't understand what Karl [ at osu-cis ] is having trouble with.
Unless I missed something [ which is possible ], I just don't see what
is so horrible about 18 convoluted sys lines.  My attitude is just
to get the feed I take in, let rnews chew on the articles, and spit
them back out.

I would like to get out of the news business to some extent, but as
volume goes up, sites are less willing to add feeds.  And then someone
calls asking for a partial feed, and so on.

Two things need to be encouraged - more sites willing to shoulder more
of the load, i.e., fewer leaves per branch; and more moderation.  The
former will make it easier to get on the net, the latter will make it
more worthwhile.
-- 
John F. Haugh II                        +-Cat of the Week:--------------_   /|-
VoiceNet: (214) 250-3311   Data: -6272  |Aren't you absolutely sick and \'o.O'
InterNet: jfh@rpp386.Dallas.TX.US       |tired of looking at these damn =(___)=
UucpNet : <backbone>!killer!rpp386!jfh  +things in everybody's .sig?-------U---

cks@ziebmef.uucp (Chris Siebenmann) (12/13/88)

In article <1275@vsi1.UUCP> lmb@vsi1.UUCP (Larry Blair) writes:
...
>Inodes are going to become a big issue.  Many systems, like our Sun, are
>particularly stingy with creating inodes.

 I suspect sites have already started to run out of inodes. We spent a
week here with 200 inodes free and more news wanting to come in (I
long ago wrote a program that blows off incoming uucico's if disk
blocks or inodes drop too low; this is why we had 200 instead of 0
inodes).
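
The program itself isn't worth posting, but the idea is simple enough to
sketch; the spool path and the thresholds are whatever suits your site,
and how you wrap it around the incoming uucico/rnews is up to you:

# Refuse incoming news when the spool gets too low on blocks or inodes.
import os, sys

SPOOL = "/usr/spool/news"   # assumed spool location
MIN_BLOCKS = 5000           # free blocks to keep in reserve (f_frsize units)
MIN_INODES = 2000           # free inodes to keep in reserve

st = os.statvfs(SPOOL)
if st.f_bavail < MIN_BLOCKS or st.f_favail < MIN_INODES:
    sys.stderr.write("news spool nearly full, refusing incoming news\n")
    sys.exit(1)             # a wrapper script can key off this exit status
sys.exit(0)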

>I expect that there will be 50 responses to this posting, many with
>statements like, "I run with 20MB and 5,000 inodes."  Of course it is
>possible, but not with standard software and default expires.

 I suspect most everyone is running with non-standard expires by now.
What I hope we'll start seeing is twofold: news systems that are more
reliable when disks overflow, and utilities to choke things down and
keep them under control. Some programs are already out there, like
Cnews and Brad Templeton's space-based expiry system (I think he wrote
it; the article header seems to have vanished on my copy).

-- 
"Would that Aza Chorn had teleported Bates elsewhere and not removed
 so charming and preposterous a folly from our skyline ... but then he
 could not have known, not being raised around these parts."
Chris Siebenmann		uunet!utgpu!{ontmoh!moore,ncrcan}!ziebmef!cks
cks@ziebmef.UUCP	     or	.....!utgpu!{,ontmoh!,ncrcan!brambo!}cks

jrp@mirror.UUCP (John R. Petersen) (12/15/88)

The facts:

338.5 MB Spool Space.
Default expiration.  (One of the few on both counts, I would think.)
7 full downstream news feeds, and 4 partial feeds.

The Result:

Spool partition is always at *least* 70% of capacity.
Current figures (within the last 2 weeks) are running about 80%.  
Some of my neighbors don't always pick up their stuff though....

							--John

----
John R. Petersen --  jrp@mirror.TMC.COM		[Systems Programmer]
        UUCP   :  {mit-eddie, pyramid, wjh12, xait, datacube}!mirror!jrp
Mirror Systems	2067 Massachusetts Avenue  Cambridge, MA, 02140
Telephone:	617-661-0777 extension 122
Administrator for ZONE1.COM - Info Requests: zone1-info@mirror.TMC.COM

"One damn minute, Admiral." -- Spock : ST IV The Voyage Home
---

mangler@cit-vax.Caltech.Edu (Don Speck) (12/19/88)

In article <2707@epimass.EPI.COM>, jbuck@epimass.EPI.COM (Joe Buck) writes:
> Fortunately, average article size has been steadily decreasing

The comp groups have a larger average article size than the others.
The non-comp groups are proliferating faster, pulling down the
overall average.  But partial-feed sites do not see this trend:
instead, they may see an *increase* in average size, as volume
forces them to drop rec, soc, and talk groups.

            avg size    % of      % of
            (bytes)     bytes     articles
gnu   -->      4928      1.7%      0.8%
comp  -->      3211     40.4%     29.1%
sci            2093      4.1%      4.5%
misc           1733      4.4%      5.8%
news           1813      1.4%      1.8%
rec            1845     26.5%     33.2%
soc            2012      9.3%     10.6%
talk           2096      7.8%      8.5%
alt            1825      3.8%      4.8%
ca             1897      0.7%      0.9%
TOTAL -->      2308      100%      100%

(I modified "du" to report bytes instead of kilobytes, and ran it
on our feed's /usr/spool/news).
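
For anyone who wants the same numbers without patching du, the tally is
roughly the sketch below; the spool path and the "numbered files are
articles" rule are the usual conventions, and note that this counts a
crosspost once per link where du counts it once per inode:

# Tally bytes and article counts per top-level hierarchy under the spool.
import os
from collections import defaultdict

SPOOL = "/usr/spool/news"                 # assumed spool location
bytes_in, arts_in = defaultdict(int), defaultdict(int)

for dirpath, dirnames, filenames in os.walk(SPOOL):
    top = os.path.relpath(dirpath, SPOOL).split(os.sep)[0]
    if top == ".":
        continue                          # skip files in the spool root
    for name in filenames:
        if name.isdigit():                # article files are just numbers
            bytes_in[top] += os.path.getsize(os.path.join(dirpath, name))
            arts_in[top] += 1

tb, ta = sum(bytes_in.values()), sum(arts_in.values())
for top in sorted(bytes_in, key=bytes_in.get, reverse=True):
    print("%-6s avg %5d  %5.1f%% of bytes  %5.1f%% of articles"
          % (top, bytes_in[top] // arts_in[top],
             100.0 * bytes_in[top] / tb, 100.0 * arts_in[top] / ta))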

berleant@cs.utexas.edu (Dan Berleant) (12/21/88)

In article <1278@vsi1.UUCP> lmb@vsi1.UUCP (Larry Blair) writes:
>In article <1995@van-bc.UUCP> sl@van-bc.UUCP (pri=-10 Stuart Lynne) writes:
>=Anybody have any statistics they can work with to give us an accurate idea
>=of the rate of increase of volume?
>
>Many sites in the Bay Area are running weekly statistics.  Last January, the
>average number of postings per week was in the 7000-8000 range.  With the
>exception of Thanksgiving week, the last few weeks have all been over 16,000.
>Last week saw > 17,000.  I ran partial statistics last week, as my inodes
>rapidly dwindled, and found that we received over 13,000 postings from 9am
>Tuesday through 4pm Thursday.
>
>This presents a few problems: [...]

Looks like the future may hold exponential expansion. One problem is
volume in some newsgroups. It is going to become impossible to keep
up with the news in some groups, follow discussions, etc. Some say
it has already become impossible.

Is there any chance of dynamic creation (and deletion) of newsgroups?
I have in mind that anyone can create a newsgroup for the purpose
of carrying on a discussion of a certain topic. Then, newsgroups 
that do not have any postings for some length of time will be
automatically deleted by net-wide rmgroup commands (or whatever).

Is this consistent with the social structure (?) of the usenet
newsgroup system, its projected future, and is it technically
feasible?

Dan
berleant@cs.utexas.edu

eric@snark.UUCP (Eric S. Raymond) (12/23/88)

In article <4409@cs.utexas.edu>, berleant@cs.utexas.edu (Dan Berleant) writes:
> Is there any chance of dynamic creation (and deletion) of newsgroups?
> I have in mind that anyone can create a newsgroup for the purpose
> of carrying on a discussion of a certain topic. Then, newsgroups 
> that do not have any postings for some length of time will be
> automatically deleted by net-wide rmgroup commands (or whatever).

Not only is this possible, I have already implemented it! In 3.0 you can
set a FLEXGROUPS option on any newsgroup hierarchy (I have it set for alt.all
at my site) which works as you describe except that no rmgroup messages are
necessary for cleanup. Instead, inactive groups are simply expired after a
configured period of time.
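
The sketch below is not the 3.0 code, just the flavor of the cleanup
pass: a group whose newest article is older than some cutoff is treated
as inactive and removed locally, with no net-wide rmgroup needed.  The
spool path and the cutoff are placeholders:

# Find groups whose newest article is older than the inactivity cutoff.
import os, time

SPOOL = "/usr/spool/news"   # assumed spool location
CUTOFF_DAYS = 90            # assumed inactivity window
now = time.time()

for dirpath, dirnames, filenames in os.walk(SPOOL):
    articles = [f for f in filenames if f.isdigit()]
    if not articles:
        continue
    newest = max(os.path.getmtime(os.path.join(dirpath, a)) for a in articles)
    if now - newest > CUTOFF_DAYS * 86400:
        group = os.path.relpath(dirpath, SPOOL).replace(os.sep, ".")
        print("inactive, candidate for local expiry:", group)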
 
> Is this consistent with the social structure (?) of the usenet
> newsgroup system, its projected future, and is it technically
> feasible?

I floated this proposal to the now-defunct backbone-admin list twice. The
response was cautious approval from a few, reflexive territorial-defensive
grunting from most, and excrement-throwing from another few. They didn't like
the idea of giving up "control of the namespace". Human beings are such
primates sometimes!

Like you, I believe it is vital for the future. The net is getting too big
for any set of 'first among equals' to control -- and as a libertarian
anarchist *I* think this is a Good Thing!
-- 
      Eric S. Raymond                     (the mad mastermind of TMN-Netnews)
      Email: eric@snark.uu.net                       CompuServe: [72037,2306]
      Post: 22 S. Warren Avenue, Malvern, PA 19355      Phone: (215)-296-5718