[net.news.adm] compressing news articles

seifert@hammer.UUCP (Snoopy) (05/31/86)

In article <693@bu-cs.UUCP> bzs@bu-cs.UUCP (Barry Shein) writes:
>
>[case where disk space is the ultimate limiting factor...]
>
>How about just compressing the rarely read newsgroups and then teaching
>your news readers to recognize these (.Z or maybe there's a magic number)
>and zcat'ing or pcat'ing them or whatever you use.

Good idea, Barry.  I would like to propose the following:

Teach the newsreading programs (rn, vnews, etc) to recognise a file
suffix (.Z ?).

Then implement a dynamic two-stage expire:  if no one reads net.foobar,
compress all its articles immediately, then delete them after, say, 2 days.
Someone resubscribing would then have the last couple of days' worth of
stuff to read; it would just take longer to get it displayed.
If they stayed subscribed, the news cron-job would notice and
switch that group to the other method.  The method for groups which
are read is: store articles uncompressed for 4-5 days, then, once
most users have read them, compress the articles and remove them
after 2-3 weeks (or whatever disk space allows).
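
To make the two methods concrete, here is a rough sketch of the per-article
decision.  The names (Policy, action) and the exact day counts are my own
placeholders picked from the ranges above, not anything that exists in the
news software.

```python
from typing import NamedTuple


class Policy(NamedTuple):
    compress_after: int  # age in days at which an article gets compressed
    remove_after: int    # age in days at which an article gets deleted


# Assumed values taken from the ranges in the proposal.
UNREAD = Policy(compress_after=0, remove_after=2)   # nobody reads the group
READ = Policy(compress_after=4, remove_after=14)    # group has readers


def action(age_days, group_is_read, read_policy=READ, unread_policy=UNREAD):
    """Return 'keep', 'compress' or 'remove' for one article."""
    policy = read_policy if group_is_read else unread_policy
    if age_days >= policy.remove_after:
        return "remove"
    if age_days >= policy.compress_after:
        return "compress"
    return "keep"
```

An article in an unread group is compressed on arrival (age 0) and removed
at 2 days; in a read group it stays uncompressed until day 4.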

There would be fudge-factors to allow the newsadmin to keep mod.wonderful
longer than talk.tv.I_Love_Lucy.episode_23. 

There would be a news cron-job fired off once a day during
a time of light system load.  It would read all the .newsrc
files modified within the last few days (say 4 days), figure
out which groups are read and which aren't, and store this
information in a file somewhere.  When news comes in, this
file tells which groups to compress.
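
A minimal sketch of that scan, under some assumptions: a .newsrc line
of the form "group: ranges" means subscribed and "group! ranges" means
unsubscribed, and a recently modified .newsrc is taken as a sign that its
owner is actually reading.  The function names and the list of paths are
hypothetical; how the cron job finds each user's .newsrc is left open.

```python
import os
import time


def subscribed_groups(newsrc_text):
    """Collect group names from lines of the form 'group: ranges'."""
    groups = set()
    for line in newsrc_text.splitlines():
        name, sep, _ = line.strip().partition(":")
        # Lines with '!' (unsubscribed) or without ':' are skipped.
        if sep and name and "!" not in name and " " not in name:
            groups.add(name)
    return groups


def active_groups(newsrc_paths, max_age_days=4, now=None):
    """Union of subscribed groups over recently modified .newsrc files."""
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    read = set()
    for path in newsrc_paths:
        try:
            if os.path.getmtime(path) < cutoff:
                continue  # stale .newsrc: this user has not read news lately
            with open(path, errors="replace") as f:
                read |= subscribed_groups(f.read())
        except OSError:
            continue  # missing or unreadable file: ignore it
    return read
```

Note this only sees subscriptions, not whether articles in a group were
actually opened, so it will overestimate readership somewhat.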

EXAMPLE   The newsadmin file might look like:

mod.announce:		8	40	#very important, small volume
mod.sources:		4	14	#useful, high volume
mod.std.unix:		4	20	#useful, med volume
mod.pdp8.cobol.src:	4	20	#useful, low volume
talk.garbage:		2	10	#unimportant, high volume
			^	^
		     compress	rm
		  after # days  after # days

Then the cron job looks at this file, and at which groups are
actually read, and produces:

mod.announce:		8	40
mod.sources:		4	14
mod.std.unix:		4	20
mod.pdp8.cobol.src:	0	2	(no one at this site reads it)
talk.garbage:		2	10
			^	^
		     compress	rm
		  after # days  after # days
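
The merge step could look something like the following sketch.  It assumes
the newsadmin file has exactly the "group: compress rm #comment" layout shown
above and that the set of read groups has already been worked out; the
function names and the (0, 2) override for unread groups are placeholders.

```python
def parse_admin(text):
    """Parse 'group: compress_days rm_days #comment' lines into a dict."""
    table = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        # Skip blank lines and anything not shaped like 'name: N M'.
        if len(fields) != 3 or not fields[0].endswith(":"):
            continue
        table[fields[0][:-1]] = (int(fields[1]), int(fields[2]))
    return table


def effective(table, read_groups, unread=(0, 2)):
    """Keep the admin's limits for read groups, override the rest."""
    return {
        group: (limits if group in read_groups else unread)
        for group, limits in table.items()
    }


ADMIN = """\
mod.announce:        8  40  #very important, small volume
mod.sources:         4  14  #useful, high volume
mod.std.unix:        4  20  #useful, med volume
mod.pdp8.cobol.src:  4  20  #useful, low volume
talk.garbage:        2  10  #unimportant, high volume
"""

READ_HERE = {"mod.announce", "mod.sources", "mod.std.unix", "talk.garbage"}
RESULT = effective(parse_admin(ADMIN), READ_HERE)
```

With those inputs, mod.pdp8.cobol.src comes out as (0, 2) and the other
four groups keep the newsadmin's values, matching the second table.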

There should also be some way to allow for variations in daily
news input, so that articles can be kept longer if nothing's
coming in.  This will be somewhat less important if we go to
moderated groups for most things, as the moderators can do
some flow-control, but there is still the problem of broken
links causing nothing for days followed by a flash-flood.

Snoopy
tektronix!tekecs!doghouse.TEK!snoopy