[net.sources] Another Expiration script, min space vs time

joe@auspyr.UUCP (Joe Angelo) (01/14/87)

Okay, I think maybe I am on the proper approach to expiring news...

My problem is that I like to keep news around for as long as possible,
and only expire it when disk space warrants it. Manually expiring news
isn't much of a hassle; besides, this way I get to watch what the program
does -- thus saving me a weekly movie ticket.

Anyway, I've never been able to figure out just HOW many days old an
article should be before it gets expired, so that about 3 meg of news
stays around... As with all UNIX problems, I created a shell script.

The attached shell script looks at your news spool dir and figures
out how many bytes it received on each day. It then adds those up,
newest day first, compares the running total with a normal queue size,
and tells you how many days should be expired in order to return your
queue to a normal size. Did I explain that properly?

Ahh, yeah, sounds right to me. Anyway, the attached script doesn't run
expire and does nothing more than report, but it can be adapted to
generate an expire command line (with a pipe to /bin/sh).
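
A minimal sketch of that adaptation follows. It is not part of the
attached script: expire_cmd is a made-up name, the input format
("age_in_days bytes", newest day first) is an assumption, and so is the
"-e days" option, which is what B News expire takes; check your own
expire manual page before piping anything to /bin/sh.

```shell
# Hypothetical helper, not part of the posted script.
# Reads "age_in_days bytes" lines, newest first, on stdin and prints an
# expire command for the first age at which the running total reaches $1.
# ASSUMPTION: expire accepts "-e days" (B News style).
expire_cmd() {
	awk -v limit="$1" '
	{
		sum += $2
		if (sum >= limit && !found) {
			age = $1
			found = 1
		}
	}
	END {
		if (found)
			printf("expire -e %d\n", age)
		else
			printf("# spool is under %d bytes, nothing to expire\n", limit)
	}'
}

# Sample per-day totals (made-up numbers): age in days, bytes received.
printf '0 400000\n1 900000\n2 800000\n3 700000\n' | expire_cmd 2000000
# prints: expire -e 2
```

The sketch only prints the command, so you can look at it first and add
"| /bin/sh" once you trust it.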

The shell script is also rather primitive and UGLY -- but it seems to work
out fairly well. I've tested it on a Pyramid (UCB mode) and all works okay,
and I see no reason why it won't work on other UNIXes.

If anyone has another approach please let me know as I would love to
see how others solve the same problem.

OH!!
It should be noted that dates reported by ''ls'' and ''date'' are
converted into a julian number (really just the day of the year) so
that there is an easy way to do math on dates. For this reason, Dec 23
will be converted to 357. If the script tells you to expire news over
three hundred and something days (possibly with a minus sign, since the
script simply subtracts the two day numbers), then your threshold was
found to be in December and you have to work out the real figure by hand.
Otherwise, if nobody ''touch''es the news files, and if expire really
works on the TRUE date the files were created on your local system
(ps: does it??), then everything is ok.
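
To make the date arithmetic concrete, here is a small sketch of the same
conversion. The function names day_of_year and age_in_days are invented
for illustration and do not appear in the attached script. The sketch
assumes a non-leap year, and it assumes that any file dated later in the
year than today really belongs to the previous year, which is one
possible way around the December problem described above.

```shell
# Hypothetical helpers illustrating the date arithmetic.
# ASSUMPTION: non-leap year (February has 28 days).

# day_of_year MON DAY -> prints the day number within the year (Jan 1 = 1)
day_of_year() {
	awk -v mon="$1" -v day="$2" 'BEGIN {
		split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", name, " ")
		split("31 28 31 30 31 30 31 31 30 31 30 31", len, " ")
		total = 0
		for (m = 1; m <= 12; m++) {
			if (name[m] == mon) {
				print total + day
				exit 0
			}
			total += len[m]
		}
		exit 1		# unknown month name
	}'
}

# age_in_days TODAY FILEDAY -> days between the two day numbers.
# ASSUMPTION: a file day later than today is taken to be from last year.
age_in_days() {
	if [ "$2" -gt "$1" ]; then
		echo $(( $1 + 365 - $2 ))
	else
		echo $(( $1 - $2 ))
	fi
}

day_of_year Dec 23		# prints 357
day_of_year Jan 14		# prints 14
age_in_days 14 357		# prints 22
```

So on Jan 14 a file dated Dec 23 comes out as 22 days old instead of
minus 343.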


-- cut here with hack saw -- warning: slicing a crt is not healthy --

trap 'echo Aborting ; rm -f /tmp/chl$$ ; exit ' 1 2 3 4 10 11 15

# SPOOL is the name of the news spool directory where news files are found
# ... it should contain the root name of just the spool dir and nothing
# ... else; otherwise, set spool as SPOOL="dir1 dir2 dir3" for multidir
# ... news spoolers.
#
# NORMALBYTES is the size in bytes that the spool directory should
# ... normally be -- this does not include the size of directories
SPOOL=/usr/spool/news
NORMALBYTES=2000000

# just a note:
#
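# expect output from 'ls -lR' to be ...
#
# drwxr-xr-x 18 joe      4096 Jan 12 15:26 ./
#
#	day is $6
#	month is $5
#	size is $4
#
# (BSD-style listing with no group column; if your ls -l also prints
# the group, each field number goes up by one and the awk code below
# needs adjusting to match)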

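# awk is used instead of cut -d' ' here: date pads a one-digit day with
# an extra space ("Jan  6"), which would leave cut's third field empty
tday=`date | awk '{print $3}'`
tmon=`date | awk '{print $2}'`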

#
# sed pipeline is used below to add an extra space between permissions
# and number of links, BSD-ls can run the two together, eg we want
#	drwxr-xr-x107 joe      4096 Jan  6 18:18 src/
# converted to:
#	drwxr-xr-x 107 joe     4096 Jan  6 18:18 src/
# sed pipeline also converts tabs to spaces, be sure you have
# a REAL tab on second sed command.
#

#
# grep pipeline is used to remove directory entries from output.
#
ls -lR $SPOOL | sed 's/^\(..........\)/\1 /
s/	/ /g' | grep -v '^d' | awk '
BEGIN {
	Months="JanFebMarAprMayJunJulAugSepOctNovDec"
	NoDays="0  31 28 31 30 31 30 31 31 30 31 30 31"
	CurMonth = 0
	for( m = 0; m <= 11; ++m ) {
		if( substr(Months, (m*3) + 1, 3) == "'$tmon'") {
			CurMonth = m + 1
			break
		}
	}
	if( CurMonth > 0 ) {
		Julian = 0
		MonJulian = 0
		for( m = 0; m < CurMonth; ++m ) 
			MonJulian += substr(NoDays, (m*3)+1, 3)
		Julian = MonJulian + '$tday'
	}
	#
	# first output record is the Julian date of today
	#
	printf("9999999999999999999 TODAY %d\n", Julian)
}
NF >= 3 {
	
	# hash date into a julian like number

	CurMonth = 0
	for( m = 0; m <= 11; ++m ) {
		if( substr(Months, (m*3) + 1, 3) == $5) {
			CurMonth = m + 1
			break
		}
	}
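	if( CurMonth > 0 ) {
		Julian = 0
		MonJulian = 0
		for( m = 0; m < CurMonth; ++m ) 
			MonJulian += substr(NoDays, (m*3)+1, 3)
		Julian = MonJulian + $6
		# only count lines whose month field was recognized, so a
		# stray line is not added to the previous file's day
		date[Julian] += $4
	}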
}
END {
	for( day in date ) {
		printf("%d %ld\n", day, date[day])
	}
}' > /tmp/chl$$

#
# output to tmp file so we don't create a zillion pipes
#

sort -rn /tmp/chl$$ | awk '
BEGIN {
	sum = 0
	base = 0
}
NR == 1 {
	today = $3
	printf("\n# today is %d\n", $3)
}
NR > 1 {
	sum += $2
	if( sum >= '$NORMALBYTES' ) {
		if( base == 0 ) {
			printf("# passing '$NORMALBYTES' byte threshold, below should get expired\n")
			base = $1
		}
	}
	printf("# got %ld bytes on %d, sofarat %ld", $2, $1, sum)
	if( today == $1 )
		printf(", today")
	else if( (today - 1) == $1 )
		printf(", yesterday")
	printf("\n")
}
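END {
	# base stays 0 when the running total never reached the threshold
	if( base == 0 )
		printf("# spool is under '$NORMALBYTES' bytes, nothing needs to expire\n")
	else
		printf("# recommend that you expire over %d days\n", (today - base))
}'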

rm -f /tmp/chl$$
-- 
"No matter      Joe Angelo, Sr. Sys. Engineer @ Austec, Inc., San Jose, CA.
where you go,   ARPA: aussjo!joe@lll-tis-b.arpa       PHONE: [408] 279-5533
there you       UUCP: {sdencore,necntc,cbosgd,amdahl,ptsfa,dana}!aussjo!joe
are ..."        UUCP: {styx,imagen,dlb,gould,sci,altnet}!auspyr!joe