[net.news.b] checkgroups consumes file system

roy@phri.UUCP (Roy Smith) (11/02/85)

> Because we ran out of space on our spool file system today...

	It turns out that the culprit was news itself.  I ran the latest
checkgroups message <1826@gatech.CSNET> yesterday.  A hour or so later, we
ran out of space on /usr.  Doing a hasty expire (i.e. cd ~news/trash-group;
rm *) and rm'ing a bunch of other stuff didn't seem to help as much as I
thought it would, and we ran out of space again soon after that.

	I brought Unix down and found some 3000 (!) checkgroups messages in
/usr/spool/news/control, at about 11kb each.  I had saved the article in
xxx and done "sed 1,/Lines:/d xxx | sh >& xxx.out &", much the same
procedure I usually use.  I just repeated the procedure today with no
trouble, so I'm really confused as to what is going on.  No, I havn't
changed the directory permissions before or since.

	Some excerpts from yesterday's xxx.out (over 2000 lines total, many
of them blank):

inews: Cannot reread article
inews: Cannot reread article
[...]
inews: Cannot install article as /usr/spool/news/control/3140
inews: Link into /usr/spool/news/control/3140 failed, errno 5, check dir permissions.
inews: Cannot reread article
inews: Cannot install article as /usr/spool/news/control/3140
inews: Link into /usr/spool/news/control/3140 failed, errno 5, check dir permissions.
[...]
inews: News system locked up
[...]
inews: Cannot install article as /usr/spool/news/control/3647
inews: Cannot install article as /usr/spool/news/control/3648
inews: Cannot install article as /usr/spool/news/control/3649
inews: Cannot install article as /usr/spool/news/control/3650
[...]
inews: Cannot install article as /usr/spool/news/control/3908
inews: Link into /usr/spool/news/control/3908 failed, errno 5, check dir permissions.
inews: Cannot reread article
inews: Cannot install article as /usr/spool/news/control/3908
inews: Link into /usr/spool/news/control/3908 failed, errno 5, check dir permissions.
[...]

	And, some excerpts from /usr/lib/news/log (I *think* the first line
shown is the first mention of the checkgroup):

Nov  1 18:33	local	posted <543@phri.UUCP> ng control subj 'cmsg checkgroups'
Nov  1 18:33	local	Ctl Msg control from roy: checkgroups
Nov  1 18:33	cmcl2.UUCP	received <1840@gatech.CSNET> ng mod.newslists,net.announce.newusers subj 'Changes to Publicly Accessible Mailing Lists'
Nov  1 18:33	local	system(sed '1,/^$/d' /usr/spool/news/.ar012964 | /usr/lib/news/checkgroups usenet) status 36096
Nov  1 18:33	local	posted <544@phri.UUCP> ng control subj 'cmsg checkgroups'
Nov  1 18:33	local	Ctl Msg control from news: checkgroups
Nov  1 18:33	cmcl2.UUCP	from usenet@gatech.CSNET relay version B 2.10.2 9/18/84; site cmcl2.UUCP
Nov  1 18:33	local	system(sed '1,/^$/d' /usr/spool/news/.ar012970 | /usr/lib/news/checkgroups usenet) status 36096
[...]
Nov  1 19:48	local	Ctl Msg control from news: checkgroups
Nov  1 19:49	cmcl2.UUCP	Duplicate article <320@ssc-vax.UUCP> rejected
Nov  1 19:49	local	system(sed '1,/^$/d' /usr/spool/news/.ar015494 | /usr/lib/news/checkgroups usenet) status 36096
Nov  1 19:49	local	posted <931@phri.UUCP> ng control subj 'cmsg checkgroups'

	As you can see, inews just kept trying to post the checkgroups to
newsgroup control, without any success.  It seems to me that after a
certain number of tries, it should just give up.  As it is, inews just kept
trying until it killed the system -- 30 Mbytes of aborted articles in
~news/control, plus a log file well over 1300 kbytes long!

	I'm not sure why it failed the in the first place.  Could doing a
checkgroups while news is coming in be a Bad Thing?  Actually, several
news-things were going on at the same time.  In addition to the checkgroups
and the incoming feed, we were queueing up news for 2 or 3 downstream
sites, and expire was running (the load on our 11/750 was 8!)
-- 
Roy Smith <allegra!phri!roy>
System Administrator, Public Health Research Institute
455 First Avenue, New York, NY 10016

roy@phri.UUCP (Roy Smith) (11/02/85)

> Because we ran out of space on our spool file system today...

	It turns out that the culprit was news itself.  I ran the latest
checkgroups message <1826@gatech.CSNET> yesterday.  A hour or so later, we
ran out of space on /usr.  Doing a hasty expire (i.e. cd ~news/trash-group;
rm *) and rm'ing a bunch of other stuff didn't seem to help as much as I
thought it would, and we ran out of space again soon after that.

	I brought Unix down and found some 3000 (!) checkgroups messages in
/usr/spool/news/control, at about 11kb each.  I had saved the article in
xxx and done "sed 1,/Lines:/d xxx | sh >& xxx.out &", much the same
procedure I usually use.  I just repeated the procedure today with no
trouble, so I'm really confused as to what is going on.  No, I havn't
changed the directory permissions before or since.

	Some excerpts from yesterday's xxx.out (over 2000 lines total, many
of them blank):

inews: Cannot reread article
inews: Cannot reread article
[...]
inews: Cannot install article as /usr/spool/news/control/3140
inews: Link into /usr/spool/news/control/3140 failed, errno 5, check dir permissions.
inews: Cannot reread article
inews: Cannot install article as /usr/spool/news/control/3140
inews: Link into /usr/spool/news/control/3140 failed, errno 5, check dir permissions.
[...]
inews: News system locked up
[...]
inews: Cannot install article as /usr/spool/news/control/3647
inews: Cannot install article as /usr/spool/news/control/3648
inews: Cannot install article as /usr/spool/news/control/3649
inews: Cannot install article as /usr/spool/news/control/3650
[...]
inews: Cannot install article as /usr/spool/news/control/3908
inews: Link into /usr/spool/news/control/3908 failed, errno 5, check dir permissions.
inews: Cannot reread article
inews: Cannot install article as /usr/spool/news/control/3908
inews: Link into /usr/spool/news/control/3908 failed, errno 5, check dir permissions.
[...]

	And, some excerpts from /usr/lib/news/log (I *think* the first line
shown is the first mention of the checkgroup):

Nov  1 18:33	local	posted <543@phri.UUCP> ng control subj 'cmsg checkgroups'
Nov  1 18:33	local	Ctl Msg control from roy: checkgroups
Nov  1 18:33	cmcl2.UUCP	received <1840@gatech.CSNET> ng mod.newslists,net.announce.newusers subj 'Changes to Publicly Accessible Mailing Lists'
Nov  1 18:33	local	system(sed '1,/^$/d' /usr/spool/news/.ar012964 | /usr/lib/news/checkgroups usenet) status 36096
Nov  1 18:33	local	posted <544@phri.UUCP> ng control subj 'cmsg checkgroups'
Nov  1 18:33	local	Ctl Msg control from news: checkgroups
Nov  1 18:33	cmcl2.UUCP	from usenet@gatech.CSNET relay version B 2.10.2 9/18/84; site cmcl2.UUCP
Nov  1 18:33	local	system(sed '1,/^$/d' /usr/spool/news/.ar012970 | /usr/lib/news/checkgroups usenet) status 36096
[...]
Nov  1 19:48	local	Ctl Msg control from news: checkgroups
Nov  1 19:49	cmcl2.UUCP	Duplicate article <320@ssc-vax.UUCP> rejected
Nov  1 19:49	local	system(sed '1,/^$/d' /usr/spool/news/.ar015494 | /usr/lib/news/checkgroups usenet) status 36096
Nov  1 19:49	local	posted <931@phri.UUCP> ng control subj 'cmsg checkgroups'

	As you can see, inews just kept trying to post the checkgroups to
newsgroup control, without any success.  It seems to me that after a
certain number of tries, it should just give up.  As it is, inews just kept
trying until it killed the system -- 30 Mbytes of aborted articles in
~news/control, plus a log file well over 1300 kbytes long!

	I'm not sure why it failed the in the first place.  Could doing a
checkgroups while news is coming in be a Bad Thing?  Actually, several
news-things were going on at the same time.  In addition to the checkgroups
and the incoming feed, we were queueing up news for 2 or 3 downstream
sites, and expire was running (the load on our 11/750 was 8!)

	It also seems that our /usr/lib/news/seq file got zonked.  Several
times during this episode, the article sequence numbers seem to have been
reset to 1.  I've artificially reset our sequencer to 2000, hoping that
will get us past any sequence numbers generated during this mess.  From
/usr/lib/news/errlog:

[...]
Nov  1 20:49	<489@dadla.UUCP>	inews: Article <489@dadla.uucp> locked up
Nov  1 21:51	<1533@phri.UUCP>	inews: Cannot reread article
Nov  1 21:51	<1534@phri.UUCP>	inews: Cannot reread article
Nov  1 21:51	<1535@phri.UUCP>	inews: Cannot reread article
Nov  1 21:51	<1@phri.UUCP>	inews: Cannot reread article
Nov  1 21:52	<2@phri.UUCP>	inews: Cannot reread article
Nov  1 21:53	<4@phri.UUCP>	inews: Cannot reread article
Nov  1 21:53	<170@sagan.UUCP>	inews: Cannot reread article
Nov  1 21:55	<2@phri.UUCP>	inews: Cannot install article as /usr/spool/news/control/3140
Nov  1 21:55	<2@phri.UUCP>	inews: Link into /usr/spool/news/control/3140 failed, errno 5, check dir permissions.
Nov  1 21:55	<2@phri.UUCP>	inews: Cannot reread article
Nov  1 21:55	<3@phri.UUCP>	inews: Cannot install article as /usr/spool/news/control/3140
[...]
Nov  1 22:03	<2@phri.UUCP>	inews: Cannot reread article
Nov  1 22:03	<3@phri.UUCP>	inews: Cannot reread article
Nov  1 22:03	<4@phri.UUCP>	inews: Cannot reread article
[...]
Nov  2 00:51	<1@phri.UUCP>	inews: Cannot reread article
Nov  2 00:51	<2@phri.UUCP>	inews: Cannot reread article
Nov  2 00:52	<1@phri.UUCP>	inews: Cannot reread article
Nov  2 00:52	<2@phri.UUCP>	inews: Cannot reread article

	And here there looks like a write glitch (probably becuase the file
system was full by then):

Nov  2 00:52	<3@phri.UUCP>	inews: Cannot rereadNov  2 01:10	<29@phri.UUCP>	inews: Cannot reread article
Nov  2 01:11	<39@phri.UUCP>	inews: Cannot reread article
Nov  2 01:11	<40@phri.UUCP>	inews: Cannot reread article

-- 
Roy Smith <allegra!phri!roy>
System Administrator, Public Health Research Institute
455 First Avenue, New York, NY 10016