karish@mindcraft.com (Chuck Karish) (12/08/90)
In article <1990Dec7.130639.15803@bnr.ca> janick@bnr.ca (Janick Bergeron) writes:
>For the last few days, I keep receiving the following message from
>Cnews/expire:
>
>expire problems:
>expire: wrong number of fields in ` 660190899~-...'
>
>(Note: that's a <TAB> between the '`' and the first '6')
>
>How do I locate the offending article and force its expiry ??

There are two problems to deal with:

    - fixing the history file so you stop getting mail from expire
    - expiring the article

#1: sed '/^\t660190899~-/d' < history > history.fixed

#2: Use 'grep' instead of 'sed', and look at the file name on the
end of the line.  If the file name isn't there, it's already been
expired.

Alternate solution: Just leave the file there.  Put an entry into
crontab to remove articles older than the maximum age in your explist
file.

New question: What causes the history lines to be mangled in
the first place?  I'm getting about eight or ten of them a
month.  Is it related to having my spool disk fill up?
--

    Chuck Karish          karish@mindcraft.com
    Mindcraft, Inc.       (415) 323-9000
mrm@sceard.Sceard.COM (M.R.Murphy) (12/09/90)
In article <660596702.10086@mindcraft.com> karish@mindcraft.com (Chuck Karish) writes:
>In article <1990Dec7.130639.15803@bnr.ca> janick@bnr.ca
>(Janick Bergeron) writes:
>>For the last few days, I keep receiving the following message from
>>Cnews/expire:
>>
>>expire problems:
>>expire: wrong number of fields in ` 660190899~-...'
>>
[...]
>
>New question: What causes the history lines to be mangled in
>the first place?  I'm getting about eight or ten of them a
>month.  Is it related to having my spool disk fill up?
>--
>
>    Chuck Karish          karish@mindcraft.com
>    Mindcraft, Inc.       (415) 323-9000

In our case, it was a bad sector on the disk :-) :-(

The following code cleans up a history file so that mkdbm is happy with it,
and also replaces the single awk line that sifts a history file and prints
only lines that are after a given time that I used in a modified expire
scheme.  The checking for goodness in a history line could be made fancier,
but this is enough to make mkdbm happy.  Makes for a pretty fast expire, too.
Every so often, writing a short specialized tool in C is appropriate, though
I'd rather use awk :-)  In the case of a bad disk block, awk groused about a
record too long and bailed.  If 8192 is bad for a buffer here, somebody could
get fancy with malloc, or maybe just write the whole thing in one line of
perl.

---- cut here ----
/*
 * exphist - scan history and write only good lines for mkdbm
 * usage is exphist time_from_getdate
 *
 * cursory examination of this code will show that it is snagged from mkdbm.
 *
 */
#include <stdio.h>
#include <stdlib.h>     /* atol(), exit() */
#include <string.h>

int
main(int argc, char *argv[])
{
    long exptime;
    static char buff[8192];
    char *scan;
    char *line;
    size_t len;

    if (argc < 2 || (exptime = atol(argv[1])) == 0) {
        fprintf(stderr, "Usage: exphist time\n");
        exit(2);
    }

    for (;;) {
        line = fgets(buff, sizeof(buff), stdin);
        if (line == NULL)
            break;
        len = strlen(line);
        scan = strchr(line, '\t');
        /* reject lines with no tab, and empty or unterminated lines */
        if (scan == NULL || len == 0 || line[len-1] != '\n') {
            fprintf(stderr, "bad format: `%.60s'\n", line);
            continue;
        }
        /* the arrival time follows the first tab */
        if (atol(++scan) > exptime)
            fputs(line, stdout);
    }
    exit(0);
}
---- cut here ----

This is the fragment from expire:

...
now=`getdate now`
ago=`awk "/^\/expired\// {print ($now-(86400*\$(3)))} {next}" explist`
# replace the single-line awk with exphist
#awk "{split(\$2,dates,\"~\");if(dates[1]>$ago)print \$0}" history >history.n
exphist $ago < history >history.n
mkdbm history.n &&
[ -s history.n ] &&
mv history history.o &&         # install new ASCII history file
mv history.n history &&
rm -f history.pag &&            # and related dbm files
rm -f history.dir &&
mv history.n.pag history.pag &&
mv history.n.dir history.dir
...

Since expire will be executed from cron, the error messages from exphist
and mkdbm will show up in mail to somebody important should they occur.
Note the "&&" after the mkdbm step.  This is added to keep history
relatively intact should there be a major failure.

None of this stuff takes too kindly to bad blocks, totally running out of
file system space, or inodes, or such, but then, what of UNIX(tm) does?
--
Mike Murphy  mrm@Sceard.COM  ucsd!sceard!mrm  +1 619 598 5874
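Mike remarks that the goodness check could be made fancier.  What follows
is only a sketch of what a stricter check might look like.  It assumes the
line layout implied by the error message and the awk one-liner in this
thread: a message-ID in angle brackets, a tab, arrival~expiry, and an
optional tab followed by file names.  The name history_line_ok is
hypothetical; it is not part of exphist, mkdbm, or C News.

```c
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/*
 * history_line_ok - hypothetical stricter check for one history line.
 * Assumed layout (inferred from the thread, not from C News sources):
 *     <message-id> TAB arrival~expiry [TAB file names] NEWLINE
 * Returns 1 if the line looks sane, 0 otherwise.
 */
int history_line_ok(const char *line)
{
    size_t len = strlen(line);
    const char *tab1, *tab2;
    char *end;

    if (len == 0 || line[len - 1] != '\n')
        return 0;               /* empty, truncated, or overlong line */
    if (line[0] != '<')
        return 0;               /* message-ID must start the line */
    tab1 = strchr(line, '\t');
    if (tab1 == NULL || tab1[-1] != '>')
        return 0;               /* message-ID must end with '>' */
    if (!isdigit((unsigned char)tab1[1]))
        return 0;               /* arrival time must start with a digit */
    (void)strtol(tab1 + 1, &end, 10);
    if (*end != '~')
        return 0;               /* arrival time must be followed by '~' */
    tab2 = strchr(tab1 + 1, '\t');
    if (tab2 != NULL && strchr(tab2 + 1, '\t') != NULL)
        return 0;               /* more than three fields */
    return 1;
}
```

In the exphist loop, a call such as history_line_ok(line) could stand in
for the tab-and-newline test.  Whether mkdbm or expire would accept
everything this lets through is not something the thread establishes.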
karish@mindcraft.com (Chuck Karish) (12/10/90)
In article <1990Dec8.190114.15171@sceard.Sceard.COM> mrm@Sceard.COM (M.R.Murphy) writes:
>
>The following code cleans up a history file so that mkdbm is happy with it,
>and also replaces the single awk line that sifts a history file and prints
>only lines that are after a given time that I used in a modified expire scheme.
>The checking for goodness in a history line could be made fancier, but this is
>enough to make mkdbm happy. Makes for a pretty fast expire, too. Every so often,
>writing a short specialized tool in C is appropriate, though I'd rather use
>awk :-)

This C program is needed only to avoid re-writing the whole history
file during checking.  On my machine, the mkdbm step takes much longer
than the scan anyway and I have enough disk space for a second copy
of history, so I use this one-liner in sed:

sed -n '/^<.* /p'    # The white space in the pattern is a tab.

It prints only the lines that start with '<' and contain a tab.

>...
>now=`getdate now`
>ago=`awk "/^\/expired\// {print ($now-(86400*\$(3)))} {next}" explist`
># replace the single-line awk with exphist
>#awk "{split(\$2,dates,\"~\");if(dates[1]>$ago)print \$0}" history >history.n

Doesn't this reproduce the functionality specified by the 'expired'
line in the expire control file?
--

    Chuck Karish          karish@mindcraft.com
    Mindcraft, Inc.       (415) 323-9000
mrm@sceard.Sceard.COM (M.R.Murphy) (12/10/90)
In article <660770337.20986@mindcraft.com> karish@mindcraft.com (Chuck Karish) writes:
>In article <1990Dec8.190114.15171@sceard.Sceard.COM> mrm@Sceard.COM
>(M.R.Murphy) writes:
>>
>>The following code cleans up a history file so that mkdbm is happy with it,
>>and also replaces the single awk line that sifts a history file and prints
>>only lines that are after a given time that I used in a modified expire scheme.
>>The checking for goodness in a history line could be made fancier, but this is
>>enough to make mkdbm happy. Makes for a pretty fast expire, too. Every so often,
>>writing a short specialized tool in C is appropriate, though I'd rather use
>>awk :-)
>
>This C program is needed only to avoid re-writing the whole history
>file during checking.  On my machine, the mkdbm step takes much longer
>than the scan anyway and I have enough disk space for a second copy
>of history, so I use this one-liner in sed:
>
>sed -n '/^<.* /p'    # The white space in the pattern is a tab.
>
>>...
>>now=`getdate now`
>>ago=`awk "/^\/expired\// {print ($now-(86400*\$(3)))} {next}" explist`
>># replace the single-line awk with exphist
>>#awk "{split(\$2,dates,\"~\");if(dates[1]>$ago)print \$0}" history >history.n
>
>Doesn't this reproduce the functionality specified by the 'expired'
>line in the expire control file?
>--
>
>    Chuck Karish          karish@mindcraft.com
>    Mindcraft, Inc.       (415) 323-9000

The C program referenced in article <1990Dec8.190114.15171@sceard.Sceard.COM>
above does not just avoid re-writing the whole history file during checking.
It does reproduce the functionality specified by the 'expired' line in the
expire control file, sort of, but the C News expire is not used at all in
the simple scheme for "expiration" that I posted a while back.

Expiration is maintaining the news database, that is, the articles that
are the ebb and flow of USENET as we know it, and the control of reception
of duplicate articles from other sites.  The scheme is based on:

    1) don't accept an article from another site that has already been
       received, that is, that already exists in the history file, and
    2) don't keep old articles lying about wasting space.

Another function of the standard C News expire, that is, archiving, I think
is better separated.  It is more reasonable to set up a sys file entry that
sends articles from newsgroups to be archived to an archiver when they are
received from the feed.  The archiver can then be quite clever and selective
about what it bothers to archive.  The less that the expiration process has
to handle, the better.

To accomplish this scheme, I split C News expiration into two separate
parts: expire, which maintains the history file and handles 1) above, and
trasher, which gets rid of old articles and handles 2) above.  BTW, the
Expires: header is ignored by trasher on the basis that it is only the
business of a system's administrators how long an article should take up
space.  I have since kissed off the script that was trasher and replaced it
with reap by dt@yenta.alb.nm.us (thanks, david).

Expiration of the history file is just the creation of a new history file
that omits lines of the previous history file that are older than some
particular time.  It need have nothing to do with whether the articles
referenced by that line are still around.  I used the one-liner awk script

awk "{split(\$2,dates,\"~\");if(dates[1]>$ago)print \$0}" history >history.n

to do just that.  I was happy enough with this part of the scheme until a
bad disk block corrupted the history file.  Oops.  Awk groused because a
record was too long for it to handle.  Mkdbm groused because the line in the
history file was not up to its expectations for a valid line (simple and
incomplete though those expectations were).  BTW, the corrupted part of the
history file had a less-than followed by some characters and a tab, so it
would have passed the sed test referenced by Chuck and still would have
given mkdbm a problem.  Unless sed croaked on the line, too. :-)

To get around the problem of lines that mkdbm chokes on, I decided to snag
the code from mkdbm and twiddle it about a little so that it would just read
history lines on its standard input and write only lines on its standard
output that mkdbm would be happy with.  As long as I was going to do that
much, I might as well have it do the check for old lines, too.  That way,
exphist, the new C program, reads an old history file, deletes bad lines or
lines that are too old, and writes the output so that mkdbm can make the new
history files.  The awk line above is then replaced by

exphist $ago <history >history.n

Then mkdbm, move the results around, and save the old stuff.  Simple, no?

On our news machine, both the scan and the mkdbm are fast :-)  Exphist and
mkdbm could have been combined, and would probably have been faster, but
these tools are more useful when separate.  That's part of the UNIX(tm)
philosophy.

Reap is a separate process for getting rid of old articles.  It is
completely independent of the process of maintaining history.  Reap also
has the benefit that it is:

    1) short enough so that I can understand it,
    2) flexible,
    3) fast,
    4) and, written by someone else so I didn't have to do it.
       (thanks again, david)

Again, the standard C News expire is not used at all.  What I am talking
about here is an alternate method of maintaining the news database:
articles and history.

Yes, it is a Really Good Thing to lock so that no other News processing
goes on during the history expiration.  It is not necessary to lock News
processing during reaping.  Everything needs to be locked against itself
running at the same time.  Don't you just love crons that can't keep things
straight?

I really like C News.  Thanks to its authors.
--
Mike Murphy  mrm@Sceard.COM  ucsd!sceard!mrm  +1 619 598 5874
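The filtering step described above (drop bad lines and lines that are too
old, pass the rest to mkdbm) can also be sketched without a fixed-size
buffer, so that an overlong record of the kind that made awk bail is read
in one piece.  This is a minimal sketch, not Mike's exphist: it assumes a
system with POSIX getline(), which postdates this thread, and the name
filter_history is hypothetical.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L     /* for getline() */
#endif
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/types.h>

/*
 * filter_history - hypothetical variant of the exphist loop.
 * Copies to `out` every line of `in` that contains a tab, ends in a
 * newline, and whose arrival time (the number after the first tab) is
 * later than `cutoff`.  Returns the number of malformed lines skipped,
 * or -1 if a write fails.
 */
long filter_history(FILE *in, FILE *out, long cutoff)
{
    char *line = NULL;          /* getline() allocates and grows this */
    size_t cap = 0;
    ssize_t len;
    long bad = 0;

    while ((len = getline(&line, &cap, in)) > 0) {
        char *tab = memchr(line, '\t', (size_t)len);

        if (tab == NULL || line[len - 1] != '\n') {
            bad++;              /* no tab, or final line lacks a newline */
            continue;
        }
        if (atol(tab + 1) > cutoff &&
            fwrite(line, 1, (size_t)len, out) != (size_t)len) {
            free(line);
            return -1;          /* e.g. the file system filled up */
        }
    }
    free(line);
    return bad;
}
```

A main() that calls filter_history(stdin, stdout, atol(argv[1])) would fit
the same place in the expire fragment as exphist.  Lines that are merely
too old are dropped silently and are not counted as malformed.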
karish@mindcraft.com (Chuck Karish) (12/11/90)
Henry Spencer and David Lawrence have each pointed out to me that
my previous remarks about Cnews locking were incomplete.

doexpire and newsrun each lock against another instance of themselves
when they start up.  There's a separate mechanism that locks the whole
article filing system, by creating a file called $NEWSCTL/LOCK.  This
is invoked by the expire and relaynews programs, but only during
critical periods in their execution.

I submit that it's wise to use this mechanism when hacking on the
history file by hand.  It's easier than changing the cron entry for
expire, as Henry suggested, and it locks relaynews, too.

Note that people won't be able to post articles while this lock is set.
--

    Chuck Karish          karish@mindcraft.com
    Mindcraft, Inc.       (415) 323-9000
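The idea of a lock that exists as a file can be illustrated in a few lines
of C.  This is a generic sketch of exclusive lock-file creation, not a
description of how C News itself creates $NEWSCTL/LOCK; the thread does not
say how that is done, and the C News lock tools are the right thing to use
on a real news system.  The names take_lock and drop_lock are hypothetical.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * take_lock - hypothetical helper: create `path` only if it does not
 * already exist.  With O_CREAT|O_EXCL the existence test and the
 * creation happen as one atomic step, so two processes cannot both
 * succeed.  Returns 0 on success, -1 if the lock is held or on error.
 */
int take_lock(const char *path)
{
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0444);

    if (fd < 0)
        return -1;              /* errno is EEXIST if someone holds it */
    close(fd);
    return 0;
}

/* drop_lock - remove the lock file; returns 0 on success, -1 on error. */
int drop_lock(const char *path)
{
    return unlink(path);
}
```

A caller would loop on take_lock() with a sleep between attempts, do its
work on the history file, and call drop_lock() on every exit path, since a
lock file left behind blocks everything that honors it.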
henry@zoo.toronto.edu (Henry Spencer) (12/11/90)
In article <660852555.23022@mindcraft.com> karish@mindcraft.com (Chuck Karish) writes:
>I submit that it's wise to use this mechanism when hacking on the
>history file by hand.  It's easier than changing the cron entry for
>expire, as Henry suggested, and it locks relaynews, too.

That's why I suggested using locknews, which does this.  (Note, locknews !=
newslock.)  You'd really kind of like to lock out expire too, though, hence
the suggestion about the cron entry.
--
"The average pointer, statistically,    |Henry Spencer at U of Toronto Zoology
points somewhere in X." -Hugh Redelmeier| henry@zoo.toronto.edu   utzoo!henry