[alt.sources] Question on non-dbm history files

emv@math.lsa.umich.edu (Edward Vielmetti) (03/01/90)

In article <1990Feb28.172640.25896@utzoo.uucp> henry@utzoo.uucp (Henry Spencer) writes:

   In article <253@uucs1.UUCP> gaf@uucs1.UUCP () writes:
   >Regarding the 2.11 software...
   >If I understand rightly, these ever-growing files in my admin/history.d
   >directory are used to detect duplicate articles.  Okay, but is that all
   >they're for?

   The one other purpose that dbm (and the assorted substitutes for it)
   is intended to fulfill is random article lookup by message-ID.  Most of
   the news readers will try to do this in some circumstances, but the
   circumstances in question are usually use of some obscure command that
   nobody ever invokes in practice.

Here's an extremely rough cut at "article", a program to fetch
usenet articles by Message-ID over NNTP, intended to be somewhat
less obscure than most news readers. Invoke it like so:
	article "<253@uucs1.UUCP>"
once you've configured it appropriately.

I would like to teach it to cope with history file formats & be
generally more nice, but for me it works just dandy for now.

Followups to comp.lang.perl where this will probably be hacked
on to pieces.

--Ed

Edward Vielmetti, U of Michigan math dept.

#!/usr/local/bin/perl

# article -- fetch usenet articles by Message-ID.
# usage: article "<253@uucs1.UUCP>"

# what we have here is perl to fetch articles by id.
# it should do the following:
#	cope with B and C formats in dbm or flat file format
#	cope with NNTP to the default server
#	cope with NNTP to any server specified, or a list.
#
# something like this will work for groups with an
# 	Original-message-id:
# tag:
#
# cat pointer | article `sed -n -e 's/^Original-message-id: //p'`

# suggested by eci386!clewis
# socket code mailed to me by cmf@obie.cis.pitt.edu (Carl M. Fongheiser)

# history: dbmopen(HIST,'/usr/spool/news/lib/news/history',0666);
# what you get back are offsets to the history file ?  dunno.

# Configuration information -- change to reflect your site.

$nntpserver='stag.math.lsa.umich.edu';	# look at ENV, also list;
					# should also read dbm or grep
					# history file.
$newslib='/usr/spool/news/lib/news/';	# news library
$newspool='/usr/spool/news/';		# news spool
$defport='nntp';			# default nntp port
$debug = 1;				# uh, if it don't work

$prog = $0;
$art = $ARGV[0];

if ($nntpserver) {

   do 'sys/socket.h' || die "$prog: Can't do sys/socket.h: $@";

   $sockaddr = 'S n a4 x8';

   chop($hostname = `hostname`);

   ($name,$aliases,$proto) = getprotobyname('tcp');
   ($name,$aliases,$port) = getservbyname($defport, 'tcp')
	unless $port =~ /^\d+$/;
   ($name,$aliases,$type,$len,$nntpaddr) = gethostbyname($nntpserver);
   ($name,$aliases,$type,$len,$localaddr) = gethostbyname($hostname);

   $nntp_saddr = pack($sockaddr, &AF_INET, $port, $nntpaddr);
   $local_saddr = pack($sockaddr, &AF_INET, 0, $localaddr);

   socket(S, &PF_INET, &SOCK_STREAM, $proto) || die "socket: $!";
   bind(S, $local_saddr) || die "$prog: bind: $!";
   connect(S, $nntp_saddr) || die "$prog: connect: $!";

   select(S); $| = 1; select(stdout);	# flush after lines

   if ($_ = <S>, /^2/)  {
	$debug && print(STDERR "$prog: connected to nntp server: $_");
   } else {
	die "$prog: failure: $_\n";
   }

   print S "article $art\n";
   $debug && print (STDERR "$prog: send command article $art\n");

   if ($_ = <S>, /^2/)  {
	$debug && print(STDERR "$prog: article cmd OK: $_");
   } else {
	die "$prog: failure: $_\n";
   }

   while ( ($_ = <S>) && !(/^\.\r\n/) ) {
	s/^\.//;			# take care of hidden dot ....
	s/\r$//;			# take care of trailing cr
	print (STDOUT $_);
   }
   close(STDOUT);

   print (S "quit\n");
   close(S);

}

lwall@jato.Jpl.Nasa.Gov (Larry Wall) (03/01/90)

In article <EMV.90Feb28134417@duby.math.lsa.umich.edu> emv@math.lsa.umich.edu (Edward Vielmetti) writes:
:    The one other purpose that dbm (and the assorted substitutes for it)
:    is intended to fulfill is random article lookup by message-ID.  Most of
:    the news readers will try to do this in some circumstances, but the
:    circumstances in question are usually use of some obscure command that
:    nobody ever invokes in practice.
: 
: Here's an extremely rough cut at "article", a program to fetch
: usenet articles by Message-ID over NNTP, intended to be somewhat
: less obscure than most news readers. Invoke it like so:
: 	article "<253@uucs1.UUCP>"
: once you've configured it appropriately.
: 
: I would like to teach it to cope with history file formats & be
: generally more nice, but for me it works just dandy for now.

Here's a vaguely related script that does dbm history file lookups and nntp
to refetch articles from an nntp server that were dropped in the bitbucket
for some reason (usually by running out of disk space, or some such). 
It should probably extract the default list of newsgroups from the sys file,
but I was lazy.

Larry Wall
lwall@jpl-devvax.jpl.nasa.gov

#!/bin/sh
: make a subdirectory, cd to it, and run this through sh.
echo 'If this kit is complete, "End of kit" will echo at the end'
echo Extracting refetch
sed >refetch <<'!STUFFY!FUNK!' -e 's/X//'
X#!/usr/bin/perl
X
X$restart = shift(@ARGV);
X
X$server = 'jato';
X$newsgroups =
X'ca.*,comp.*,gnu.*,jpl.*,la.*,misc.*,nasa.*,news.*,rec.*,sci.*,soc.*,talk.*';
X
Xprint "Server? [$server] ";
Xchop($ans = <STDIN>);
X$server = $ans if $ans;
X
X$pat = 'S n C4 x8';
X
X$af_unix = 1;
X$af_inet = 2;
X
X$stream = 1;
X$datagram = 2;
X
X($name,$aliases,$proto) = getprotobyname('tcp');
X$tcp = $proto;
X
X($name,$aliase,$port,$proto) = getservbyname('nntp','tcp');
X$nntp = $port;
X
Xif ($server =~ /^\d+\./) {
X    @bytes = split(/\./,$server);
X}
Xelse {
X    ($name,$aliases,$addrtype,$length,@addrs) = gethostbyname($server);
X    die "Can't lookup $server\n" unless $name;
X    @bytes = unpack("C4",$addrs[0]);
X}
X
X$this = pack($pat,$af_inet,1492,      0,0,0,0);
X$that = pack($pat,$af_inet,$nntp,@bytes);
X
Xsocket(NNTP,$af_inet,$stream,$tcp) || die "socket: $!\n";
Xbind(NNTP,$this) || die "bind: $!\n";
Xconnect(NNTP,$that) || die "connect: $!\n";
X
Xselect(NNTP); $| = 1; select(STDOUT); $| = 1;
X
Xprint "\nConnected to NNTP server at $server (",join('.',@bytes),").\n\n";
X
Xif (!$restart) {
X    print "Newsgroups? [$newsgroups] ";
X    chop($ans = <STDIN>);
X    $newsgroups = $ans if $ans;
X
X    $oneday = 60 * 60 * 24;
X    ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) =
X	localtime(time-$oneday);
X    $yesterday = sprintf("%02d%02d%02d",$year,$mon+1,$mday);
X
X    while (length($date) != 6) {
X	print "\nSince date? [$yesterday] ";
X	chop($date = <STDIN>);
X	if ($date < 0) {
X	    ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) =
X	       localtime(time - $oneday * $date);
X	    $date = sprintf("%02d%02d%02d",$year,$mon+1,$mday);
X	}
X	else {
X	    $date = $yesterday unless $date;
X	}
X    }
X
X    $now = sprintf("%02d%02d%02d",$hour,$min,$sec);
X
X    while (length($time) != 6) {
X	print "\nSince time? [$now] ";
X	chop($time = <STDIN>);
X	$time = $now unless $time;
X    }
X}
X
Xfork && exit;
X
Xopen(STDOUT,">refetch.log");
Xopen(STDERR,">&STDOUT");
X
Xselect(STDERR); $| = 1;
Xselect(STDOUT); $| = 1;
X
Xgoto label if $restart;
X
Xdbmopen(dhist,"history",0666) || die "Can't open history dbm file: $!\n";
X
Xprint STDERR "Loading history...";
Xopen(hist,'/usr/lib/news/history') || die "Can't open history file";
X($st_dev,$st_ino,$st_mode,$st_nlink,$st_uid,$st_gid,$st_rdev,$st_size,
X    $st_atime,$st_mtime,$st_ctime,$st_blksize,$st_blocks) = stat(hist);
Xfor ($offset = $st_size - 100000; $offset > 0; $offset -= 100000) {
X    if (seek(hist,$offset,0)) {
X	$_ = <hist>;		# probably starts in middle of a line
X	$_ = <hist>;
X	m|	(\d+)/(\d+)/(\d+)| || next;
X	last if $3 * 10000 + $1 * 100 + $2 < $date;
X    }
X    else {
X	$offset = -1;
X    }
X}
Xseek(hist,0,0) if $offset < 0;
Xwhile (<hist>) {
X    m|	(\d+)/(\d+)/(\d+)| || next;
X    last if $3 * 10000 + $1 * 100 + $2 >= $date;
X}
X$pct = int(tell(hist) * 100 / $st_size);
Xprint STDERR "starting at $pct%...";
X$pos = tell(hist);
Xwhile (<hist>) {
X    /^(<[^>]*>)/ && ($history{$1} = $pos + 0);
X    $pos = tell(hist);
X}
Xprint STDERR "done\n";
X
Xprint NNTP "newnews $newsgroups $date $time\n";
X
Xopen(TMP,">/tmp/refetch$$") || die "Can't open tmp file";
X
Xwhile (<NNTP>) {
X    last if /^230/;
X}
X
Xchdir "/usr/spool/news" || die "Can't cd to /usr/spool/news: $!\n";
X
Xwhile (<NNTP>) {
X    chop;
X    chop;
X    $Messid = $_;
X    y/A-Z/a-z/;
X    last if $_ eq '.';
X    if ($history{$Messid}) {
X	$loc = $dhist{$_ . "\000"};
X	$loc = $dhist{$Messid . "\000"} if $loc eq '';
X	if ($loc eq '') {
X	    $loc = $history{$Messid};
X	    print STDERR "???d\t",$Messid,"\n";
X	}
X	else {
X	    ($loc) = unpack("l",$loc);
X	    if ($loc != $history{$Messid}) {
X		print STDERR "???\t$loc != $history{$Messid}\n";
X		$loc = $history{$Messid};
X	    }
X	}
X	seek(hist,$loc,0);
X	$histline = <hist>;
X	($messid,$date,$artlist) = split(/\t/,$histline);
X	if ($messid =~ /^</) {
X	    if ($messid ne $Messid) {
X		delete $dhist{$_ . "\000"};
X		print STDERR ">>>m$messid\t",$Messid,"\n";
X		print TMP $Messid,"\n";
X		next;
X	    }
X	    @artlist = split(' ',$artlist);
X	    $exists = 0;
X	    for (@artlist) {
X		y|.|/|;
X		if (-e $_) {
X		    if (-z _) {
X			--$exists;
X			unlink $_;
X			print STDERR "\t\t$Messid $_ zero size\n";
X		    }
X		    else {
X			print STDERR "\t\t$Messid $_ exists\n";
X			++$exists;
X			last;
X		    }
X		}
X		else {
X		    print STDERR "\t\t$Messid $_ doesn't exist\n";
X		}
X	    }
X	    if ($exists < 1) {
X		delete $dhist{$_ . "\000"};
X		if ($exists < 0) {
X		    print STDERR ">>>z\t",$Messid,"\n";
X		}
X		else {
X		    print STDERR ">>>e\t",$Messid,"\n";
X		}
X		print TMP $Messid,"\n";
X		next;
X	    }
X	}
X	else {
X	    delete $dhist{$_ . "\000"};
X	    print STDERR ">>>s\t$Messid\t",$_,"\n";
X	    print TMP $Messid,"\n";
X	    next;
X	}
X	print STDERR "\t",$Messid,"\n";
X    }
X    else {
X	delete $dhist{$_ . "\000"};
X	print STDERR ">>>h\t",$Messid,"\n";
X	print TMP $Messid,"\n";
X    }
X}
Xclose TMP;
Xdbmclose(dhist);
X
Xlabel:
Xif ($restart) {
X    open(TMP,"/tmp/refetch$restart") || die "Can't reopen /tmp/refetch$restart: $!";
X}
Xelse {
X    open(TMP,"/tmp/refetch$$") || die "Can't reopen /tmp/refetch$$: $!";
X}
X
Xwhile (<TMP>) {
X    chop;
X    $article = $_;
X    print NNTP "article $_\n";
X    ($_ = <NNTP>) =~ /^220/ || (warn("Not 220 on $article: $_"),next);
X    open(RNEWS,"|/usr/local/bin/rnews");
X    while (<NNTP>) {
X	s/\r\n$/\n/;
X	last if $_ eq ".\n";
X	s/^\.\././;
X	print RNEWS;
X    }
X    close RNEWS;
X    if ($?) {
X	printf STDERR "Exit %d sig %d from rnews on %s\n",
X	    $? >> 8; $? & 255, $article;
X    }
X    else {
X	print STDERR "OK	$article\n";
X    }
X}
X
Xprint NNTP "quit\n";
Xwhile (<NNTP>) {
X    ;
X}
X
X# unlink "/tmp/refetch$$";
X
Xprint STDERR "done\n";
!STUFFY!FUNK!
echo ""
echo "End of kit"
: I do not append .signature, but someone might mail this.
exit