[comp.lang.perl] This looks like a job for Perl Man

pvo@sapphire.OCE.ORST.EDU (Paul O'Neill) (09/27/90)

Have you ever switched your NNTP client over to a new server?

You quickly discover that your .newsrc file is useless because the article
numbers on the new server are different than the article #'s on your
old server.  

What's needed is a tool to map old article numbers to new article numbers.
Something like:
		remap_news oldserver newserver

remap_news would read a user's .newsrc file, connect to oldserver and convert
read articles to message #'s, connect to newserver and convert those
message #'s to article numbers and then construct a new .newsrc file.
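
A minimal sketch of the first step, reading the read-article list out of a .newsrc line, assuming the usual "group: 1-5,8,10-12" layout with ':' for subscribed and '!' for unsubscribed groups. The name parse_newsrc_line is made up for illustration; the "message #'s" above are the articles' Message-IDs, which come later in the pipeline.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split one .newsrc line into the group name, the
# subscription mark (':' or '!') and a reference to the list of article
# numbers marked as read.  Returns an empty list for a malformed line.
sub parse_newsrc_line {
    my ($line) = @_;
    chomp $line;
    my ($group, $mark, $rest) = $line =~ /^(\S+?)([:!])\s*(.*?)\s*$/
        or return;
    my @read;
    for my $clump (split /,/, $rest) {
        if ($clump =~ /^(\d+)-(\d+)$/) {
            push @read, ($1 + 0) .. ($2 + 0);   # a range such as 1-5
        } elsif ($clump =~ /^(\d+)$/) {
            push @read, $1 + 0;                 # a single article
        }
    }
    return ($group, $mark, \@read);
}

my ($group, $mark, $read) = parse_newsrc_line("comp.lang.perl: 1-5,8,10-12\n");
print "$group $mark @$read\n";   # comp.lang.perl : 1 2 3 4 5 8 10 11 12
```

Expanding ranges this way is only reasonable when the ranges are small; a range that starts at 1 in a long-lived group is better left as a pair of endpoints.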

Looks like a bear, but probably manageable in perl.

Uh, I don't suppose anyone's already done this?  Heh, heh.

Paul O'Neill                 pvo@oce.orst.edu		DoD 000006
Coastal Imaging Lab
OSU--Oceanography
Corvallis, OR  97331         503-737-3251

brnstnd@kramden.acf.nyu.edu (Dan Bernstein) (09/27/90)

In article <20550@orstcs.CS.ORST.EDU> pvo@sapphire.OCE.ORST.EDU (Paul O'Neill) writes:
> remap_news would read a user's .newsrc file, connect to oldserver and convert
> read articles to message #'s, connect to newserver and convert those
> message #'s to article numbers and then construct a new .newsrc file.

Because certain people don't believe in accurate min article fields,
it's safer to convert the unread articles.

---Dan

lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) (09/27/90)

In article <18409:Sep2703:08:1290@kramden.acf.nyu.edu> brnstnd@kramden.acf.nyu.edu (Dan Bernstein) writes:
: In article <20550@orstcs.CS.ORST.EDU> pvo@sapphire.OCE.ORST.EDU (Paul O'Neill) writes:
: > remap_news would read a user's .newsrc file, connect to oldserver and convert
: > read articles to message #'s, connect to newserver and convert those
: > message #'s to article numbers and then construct a new .newsrc file.
: 
: Because certain people don't believe in accurate min article fields,
: it's safer to convert the unread articles.

Might be faster too, depending on how many unread newsgroups you keep in
your .newsrc.

The only problem I see with doing the unreads is that you would end up
marking as read any articles that happened to be on the new server but
not on the old, unless you constructed the universal set on the old server
(which is certainly doable, but a drag).
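
The difference can be sketched with a few hashes. All of the Message-IDs and article numbers below are invented for illustration; in practice they would come from the two servers.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Invented data: every Message-ID on the old server (the "universal set"),
# and the ones the user has not read yet.
my @old_universe = qw(<a@x> <b@x> <c@x> <d@x>);
my %unread_old   = map { $_ => 1 } qw(<c@x> <d@x>);

# Invented article number => Message-ID map for the new server.
# <e@x> exists only on the new server.
my %new_id = (101 => '<a@x>', 102 => '<c@x>', 103 => '<e@x>', 104 => '<d@x>');
my @new_nums = sort { $a <=> $b } keys %new_id;

# Converting only the unreads: everything else on the new server is read,
# including 103, which the user never had a chance to see.
my @naive = grep { !$unread_old{ $new_id{$_} } } @new_nums;

# With the universal set: read means "was on the old server and not unread".
my %on_old  = map { $_ => 1 } @old_universe;
my @careful = grep { $on_old{ $new_id{$_} } && !$unread_old{ $new_id{$_} } }
              @new_nums;

print "naive:   @naive\n";     # naive:   101 103
print "careful: @careful\n";   # careful: 101
```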

Larry

urlichs@smurf.sub.org (Matthias Urlichs) (09/27/90)

In comp.lang.perl, article <9695@jpl-devvax.JPL.NASA.GOV>,
  lwall@jpl-devvax.JPL.NASA.GOV (Larry Wall) writes:
< : 
< : Because certain people don't believe in accurate min article fields,
< : it's safer to convert the unread articles.
< 
< Might be faster too, depending on how many unread newsgroups you keep in
< your .newsrc.
< 
< The only problem I see with doing the unreads is that you would end up
< marking as read any articles that happened to be on the new server but
< not on the old, unless you constructed the universal set on the old server
< (which is certainly doable, but a drag).
< 
Another problem is that nobody guarantees that the articles on both servers
are in the same order. So I'd do something like this, assuming that the NNTP
servers understand XHDR (most do):

for each group
   read numbers of known articles
   old:GROUP <group>
   old:XHDR Message-ID 1-9999999
     remember the ID of every read article
   forget list of numbers -- not applicable to new server
   new:GROUP <group>
   new:XHDR Message-ID 1-9999999
     remember the number of every known ID
     remember which articles are not present
   construct read-messages list by concatenating the ranges of
     remembered numbers; don't forget to omit the range
     from last_article+1 to 9999999 :-)
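
The core of the outline above can be sketched without any network code, assuming the XHDR Message-ID replies have already been collected as lines of the form "number <id>". The names xhdr_to_hash and remap_read are hypothetical, and the step that remembers which article numbers are absent on the new server is left out.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Turn XHDR Message-ID reply lines ("number <id>") into a number => ID hash.
sub xhdr_to_hash {
    my %id_of;
    for my $line (@_) {
        my ($num, $id) = split ' ', $line;
        $id_of{$num} = $id if defined $id;
    }
    return %id_of;
}

# Map read article numbers on the old server to numbers on the new server.
sub remap_read {
    my ($old_lines, $new_lines, $old_read) = @_;
    my %old_id  = xhdr_to_hash(@$old_lines);
    my %new_id  = xhdr_to_hash(@$new_lines);
    my %new_num = reverse %new_id;          # Message-ID => new number

    # Remember the ID of every read article; the old numbers are then useless.
    my @read_ids = grep { defined $_ } map { $old_id{$_} } @$old_read;

    # Look each ID up on the new server; IDs it does not carry drop out.
    my @new_read = grep { defined $_ } map { $new_num{$_} } @read_ids;
    my @sorted   = sort { $a <=> $b } @new_read;
    return @sorted;
}

my @old  = ('10 <a@x>', '11 <b@x>', '12 <c@x>');
my @new  = ('501 <b@x>', '502 <a@x>', '503 <z@x>');
my @read = remap_read(\@old, \@new, [10, 11]);
print "@read\n";   # 501 502
```

Because the lookup goes through the Message-ID, it does not matter that the two servers received the articles in a different order.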

That would probably do it. Anyone want to code it in Perl?
Concatenating the number ranges into a reasonable newsrc line doesn't look
particularly easy, although Randal will surely invent a good one-liner for
it. ;-)
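
Not a one-liner, but a short sketch of the range concatenation, assuming the input is simply a list of article numbers in any order, possibly with duplicates. The name compress_ranges is made up here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Collapse a list of article numbers into newsrc-style ranges.
sub compress_ranges {
    my @nums = sort { $a <=> $b } @_;
    my @ranges;
    while (@nums) {
        my $first = my $last = shift @nums;
        # Absorb duplicates and directly following numbers into this range.
        while (@nums && $nums[0] <= $last + 1) {
            $last = shift @nums;
        }
        push @ranges, $first == $last ? $first : "$first-$last";
    }
    return join ',', @ranges;
}

print compress_ranges(1, 2, 3, 5, 7, 8, 8, 9), "\n";   # 1-3,5,7-9
```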
-- 
Matthias Urlichs -- urlichs@smurf.sub.org -- urlichs@smurf.ira.uka.de     /(o\
Humboldtstrasse 7 - 7500 Karlsruhe 1 - FRG -- +49+721+621127(0700-2330)   \o)/

kaul@icarus.eng.ohio-state.edu (Rich Kaul) (09/27/90)

In article <20550@orstcs.CS.ORST.EDU> pvo@sapphire.OCE.ORST.EDU (Paul O'Neill) writes:
   Have you ever switched your NNTP client over to a new server?

   You quickly discover that your .newsrc file is useless because the article
   numbers on the new server are different than the article #'s on your
   old server.  
   [ ... ]
   Looks like a bear, but probably manageable in perl.

True, perl could probably handle it and no doubt it would do a good
job.  A better solution would be to pick a more intelligent news
reader.  GNUS, for example, will let you read from many servers and
all it requires is GNU emacs ;-).
-- 
Rich Kaul                         | It wouldn't be research if we
kaul@icarus.eng.ohio-state.edu    | knew what we were doing.

ndjc@hobbit.UUCP (pri=-10 Nick Crossley) (09/28/90)

In article <20550@orstcs.CS.ORST.EDU> pvo@sapphire.OCE.ORST.EDU (Paul O'Neill) writes:
>You quickly discover that your .newsrc file is useless because the article
>numbers on the new server are different than the article #'s on your
>old server.
>...
>Looks like a bear, but probably manageable in perl.
>...
>Uh, I don't suppose anyone's already done this?  Heh, heh.

This is not really what you asked for, but it might be useful to some.
Quite some time ago I was having problems on a system where my .newsrc
file was lost, or the articles were renumbered, etc.  I wrote the
following shell script (this was before perl!) which marks as read
all articles older than the specified number of days.  I have not
rewritten this in perl, since I have not needed to run it recently!

--------------------- cut here ---------------------------
:	Mark as read all news articles older than $1 days

NEWS=/usr/spool/news
V=:

case $1 in
	-v)	V=echo ; shift ;;
esac

case $1 in
	"")	set 2 ;;
	[0-9]*)	: OK ;;
	*)	echo "Usage:  $0 [-v] [days]" 1>&2 ; exit 1 ;;
esac


cp .newsrc savednewsrc
{
	for	group in `sed -n 's/:.*//p' <.newsrc`
	do
		echo "/^$group:/c"
		echo "$group: \c"
		dir=$NEWS/`echo $group | tr '.' '/'`
		if	[ ! -d $dir ]
		then	echo
		else
			( cd $dir ; find . -type f -mtime +$1 -print ) |
			sed -e 's!\./!!' -e '/\//d' |
			sort +0n -1 |
			awk '
			NR==1	{  srange=1; erange=$1; next  }
				{
					if	($1 == erange+1)
					{
						# continue range
						erange++
						next
					}
					else if	(srange == erange)
						printf "%d,",srange
					else	printf "%d-%d,",srange,erange
					srange=$1
					erange=$1
				}
			END	{
					if	(erange == 0)
						printf "\n"
					else if	(srange == erange)
						printf "%d\n",srange
					else	printf "%d-%d\n",srange,erange
				}
			'
		fi
		echo "."
		$V "$group done" >&2
	done
	echo "w"
	echo "q"
} |
ed - .newsrc
---------------------- End of script ---------------------------
-- 

<<< standard disclaimers >>>
Nick Crossley, ICL NA, 9801 Muirlands, Irvine, CA 92718-2521, USA 714-458-7282
uunet!ccicpg!ndjc  /  ndjc@ccicpg.UUCP

Andrew.Vignaux@comp.vuw.ac.nz (Andrew Vignaux) (09/28/90)

> In article <20550@orstcs.CS.ORST.EDU> pvo@sapphire.OCE.ORST.EDU (Paul O'Neill) writes:
> remap_news would read a user's .newsrc file, connect to oldserver and convert
> read articles to message #'s, connect to newserver and convert those
> message #'s to article numbers and then construct a new .newsrc file.

> Uh, I don't suppose anyone's already done this?  Heh, heh.

If you are planning to write something I suggest you don't start with
this quick hack which seems to work but I've forgotten how ;-)  I only
needed it once and it does work although it makes a few mistakes.
When I looked at it just now, I discovered a "next line" with no
"line" label -- as I said, it hasn't been well tested.

Looking back on it I can see how my perl coding has changed (e.g. this
was before I discovered the joy of defined() :-)

read_heuristic seems to try to fix a bug, but I'm not sure what it
was.  I think our new server had a larger expiry time, so it was
marking 2 week old articles which had expired from the old server as
unread on the new server.  The scripts were obviously hacked to fix
this (see %EXISTS, min_num, etc.).  "program archaeology" -- the new
hacker's game.
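
One possible reading of what read_heuristic is after, written as a separate sketch rather than a repair of the script: once at least one read article has been matched on the new server, also count as read (a) numbers in the group's range that the new server does not carry and (b) everything below the smallest matched number, on the guess that those articles were read before they expired from the old server. The name apply_read_heuristic and the sample data are invented.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# $bot, $top : first and last article numbers reported for the group
# $exists    : hash ref, article number => Message-ID, for the new server
# @matched   : new-server numbers of articles known to be read
sub apply_read_heuristic {
    my ($bot, $top, $exists, @matched) = @_;
    return () unless @matched;          # nothing matched: leave the group alone
    my ($min) = sort { $a <=> $b } @matched;
    my %read  = map { $_ => 1 } @matched;
    for my $num ($bot .. $top) {
        # Holes in the numbering cannot be read anyway; marking them
        # keeps the resulting ranges compact.
        $read{$num} = 1 unless exists $exists->{$num};
        # Assumption: anything older than the oldest match was read.
        $read{$num} = 1 if $num < $min;
    }
    my @sorted = sort { $a <=> $b } keys %read;
    return @sorted;
}

my %exists = map { $_ => "<id$_\@x>" } (100, 101, 103, 104, 106);
my @read   = apply_read_heuristic(100, 106, \%exists, 103, 104);
print "@read\n";   # 100 101 102 103 104 105
```

The loop walks every number from $bot to $top, which is why an inaccurate minimum article number makes the lists large.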

As Dan mentioned, your min article field had better be nearly accurate or
it will construct some large lists.

In my case the read list will almost certainly be smaller than the
unread list :-(

Andrew
-- 
Domain address: Andrew.Vignaux@comp.vuw.ac.nz

#! /bin/sh
# This is a shell archive.  Remove anything before this line, then unpack
# it by saving it into a file and typing "sh file".  To overwrite existing
# files, type "sh file -c".  You can also feed this as standard input via
# unshar, or by typing "sh <file", e.g..  If this archive is complete, you
# will see the following message at the end:
#		"End of shell archive."
# Contents:  rc2msgs msgs2rc
# Wrapped by ajv@downstage.comp.vuw.ac.nz on Fri Sep 28 23:26:03 1990
PATH=/bin:/usr/bin:/usr/ucb ; export PATH
if test -f 'rc2msgs' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'rc2msgs'\"
else
echo shar: Extracting \"'rc2msgs'\" \(2476 characters\)
sed "s/^X//" >'rc2msgs' <<'END_OF_FILE'
X#!/usr/bin/perl
X
Xif ($#ARGV < $[) {
X	print STDERR "usage: $0 server\n"; exit (1);
X}
X$server = shift (@ARGV);
X
X$response = do server_init($server);
Xif ($response < 0) {
X	die "Can't get active file from server $server.";
X}
X
Xwhile (<>) {
X	unless (/^(\S+)([:!])\s*(.*)\s*/) {
X		warn "bad newsrc line\t$_\n";
X		next;
X	}
X	($group,$sub,$rest) = ($1,$2,$3);
X
X	do put_server ("GROUP $group");
X	$line = do get_server ();
X	unless ($line =~ /^211/) {
X		warn "can't find $group";
X		next line;
X	}
X		
X	print "$sub\t$group\n";
X
X	foreach $clump (split(/,/, $rest)) {
X		do put_server ("XHDR Message-ID $clump");
X		$line = do get_server();
X
X		unless ($line =~ /^221/) {
X			warn "Can't scan $group $clump\n";
X			next;
X		}
X
X		$last_num = -99;
X		while ($_ = do get_server())
X		{
X			last if (/^\./);
X			($num, $id) = split;
X			next unless ($num > $last_num);
X			print "\t$id\n";
X			$last_num = $num;
X		}
X	}
X}
Xdo close_server();
X
Xsub server_init
X{
X	local($host,$sockaddr,$pinet,$inet,$stream,$name,$aliases,
X	      $proto,$port,$type,$len,$addr,$ok);
X	$host = $_[0];
X	$sockaddr = 'S n a4 x8';
X	$pinet = $inet = 2;
X	$stream = 1;
X
X	($name, $aliases, $proto) = getprotobyname('tcp');
X	($name, $aliases, $port) = getservbyname('nntp', 'tcp');
X	if ((($name, $aliases, $type, $len, $addr) = 
X			gethostbyname($host)) == 0) {
X		$addr = do inet_addr($host);
X	}
X
X	$serv = pack($sockaddr, $inet, $port, $addr);
X	socket(S, $pinet, $stream, $proto) || die "socket: $!";
X	connect(S, $serv) || die "connect: $!";
X
X	$_ = do get_server();
X	$ok = -1;
X	check: {
X		if (/^20[01]/) { $ok = 0; last check; }
X		if (/^502/) {
X			print "This machine does not have permission to use the $host news server.\n";
X			last check;
X		}
X		print "Unexpected response code from $host news server\n";
X	}
X	$ok;
X}
X
Xsub get_server
X{
X	$_ = <S>; chop; chop; $_;
X}
X
Xsub put_server
X{
X	send(S, "$_[0]\r\n", 0);
X}
X
Xsub close_server
X{
X	do put_server('QUIT');
X	do get_server();
X}
X
Xsub inet_addr
X{
X	@parts = split(/\./, $_[0]);
X	bit: {
X		if ($#parts == 0) { 
X			$val = $parts[0]; last bit; }
X		if ($#parts == 1) { 
X			$val = ($parts[0] << 24) | ($parts[1] & 0xffffff);
X			 last bit; }
X		if ($#parts == 2) { 
X               		$val = ($parts[0] << 24) | (($parts[1] & 0xff) << 16) |
X                        	($parts[2] & 0xffff);
X			last bit; }
X		if ($#parts == 3) { 
X                	$val = ($parts[0] << 24) | (($parts[1] & 0xff) << 16) |
X                      		(($parts[2] & 0xff) << 8) | ($parts[3] & 0xff);
X			last bit; }
X	}
X	pack("N", $val);
X}
END_OF_FILE
if test 2476 -ne `wc -c <'rc2msgs'`; then
    echo shar: \"'rc2msgs'\" unpacked with wrong size!
fi
chmod +x 'rc2msgs'
# end of 'rc2msgs'
fi
if test -f 'msgs2rc' -a "${1}" != "-c" ; then 
  echo shar: Will not clobber existing file \"'msgs2rc'\"
else
echo shar: Extracting \"'msgs2rc'\" \(3654 characters\)
sed "s/^X//" >'msgs2rc' <<'END_OF_FILE'
X#!/usr/bin/perl
X
Xif ($#ARGV < $[) {
X	print STDERR "usage: $0 server\n"; exit (1);
X}
X$server = shift (@ARGV);
X
X$response = do server_init($server);
Xif ($response < 0) {
X	die "Can't get active file from server $server.";
X}
X
X$| = 1;
X$group = "";
X$sub = "";
X@NUMS = ();
Xwhile (<>) {
X	chop; ($command,$id) = split ('\t');
X	if ($command ne "") {
X		do read_heuristic ();
X		do emit_newsrc ($group, $sub, @NUMS) if ($group ne "");
X		$group = $id;
X		$sub = $command;
X
X		do put_server ("GROUP $group");
X		$line = do get_server ();
X		unless ($line =~ /^211\s*\d*\s*(\d*)\s*(\d*)/) {
X			warn "can't find $group";
X			next;
X		}
X		($bot,$top) = ($1,$2);
X
X		do put_server ("XHDR Message-ID $bot-$top");
X		$line = do get_server();
X
X		unless ($line =~ /^221/) {
X			warn "Can't scan $group\n";
X			next;
X		}
X
X		undef %ID;
X		undef %EXISTS;
X		$last_num = -99;
X		while ($_ = do get_server())
X		{
X			last if (/^\./);
X			($num, $id) = split;
X			next unless ($num > $last_num);
X			$ID{$id} = $num;
X			$EXISTS{$num} = $id;
X			$last_num = $num;
X		}
X		@NUMS = ();
X		$min_num = 1e20;
X		next;
X	}
X	$num = $ID{$id};
X	if ($num eq "") {
X		warn "Can't find $group:$id\n";
X		next;
X	}
X	$min_num = $num if ($#NUMS < $[ || $num < $min_num);
X	push (@NUMS, $num);
X}
Xdo read_heuristic ();
Xdo emit_newsrc ($group, $sub, @NUMS);
X
Xdo close_server();
X
Xsub read_heuristic {
X	if ($#NUMS >= $[) {
X		if ($top > 0) {
X			foreach $num ($bot .. $top) {
X				push (@NUMS, $num) if ($EXISTS{$num} eq "");
X			}
X			foreach $num ($bot .. ($min_num-1)) {
X				push (@NUMS, $num);
X			}
X		}
X	}
X}
X
Xsub bynum { $a - $b; }
X
Xsub emit_newsrc {
X	local ($group, $sub, @nums) = @_;
X
X	print "$group$sub";
X
X	@sorted = sort bynum @nums;
X	if ($#sorted < $[) {
X		$first = $last = -99;
X	}
X	else {
X		print " 1";
X		$first = 1;
X		$last = $bot-1;
X	}
X
X	foreach $num (@sorted) {
X		next if ($num == $last);
X		if ($num == $last+1) {
X			;
X		}
X		else {
X			if ($last != -99) {
X				print "-$last" if ($first != $last);
X				print ",";
X			}
X			print "$num";
X			$first = $num;
X		}
X		$last = $num;
X	}
X	if ($last != -99) {
X		print "-$last" if ($first != $last);
X	}
X	print "\n";
X}
X
Xsub server_init
X{
X	local($host,$sockaddr,$pinet,$inet,$stream,$name,$aliases,
X	      $proto,$port,$type,$len,$addr,$ok);
X	$host = $_[0];
X	$sockaddr = 'S n a4 x8';
X	$pinet = $inet = 2;
X	$stream = 1;
X
X	($name, $aliases, $proto) = getprotobyname('tcp');
X	($name, $aliases, $port) = getservbyname('nntp', 'tcp');
X	if ((($name, $aliases, $type, $len, $addr) = 
X			gethostbyname($host)) == 0) {
X		$addr = do inet_addr($host);
X	}
X
X	$serv = pack($sockaddr, $inet, $port, $addr);
X	socket(S, $pinet, $stream, $proto) || die "socket: $!";
X	connect(S, $serv) || die "connect: $!";
X
X	$_ = do get_server();
X	$ok = -1;
X	check: {
X		if (/^20[01]/) { $ok = 0; last check; }
X		if (/^502/) {
X			print "This machine does not have permission to use the $host news server.\n";
X			last check;
X		}
X		print "Unexpected response code from $host news server\n";
X	}
X	$ok;
X}
X
Xsub get_server
X{
X	$_ = <S>; chop; chop; $_;
X}
X
Xsub put_server
X{
X	send(S, "$_[0]\r\n", 0);
X}
X
Xsub close_server
X{
X	do put_server('QUIT');
X	do get_server();
X}
X
Xsub inet_addr
X{
X	@parts = split(/\./, $_[0]);
X	bit: {
X		if ($#parts == 0) { 
X			$val = $parts[0]; last bit; }
X		if ($#parts == 1) { 
X			$val = ($parts[0] << 24) | ($parts[1] & 0xffffff);
X			 last bit; }
X		if ($#parts == 2) { 
X               		$val = ($parts[0] << 24) | (($parts[1] & 0xff) << 16) |
X                        	($parts[2] & 0xffff);
X			last bit; }
X		if ($#parts == 3) { 
X                	$val = ($parts[0] << 24) | (($parts[1] & 0xff) << 16) |
X                      		(($parts[2] & 0xff) << 8) | ($parts[3] & 0xff);
X			last bit; }
X	}
X	pack("N", $val);
X}
END_OF_FILE
if test 3654 -ne `wc -c <'msgs2rc'`; then
    echo shar: \"'msgs2rc'\" unpacked with wrong size!
fi
chmod +x 'msgs2rc'
# end of 'msgs2rc'
fi
echo shar: End of shell archive.
exit 0