[comp.lang.perl] Sort newsrc file according to nn's presentation sequence

barrett@Daisy.EE.UND.AC.ZA (Alan P. Barrett) (12/17/90)

If you use nn and also another newsreader (I use both nn and trn) then
you may wish for a way of sorting your .newsrc file so that the other
newsreader uses approximately the same presentation sequence as nn does.
Here is a script that sorts a .newsrc file into the order specified by a
.nn/init file.  It works OK with my .newsrc and my .nn/init files; your
mileage may vary.

Even if you don't use nn, you might find this script useful; just fake
up a .nn/init file for it.  I hope that the summary included in the
source is sufficient to enable you to do this.

The script runs very slowly; perhaps one or more of the perl gurus can
suggest improvements.

--apb
Alan Barrett, Dept. of Electronic Eng., Univ. of Natal, Durban, South Africa
Internet: barrett@ee.und.ac.za (or %ee.und.ac.za@saqqara.cis.ohio-state.edu)
UUCP: m2xenix!quagga!undeed!barrett    PSI-Mail: PSI%(6550)13601353::BARRETT


# This is a shell archive.  Remove anything before this line,
# then unpack it by saving it in a file and typing "sh file".
#
# Wrapped by Alan P Barrett <barrett@undeed> on Sun Dec 16 23:58:24 1990
#
# This archive contains:
#	sort-newsrc	
#

LANG=""; export LANG
PATH=/bin:/usr/bin:$PATH; export PATH

echo x - sort-newsrc
sed 's/^@//' >sort-newsrc <<'@EOF'
#!/usr/bin/perl

# Sort a .newsrc file, according to sequence information like that
# used by nn.

# A. P. Barrett <barrett@ee.und.ac.za>, December 1990.

# Inputs:
# 1.  The old .newsrc file.  Presented on stdin.
# 2.  A .nn/init file, or reasonable facsimile thereof.  Named
#     as a command line argument, or defaulted to $HOME/.nn/init
#     if that exists.
#
# Output:
#     A new .newsrc file.  Sent to stdout.
#
# Action:
# The .newsrc file is sorted according to the sequence information
# in the .nn/init file.  This will not necessarily be identical to the
# order used by nn, because we do not read the global nn/init file, and
# we make no attempt to deal with non-vanilla features of nninit files
# or newsrc files.
#
# Format of nninit file:
# The format is actually more complicated; this script supports only the
# following subset.
# 1. '#' comments work as expected.
# 2. Everything up to and including a line containing just the word
#    'sequence' is ignored.
# 3. Everything from the 'sequence' line to the end of the file defines
#    the presentation sequence.
# 4. A group or set of groups is included by listing its name.  'comp.unix'
#    matches just one group; 'comp.unix.' matches all subgroups of comp.unix,
#    but does not match the parent group; 'comp.unix*' matches both the
#    parent and subgroups.  '.test' matches all groups whose name ends with
#    .test.
# 5. A group is excluded by placing a '!' before its name.
# 6. A group is pushed to the start or end of the sequence by preceding it
#    with a '<' or a '>'.
# 7. A group that has been excluded may be reconsidered between a pair of
#    '%' signs.  Thus, '! comp.misc comp. % comp.misc %' will force
#    comp.misc to appear after all other comp groups.
# 8. Nn allows several groups to be combined by separating them with commas
#    instead of spaces.  This script treats commas in the same way as it
#    treats spaces.
# 9. Nn allows a group to have a save file and an entry sequence associated
#    with it.  This script ignores such options.
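#
# Example nninit fragment (an illustrative sketch for the subset above,
# not taken from a real user's file; the group names are arbitrary):
#	sequence
#	! .test
#	< news.announce.important
#	comp.lang.perl comp.unix*
#	> talk.
# With this input, every group ending in .test is deselected,
# news.announce.important goes to the front, comp.lang.perl and the
# comp.unix groups follow, and the talk subgroups are pushed to the end
# of the selected groups.  Untouched and deselected groups are still
# written out, after everything else.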

########################################################################

@@OLDNEWSRC = ();	# group names from old newsrc file,
			# in the order that they were read.
			# (unused at present; when the RC flag in the nninit
			# file is implemented, this array will probably be
			# useful.)

%INFO = ();		# information about each group.  This is the line
			# from the newsrc file (apart from the group name
			# itself).

@@SELECTED = ();		# group names for new newsrc file,
			# in the order selected by the nninit file.

@@TAILEND = ();		# groups forced to end of sequence by '>' in nninit.

%UNTOUCHED = ();	# groups not yet either selected or deselected
			# by the nninit file.

%DESELECTED = ();	# groups deselected by '!' in nninit.

$debug = 0;		# debugging?

$NNINIT = $ENV{"HOME"}."/.nn/init";	# default nninit file

#
# Process command line args.
#
# This is probably one of the stupidest command line parsers you
# have seen for quite a while.
#

while (defined ($_ = shift (@ARGV))) {
    if (/^-v$/) {
	$debug++;
    }
    else {
	$NNINIT = $_;
    }
}

#
# Read the .newsrc file from stdin.
#

while (<>) {
    # split into group name and other info; skip lines that do not
    # look like newsrc entries (blank lines, for example)
    next unless (($group,$info) = /^([^!:\s]+)([!:].*)/);
    warn "read from newsrc: $group\n" if $debug >= 4;
    # remember the info (subscribed or unsubscribed; what has been read)
    $INFO{$group} = $info;
    # remember the order that the groups appeared in the old newsrc
    push (@OLDNEWSRC, $group);
    # the group is still waiting to be selected
    $UNTOUCHED{$group} = 1;
}

#
# Open, slurp and close nninit file
#

open (NNINIT, $NNINIT) || die "cannot open nn init file: $NNINIT: $!\n";
undef $/;	# get ready to slurp
$_ = <NNINIT>;	# slurp the file
close (NNINIT);	# hope there is no error here

#
# Now process the nninit file.
#

    undef $lastword;		# previous word
    $between_percents = 0;	# this is true inside a '% ... %' construct.

    s/#[^\n]*//g;			# ignore comments
    s/^(.*\n)*\s*sequence\s*\n//;	# ignore stuff before sequence line
    s/\((\\.|[^\\\)])*\)//g;		# ignore entry action inside parentheses
					# (allow for \ escapes)
    s/,/ /g;				# treat commas like spaces
    s#(^|\s)[+/~]\S*#$1#g;		# ignore save file names (words
					# starting with +, / or ~)
    s/!:\S+/!/g;			# treat '!:stuff' like '!'
    s/([!%<>])/ $1 /g;			# put spaces around special chars
					# to facilitate breaking into words
    s/^\s+//; s/\s+$//;			# kill leading and trailing blanks

    foreach $word (split (/\s+/,$_)) {	# process each word
	warn "word from nninit: $word\n" if $debug > 3;

	# '!', '<' and '>' must be followed by group name
	if (($lastword =~ /[!<>]/) && ($word =~ /[%!<>]/)) {
	    die "$NNINIT: !, < or > must be followed by group name\n";
	}

	# '%' toggles state
	elsif ($word eq '%') {
	    $between_percents = ! $between_percents;
	}

	# '!', '<' and '>' just get remembered
	elsif ($word =~ /[!<>]/) {
	    # do nothing
	}

	# otherwise it must be a group name to be selected or deselected
	else {
	    @MATCHED = ();
	    # check the untouched groups
	    &find_matches (*UNTOUCHED, *MATCHED, $word);
	    # if between percents, also check deselected groups
	    &find_matches (*DESELECTED, *MATCHED, $word) if $between_percents;

	    if ($#MATCHED < 0) {
		warn "nothing matched: $word\n" if $debug;
	    }
	    # process the groups just matched
	    elsif ($lastword eq '!') {
		foreach $group (@MATCHED) {
		    $DESELECTED{$group} = 1;
		}
		warn "deselect: ".join(" ",sort(@MATCHED))."\n" if $debug;
	    }
	    elsif ($lastword eq '<') {
		unshift (@SELECTED, sort(@MATCHED));
		warn "force to start: ".join(" ",sort(@MATCHED))."\n" if $debug;
	    }
	    elsif ($lastword eq '>') {
		push (@TAILEND, sort(@MATCHED));
		warn "force to end: ".join(" ",sort(@MATCHED))."\n" if $debug;
	    }
	    else {
		push (@SELECTED, sort(@MATCHED));
		warn "select: ".join(" ",sort(@MATCHED))."\n" if $debug;
	    }
	}

	# remember the word just read
	$lastword = $word;

    } # process next word

#
# Now we can output the results.
# As well as the selected groups (from the @SELECTED and
# @TAILEND arrays), we also output deselected groups
# and untouched groups
#
warn "SELECTED: ".join(" ",@SELECTED)."\n" if $debug >= 2;
warn "TAILEND: ".join(" ",@TAILEND)."\n"
			if ($debug >= 2 && $#TAILEND >= 0);
warn "UNTOUCHED: ".join(" ",keys(%UNTOUCHED))."\n"
			if ($debug >= 1 && keys(%UNTOUCHED) > 0);
warn "DESELECTED: ".join(" ",keys(%DESELECTED))."\n"
			if ($debug >= 1 && keys(%DESELECTED) > 0);

foreach $group (@SELECTED, @TAILEND, keys(%UNTOUCHED), keys(%DESELECTED)) {
    print $group, $INFO{$group}, "\n";
}

#############

sub find_matches
#
# find all groups matching a pattern.  delete them from the input
# associative array and push them onto the output array.
# Returns number of matches.
#
{
    local (*IN, *OUT, $pattern) = @_;
    local ($count) = 0;

    # if the pattern is not a wildcard, quickly do what is required.
    if (($pattern !~ /^\./) && ($pattern !~ /[.*]$/)) {
	if (delete $IN{$pattern}) {
	    push (@OUT, $pattern);
	    $count++;
	    warn "matched: $pattern\n" if $debug >= 3;
	}
    }

    # if the pattern is a wildcard, check each group for a match
    else {
	$pattern =~ s/(\W)/\\$1/g;	# quote all special chars in the pattern
	$pattern = "^$pattern$";	# anchor the pattern
	$pattern =~ s/\\\*/.*/g;	# '*' becomes '.*'
	$pattern =~ s/\\\.\$$/\\./;	# '.' at end un-anchors pattern
	$pattern =~ s/^\^\\\./\\./;	# '.' at start un-anchors pattern
	$pattern =~ s/^\^\.\*//;	# '^.*' at start is useless
	$pattern =~ s/\.\*\$$//;	# '.*$' at end is useless
	warn "matching pattern: /$pattern/\n" if $debug >= 3;

	foreach $group (keys (%IN)) {
	    if ($group =~ /$pattern/) {
		delete $IN{$group};
		push (@OUT, $group);
		$count++;
		warn "matched: $group\n" if $debug >= 3;
	    }
	}
    }

    # return result
    $count;
}
@EOF

chmod 755 sort-newsrc

exit 0
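
A note on the wildcard rules described in the script's header comments:
the following is a minimal standalone sketch, assuming Perl 5, of how
find_matches turns an nninit pattern into a regular expression.  The
helper name nn_pattern_to_regex and the sample group list are made up
for illustration; only the substitution steps come from the script.
Unlike find_matches, the sketch also converts plain names, which the
script handles with a direct hash lookup instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of sort-newsrc): convert an nninit group
# pattern to a regex string using the same steps as find_matches.
sub nn_pattern_to_regex {
    my ($pattern) = @_;
    $pattern =~ s/(\W)/\\$1/g;          # quote every non-word character
    $pattern = '^' . $pattern . '$';    # anchor at both ends
    $pattern =~ s/\\\*/.*/g;            # '*' becomes '.*'
    $pattern =~ s/\\\.\$$/\\./;         # trailing '.' drops the end anchor
    $pattern =~ s/^\^\\\./\\./;         # leading '.' drops the start anchor
    $pattern =~ s/^\^\.\*//;            # a leading '^.*' is redundant
    $pattern =~ s/\.\*\$$//;            # a trailing '.*$' is redundant
    return $pattern;
}

# Sample group names, chosen arbitrarily for the demonstration.
my @groups = qw(comp.unix comp.unix.wizards comp.unix.test misc.test);

for my $pat ('comp.unix', 'comp.unix.', 'comp.unix*', '.test') {
    my $re   = nn_pattern_to_regex($pat);
    my @hits = grep { /$re/ } @groups;
    printf "%-12s /%s/  ->  %s\n", $pat, $re, join(' ', @hits);
}
```

Running it prints each pattern, the regex it becomes, and the sample
groups it matches, which may be a quick way to check a pattern before
putting it in a .nn/init file.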

rock@warp.Eng.Sun.COM (Bill Petro) (12/18/90)

barrett@Daisy.EE.UND.AC.ZA (Alan P. Barrett) writes:

>If you use nn and also another newsreader (I use both nn and trn) then
>you may wish for a way of sorting your .newsrc file so that the other
>newsreader uses approximately the same presentation sequence as nn does.
>Here is a script that sorts a .newsrc file into the order specified by a
>.nn/init file.  It works OK with my .newsrc and my .nn/init files; your
>mileage may vary.

I have had the problem (after running nntidy, I think) that running
rn or trn after nn causes rn to complain severely about the way that
.newsrc was left.  Error messages like:

Warning!  Somebody reset news.announce.important--assuming nothing read.
Unread news in news.announce.important    11 articles
Warning!  Somebody reset comp.binaries.mac--assuming nothing read.
Unread news in comp.binaries.mac     1036 articles

and so on.  After taking care of this, and cleaning it up for rn,
if I go back to nn, it displays all my newsgroups as NEW.
Any ideas?


--
     Bill Petro  {decwrl,hplabs,ucbvax}!sun!Eng!rock
"UNIX for the sake of the kingdom of heaven"  Matthew 19:12