hhg1@gte.com (Hallett German) (05/02/91)
I am using perl 3.041 under Ultrix.
The program produces an output file listing the messages received for
a given newsgroup. Split works fine for a few hundred records but not
a few thousand in that the c++ newsgroups are not included. (A kludge
is done to get by grep). How can I get it to work correctly:
hhg1@gte.com:
####################################################################
#Input file generator -- generates a file of news groups and last
# message obtained.
####################################################################
##################################################
# set up definitions #
##################################################
system("date");
$file1="/usr3/hhg1/xaa"; #Copied version of /usr/spool/batch/headers
$file2="/usr/tmp/messfile"; #Input file for news digester--Result
$file3="/var\/spool\/news\/";
##################################################
#open files, initialize variables, array #
##################################################
open(FILE1,"<$file1")||
die "$file1 not opened $!\n" ;
$cnt =0;
%MARRAY=();
##################################################
# Slosh through raw file and produce an #
# array of groups and message numbers #
# TRICK: append cnt to group name so unique key #
# and grep works correctly #
##################################################
while (<FILE1>) {
    chop;
    s/$file3//i;
    $a=$_;
    ($a2=$a) =~ tr/\//\./; #translate / to .
    $len = length($a2);
    $off = rindex($a2,".")+1; #find last period
    $len2 = ($len-($len-$off)-1);
    $messnum = substr($a2,$off,$len-$off); #save message number
    $ngroup = substr($a2,0,$len2)."#".$cnt;
    if (index($ngroup,"c++") >= 0) {
        ($ngroup) =~ s/c\++/cplusplus/;
    }
    $MARRAY{$ngroup} = $messnum;
    $cnt++;
}
##################################################
# Open output file and capture groups #
##################################################
open(FILE2,">$file2")||
    die "$file2 not opened $!\n" ;
$gcnt=1;
@ndx = keys(%MARRAY);
@fm = sort(keys %MARRAY);
##################################################
# Recapture group and message number and #
# write to output file #
##################################################
foreach $fld (@fm) {
    if ($gcnt eq 1) {#first time encountering group
        ($groupn, $crapp) = split(/\#/,$fld,2);
        $acnt=grep(/$groupn/,@ndx);
        $start=$MARRAY{$fld};
        if (index($groupn,"comp.lang") >= 0)
        {print STDOUT "$groupn -- in loop=1 \n";}
    }
    if (($acnt eq 1) || ($acnt eq $gcnt)){#if last instance of group
        if (index($groupn,"comp.lang") >= 0)
        {print STDOUT "$groupn --second loop \n";}
        if (index($groupn,"plusplus") >= 0) {
            print STDOUT "in loop \n";
            ($groupn) =~ s/cplusplus/c\++/;} #kludge for c++ group
        if ($start <= $MARRAY{$fld}) {
            print FILE2 $groupn," ",$start," ",$MARRAY{$fld}," \n";
        }
        if ($start > $MARRAY{$fld}) {
            print FILE2 $groupn," ",$MARRAY{$fld}," ",$start," \n";
        }
        $gcnt=1;
        next;
    }
    $gcnt++;
}
system("date");

lwall@jpl-devvax.jpl.nasa.gov (Larry Wall) (05/03/91)
In article <11112@bunny.GTE.COM> hhg1@gte.com (Hallett German) writes:
:
: am using perl 3.041 under Ultrix.
:
: The program produces an output file listing the messages received for
: a given newsgroup. Split works fine for a few hundred records but not
: a few thousand in that the c++ newsgroups are not included. (A kludge
: is done to get by grep). How can I get it to work correctly:
I vaguely recall having fixed some bug like this having to do with
splitting on #--you might try an up-to-date version.
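Whatever the split bug was, note that grep(/$groupn/,@ndx) treats the group name as a pattern, not a fixed string. So "comp.lang.c" also counts every "comp.lang.cplusplus#n" key, which may be part of why the c++ groups go missing once both groups are in the input. One way to sidestep the counter, the grep and the split altogether is to keep a low and a high article number per group. This is only a sketch under assumptions: it is written for a current Perl 5 and not tried on 3.041, group_ranges is a made-up name, and the input lines are assumed to look like /var/spool/news/comp/lang/c++/123.

```perl
use strict;
use warnings;

# Hypothetical helper: fold article paths into "group low high" lines.
# $prefix is the spool directory to strip, with its trailing slash.
sub group_ranges {
    my ($prefix, @paths) = @_;
    my (%low, %high);
    for my $path (@paths) {
        my $rel = $path;
        # Strip the prefix with substr/index, so no pattern quoting is needed.
        $rel = substr($rel, length $prefix) if index($rel, $prefix) == 0;
        my $cut = rindex($rel, '/');          # last slash splits group from number
        next if $cut < 0;                     # no separator: skip the line
        my $num = substr($rel, $cut + 1);
        next unless $num =~ /^\d+$/;          # skip anything that is not an article number
        (my $group = substr($rel, 0, $cut)) =~ tr{/}{.};
        $low{$group}  = $num if !exists $low{$group}  || $num < $low{$group};
        $high{$group} = $num if !exists $high{$group} || $num > $high{$group};
    }
    return map { "$_ $low{$_} $high{$_}" } sort keys %low;
}

print "$_\n" for group_ranges('/var/spool/news/',
    '/var/spool/news/comp/lang/c/101',
    '/var/spool/news/comp/lang/c++/7',
    '/var/spool/news/comp/lang/c/99',
    '/var/spool/news/comp/lang/c++/12');
```

Since the group name is only ever used as a hash key here, never as a pattern, "c++" needs no renaming to "cplusplus" and back.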
: if (index($ngroup,"c++") >= 0) {
: ($ngroup) =~ s/c\++/cplusplus/;
: }
Nitpick. The index() is superfluous if you quote the pattern right:
$ngroup =~ s/c\+\+/cplusplus/;
Larry
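
A small sketch of the quoting point above, for anyone who wants to try it. It is written for a current Perl 5 and not tried on 3.041, and rename_loose and rename_exact are made-up names. A substitution leaves the string alone when the pattern does not match, which is why the index() guard adds nothing. The pattern c\++ means "c followed by one or more plus signs", while c\+\+ means "c followed by exactly two".

```perl
use strict;
use warnings;

# Hypothetical helpers, for illustration only.
# c\++  : "c", then one or more literal "+"
sub rename_loose { my ($g) = @_; $g =~ s/c\++/cplusplus/;  return $g; }
# c\+\+ : "c", then exactly two literal "+"
sub rename_exact { my ($g) = @_; $g =~ s/c\+\+/cplusplus/; return $g; }

for my $g ('comp.lang.c', 'comp.lang.c+', 'comp.lang.c++') {
    printf "%-14s loose=%-20s exact=%s\n",
        $g, rename_loose($g), rename_exact($g);
}
```

Both versions leave comp.lang.c untouched and turn comp.lang.c++ into comp.lang.cplusplus; they differ only on a name like comp.lang.c+, which the loose pattern rewrites and the exact one does not.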