hhg1@gte.com (Hallett German) (05/02/91)
I am using perl 3.041 under Ultrix. The program produces an output file listing the messages received for each newsgroup. Split works fine for a few hundred records but not for a few thousand: the c++ newsgroups are not included. (A kludge is used to get past grep.) How can I get it to work correctly?

hhg1@gte.com

####################################################################
#Input file generator -- generates a file of news groups and last
# message obtained.
####################################################################
##################################################
# set up definitions #
##################################################
system("date");
$file1="/usr3/hhg1/xaa"; #Copied version of /usr/spool/batch/headers
$file2="/usr/tmp/messfile"; #Input file for news digester--Result
$file3="/var\/spool\/news\/";
##################################################
#open files, initialize variables, array #
##################################################
open(FILE1,"<$file1")|| die "$file1 not opened $!\n" ;
$cnt =0;
%MARRAY=();
##################################################
# Slosh through raw file and produce an #
# array of groups and message numbers #
# TRICK: append cnt to group name so unique key #
# and grep works correctly #
##################################################
while (<FILE1>) {
    chop;
    s/$file3//i;
    $a=$_;
    ($a2=$a) =~ tr/\//\./; #translate / to .
    $len = length($a2);
    $off = rindex($a2,".")+1; #find last period
    $len2 = ($len-($len-$off)-1);
    $messnum = substr($a2,$off,$len-$off); #save message number
    $ngroup = substr($a2,0,$len2)."#".$cnt;
    if (index($ngroup,"c++") >= 0) {
        ($ngroup) =~ s/c\++/cplusplus/;
    }
    $MARRAY{$ngroup} = $messnum;
    $cnt++;
}
##################################################
# Open output file and capture groups #
##################################################
open(FILE2,">$file2")|| die "file2 not opened $!\n" ;
$gcnt=1;
@ndx = keys(%MARRAY);
@fm = sort(keys %MARRAY);
##################################################
# Recapture group and message number and #
# write to output file #
##################################################
foreach $fld (@fm) {
    if ($gcnt eq 1) {#first time encountering group
        ($groupn, $crapp) = split(/\#/,$fld,2);
        $acnt=grep(/$groupn/,@ndx);
        $start=$MARRAY{$fld};
        if (index($groupn,"comp.lang") >= 0) {print STDOUT "$groupn -- in loop=1 \n";}
    }
    if (($acnt eq 1) || ($acnt eq $gcnt)){#if last instance of group
        if (index($groupn,"comp.lang") >= 0) {print STDOUT "$groupn --second loop \n";}
        if (index($groupn,"plusplus") >= 0) {
            print STDOUT "in loop \n";
            ($groupn) =~ s/cplusplus/c\++/;} #kludge for c++ group
        if ($start <= $MARRAY{$fld}) {
            print FILE2 $groupn," ",$start," ",$MARRAY{$fld}," \n";
        }
        if ($start > $MARRAY{$fld}) {
            print FILE2 $groupn," ",$MARRAY{$fld}," ",$start," \n";
        }
        $gcnt=1;
        next;
    }
    $gcnt++;
}
system("date");
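One plausible cause, offered as an editor's reading and not something confirmed in the thread: the count grep(/$groupn/,@ndx) is unanchored and treats "." as a wildcard, so for comp.lang.c it also counts the comp.lang.cplusplus#... keys, and the "last instance of group" test then fires only after the loop has run past the end of the group. Separately, the "#cnt" suffixes sort as strings ("#10" before "#9"), so the first and last keys of a group are not necessarily its earliest and latest records. The sketch below avoids both by keeping the lowest and highest message number per group in a single pass. It is written for a modern Perl 5 (use strict, my, \Q...\E, references), not perl 3.041, and assumes each input line is a spool path such as /var/spool/news/comp/lang/c++/1234; the subroutine name summarize and the two command-line arguments are hypothetical.

```perl
#!/usr/bin/perl
# Sketch only (modern Perl 5): one pass over the header list, remembering the
# lowest and highest message number seen for each newsgroup. No "#cnt" keys,
# no grep count, and no cplusplus renaming are needed.
use strict;
use warnings;

my $spool = '/var/spool/news/';   # prefix to strip, as in the original $file3

# Hypothetical helper: takes raw lines, returns { group => [lowest, highest] }.
sub summarize {
    my (@lines) = @_;
    my %range;
    for my $line (@lines) {
        chomp $line;
        $line =~ s/^\Q$spool\E//;   # \Q...\E quotes any regex metacharacters
        # Split "comp/lang/c++/1234" into directory and number; skip other lines.
        my ($dir, $num) = $line =~ m{^(.+)/(\d+)$} or next;
        (my $group = $dir) =~ tr{/}{.};          # comp/lang/c++ -> comp.lang.c++
        if (!exists $range{$group}) {
            $range{$group} = [$num, $num];       # first article seen for group
        } else {
            $range{$group}[0] = $num if $num < $range{$group}[0];
            $range{$group}[1] = $num if $num > $range{$group}[1];
        }
    }
    return \%range;
}

# Hypothetical usage: perl summarize.pl INPUT OUTPUT
if (@ARGV) {
    my ($in, $out) = @ARGV;
    open(my $ifh, '<', $in) or die "$in not opened: $!\n";
    my $range = summarize(<$ifh>);
    close $ifh;
    open(my $ofh, '>', $out) or die "$out not opened: $!\n";
    for my $group (sort keys %$range) {
        my ($lo, $hi) = @{ $range->{$group} };
        print $ofh "$group $lo $hi\n";         # group, lowest, highest
    }
    close $ofh;
}
```

If you would rather keep the original two-pass structure, an anchored and quoted count such as grep(/^\Q$groupn\E#/, @ndx) (again modern Perl) counts only that group's own keys.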
lwall@jpl-devvax.jpl.nasa.gov (Larry Wall) (05/03/91)
In article <11112@bunny.GTE.COM> hhg1@gte.com (Hallett German) writes:
:
: am using perl 3.041 under Ultrix.
:
: The program produces an output file listing the messages received for
: a given newsgroup. Split works fine for a few hundred records but not
: a few thousand in that the c++ newsgroups are not included. (A kludge
: is done to get by grep). How can I get it to work correctly:
I vaguely recall fixing a bug like this that had to do with
splitting on #--you might try an up-to-date version.
: if (index($ngroup,"c++") >= 0) {
: ($ngroup) =~ s/c\++/cplusplus/;
: }
Nitpick. The index() is superfluous if you quote the pattern right:
$ngroup =~ s/c\+\+/cplusplus/;
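A small check of the nitpick, written for a modern Perl 5; the helper name rename_cplusplus and the sample group names are made up for illustration. With both plus signs escaped, the pattern matches only a literal "c++", and when that is absent s/// leaves the string alone, so no index() guard is needed. For comparison, the original c\++ reads as "c followed by one or more plus signs", which happens to match c++ but would also match a lone c+.

```perl
use strict;
use warnings;

# Hypothetical wrapper around Larry's one-liner so it can be tried directly.
sub rename_cplusplus {
    my ($ngroup) = @_;
    $ngroup =~ s/c\+\+/cplusplus/;   # no-op when "c++" does not occur
    return $ngroup;
}

for my $name ('comp.lang.c++#3', 'comp.lang.c#4', 'gnu.g++.help#5') {
    print rename_cplusplus($name), "\n";
}
# prints:
# comp.lang.cplusplus#3
# comp.lang.c#4
# gnu.g++.help#5
```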
Larry