[comp.lang.perl] Split problem

hhg1@gte.com (Hallett German) (05/02/91)

I am using perl 3.041 under Ultrix.

  The program produces an output file listing the messages received for
  a given newsgroup. Split works fine for a few hundred records but not
  for a few thousand: the c++ newsgroups are missing from the output.
  (A kludge is used to get them past grep.) How can I get it to work
  correctly?

hhg1@gte.com:
####################################################################
#Input file generator -- generates a file of news groups and last
#                        message obtained.
####################################################################

##################################################
# set up definitions                             #
##################################################
system("date");
$file1="/usr3/hhg1/xaa";  #Copied version of /usr/spool/batch/headers
$file2="/usr/tmp/messfile"; #Input file for news digester--Result 
$file3="/var/spool/news/";  #Spool prefix stripped from each header path
##################################################
#open files, initialize variables, array         #
##################################################
open(FILE1,"<$file1")|| 
    die "$file1 not opened $!\n" ;
    $cnt   =0;
    %MARRAY=();

##################################################
# Slosh through raw file and produce an          #
#  array of groups and message numbers           #
#  TRICK: append cnt to group name so unique key #
#         and grep works correctly               #
##################################################
while (<FILE1>) {
     chop;
      s/$file3//i;
     $a=$_;
     ($a2=$a)     =~  tr/\//\./;                  #translate / to .
     $len         =   length($a2);
     $off         =   rindex($a2,".")+1;          #find last period
     $len2        =   ($len-($len-$off)-1);
     $messnum     =   substr($a2,$off,$len-$off); #save message number
     $ngroup      =   substr($a2,0,$len2)."#".$cnt;
     if (index($ngroup,"c++") >= 0) {
        ($ngroup) =~ s/c\++/cplusplus/;
      }
     $MARRAY{$ngroup} = $messnum;
     $cnt++;
}
##################################################
#     Open output file and capture groups        #
##################################################
open(FILE2,">$file2")||
    die "$file2 not opened $!\n" ;
    $gcnt=1;
    @ndx = keys(%MARRAY);
    @fm  = sort(keys %MARRAY);
##################################################
#     Recapture group and message number and     #
#      write to output file                      #
##################################################
foreach $fld (@fm) {
     if ($gcnt eq 1) {#first time encountering group
          ($groupn, $crapp) = split(/\#/,$fld,2);
          $acnt=grep(/$groupn/,@ndx);
          $start=$MARRAY{$fld};
          if (index($groupn,"comp.lang") >= 0)
             {print STDOUT "$groupn -- in loop=1 \n";}
     }
     if (($acnt eq 1) || ($acnt eq $gcnt)){#if last instance of group
          if (index($groupn,"comp.lang") >= 0) 
             {print STDOUT "$groupn --second loop \n";}
          if (index($groupn,"plusplus") >= 0) { 
              print STDOUT "in loop \n";
              ($groupn) =~ s/cplusplus/c\++/;} #kludge for c++ group 
            
            if ($start <= $MARRAY{$fld}) {
               print FILE2 $groupn," ",$start," ",$MARRAY{$fld}," \n";
            }
            if ($start > $MARRAY{$fld}) {
               print FILE2 $groupn," ",$MARRAY{$fld}," ",$start," \n";
            }
          $gcnt=1;
          next;
     }
     $gcnt++;
}
system("date");
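
The grep kludge mentioned in the post is needed because $groupn is interpolated into the pattern as a regular expression, where "+" and "." are metacharacters. The pattern is also unanchored, so a count taken for one group can include keys that belong to a longer-named group. The following is a minimal sketch of that effect, not a drop-in patch: it assumes a newer Perl than 3.041 (one that has "my", "use strict" and the \Q...\E quoting escape), and the keys in @ndx are hypothetical examples in the script's "group#counter" form.

```perl
use strict;
use warnings;

# Hypothetical keys in the "group#counter" form the script builds.
my @ndx = ("comp.lang.c#0", "comp.lang.c++#1", "comp.lang.c++#2", "comp.lang.clu#3");

my $groupn = "comp.lang.c";

# Unquoted, unanchored pattern, as in the script: '.' matches any
# character and the match may end anywhere, so related groups count too.
my $loose = grep(/$groupn/, @ndx);

# \Q...\E quotes metacharacters; '^' and the trailing '#' pin the
# pattern to the whole group name.
my $exact = grep(/^\Q$groupn\E#/, @ndx);

# With quoting, a name containing "+" needs no renaming kludge.
my $cpp = "comp.lang.c++";
my $exact_cpp = grep(/^\Q$cpp\E#/, @ndx);

print "$loose $exact $exact_cpp\n";   # prints "4 1 2"
```

Whether this over-counting explains the missing c++ groups in the real data is not established in the thread; it may be worth checking $acnt for comp.lang.c against the true number of comp.lang.c records.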

lwall@jpl-devvax.jpl.nasa.gov (Larry Wall) (05/03/91)

In article <11112@bunny.GTE.COM> hhg1@gte.com (Hallett German) writes:
: 
: I am using perl 3.041 under Ultrix.
: 
:   The program produces an output file listing the messages received for
:   a given newsgroup. Split works fine for a few hundred records but not
:   for a few thousand: the c++ newsgroups are missing from the output.
:   (A kludge is used to get them past grep.) How can I get it to work
:   correctly?

I vaguely recall having fixed some bug like this having to do with
splitting on #--you might try an up-to-date version.
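
One way to check how a given perl handles this split is to run it on a single key in isolation. The sketch below is written for a current Perl ("my", "use strict"), not for 3.041, and the key is a hypothetical example in the script's "group#counter" form. It only shows the behaviour the script relies on and says nothing about what the old bug actually was.

```perl
use strict;
use warnings;

# Hypothetical key in the "group#counter" form the posted script builds.
my $key = "comp.lang.c++#1234";

# A limit of 2 splits at the first '#' only, so everything after it
# stays in one piece.
my ($group, $counter) = split(/#/, $key, 2);

print "$group $counter\n";   # prints "comp.lang.c++ 1234"
```

If the group name comes back intact here, the split itself is behaving as the script expects.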

:      if (index($ngroup,"c++") >= 0) {
:         ($ngroup) =~ s/c\++/cplusplus/;
:       }

Nitpick.  The index() is superfluous if you quote the pattern right:

	$ngroup =~ s/c\+\+/cplusplus/;

Larry
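
The nitpick can be tried out with a short sketch. It assumes a current Perl rather than 3.041, and the group names are hypothetical examples. A substitution that does not match leaves its string unchanged, so no index() test is needed in front of it.

```perl
use strict;
use warnings;

# Hypothetical group names, with and without "c++".
my @groups = ("comp.lang.c++", "comp.lang.perl", "comp.std.c++");

my @renamed;
foreach my $ngroup (@groups) {
    # No index() guard: when "c++" is absent the substitution simply
    # fails to match and the copy keeps its original value.
    (my $copy = $ngroup) =~ s/c\+\+/cplusplus/;
    push(@renamed, $copy);
}

print join(" ", @renamed), "\n";
# prints "comp.lang.cplusplus comp.lang.perl comp.std.cplusplus"
```

The original pattern, c\++, means "c followed by one or more + characters". It gives the same result on these names but would also rewrite a lone "c+".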