[comp.mail.misc] Perl version of from

jv@mh.nl (Johan Vromans) (12/22/89)

In article <1989Dec20.222732.5633@trigraph.uucp> john@trigraph.uucp (John Chew) writes:
   Here's a new version of from.sed, my sed script that does the job
   of from(1) better and faster.  It now truncates long subjects,
   correctly handles messages without subjects and From lines with %
   or @foo: routing.

   Yes, I tried writing this in Perl.  I'm not an expert Perl programmer,
   but I couldn't get it to run faster than about 70% slower than sed.

I've been using a perl version of 'from' for a long time, so I trow it
in. Features:
  - shortens the date, so there's more room for subject
  - shortens long subjects
  - uses "From: " headers if possible
  - provide "<none>" subject
  - automatic determination of system mailbox
  - maybe more
  - output sample:

  Nov 29 00:14 "jv           " Re: your mail through the list got here
  Nov 28 21:21 "David Dyck   " your mail through the list got here
  Nov 29 08:28 "Mark H. Colbu" Re: output compatibility

It runs about as fast as the sed version. Typical times for a large
mailbox (46585 lines) real/user/sys 50/16/8 for sed, 50/22/7 for perl.

------ begin of from -- ascii -- complete ------
#!/usr/bin/perl

# This program requires perl version 3.0, patchlevel 4 or higher

# Usage "from MAILBOX..."

# Don't forget: perl is a Practical Extract and Report Language!

format =
@<<<<<<<<<<< "@<<<<<<<<<<<<" ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<..
$date,        $from,         $subj
.

if ( $#ARGV < 0 ) {
  if ( ! ($user = getlogin)) {
    @a = getpwuid($<);
    $user = $a[0];
  }
  if ( -r "/usr/mail/$user" ) {
    @ARGV = ("/usr/mail/$user");
  }
  elsif ( -r "/usr/spool/mail" ) {
    @ARGV = ("/usr/spool/mail/$user");
  }
  else {
    printf STDERR "No mail for $user.\n";
    exit 1;
  }
}
  

# read through input file(s)
while ( $line = <> ) {
  chop ($line);

  # scan until "From_" header found
  next unless $line =~ /^From\s+(\S+)\s+.*(\w{3}\s+\d+\s+\d+:\d+)/;
  $from = $1;  
  $date = $2;
  if ( $date eq "" || $from eq "" ) {
    print STDERR "Possible garbage: $line\n";
    next;
  }

  # get user name from uucp path
  $from = $1 if $from =~ /.*!(.+)/;

  # now, scan for Subject or empty line
  $subj = "";
  while ( $line = <> ) {
    chop ($line);

    if ( $line =~ /^$/ || $line =~ /^From / ) {
      # force fall-though
      $subj = "<none>" unless $subj;
    }
    else {
      $subj = $1 if $line =~ /^Subject\s*:\s*(.*)/i;
      if ( $line =~ /^From\s*:\s*/ ) {
        $line = $';
        if ( $line =~ /\((.+)\)/i ) { $from = $1; } 
        elsif ( $line =~ /^\s*(.+)\s*<.+>/i ) { $from = $1; } 
        elsif ( $line =~ /^<.+>\s*(.+)/i ) { $from = $1; } 
      }
    }

    # do we have enough info?
    if ( $from && $subj ) {
      write;
      last;
    }
  }
}
------ end of from -- ascii -- complete ------

Have fun,
Johan
--
Johan Vromans				       jv@mh.nl via internet backbones
Multihouse Automatisering bv		       uucp: ..!{uunet,hp4nl}!mh.nl!jv
Doesburgweg 7, 2803 PL Gouda, The Netherlands  phone/fax: +31 1820 62944/62500
------------------------ "Arms are made for hugging" -------------------------

john@trigraph.uucp (John Chew) (12/30/89)

In article <1989Dec20.222732.5633@trigraph.uucp> 
  I posted a sed script that does the job of from(1).

In <JV.89Dec21221143@mhres.mh.nl> Johan Vromans <jv@mh.nl> 
  posted a perl script that does the same thing.

I've tried both out on various mailboxes and have come to the
following conclusions:

1. On small mailboxes, the compilation-time overhead of perl makes it a pig.

2. On large mailboxes, especially those containing long messages, perl
   can catch up to sed.

3. The following patch to Johan Vromans' perl script speeds it up by
   as much as 30% on large files, by tightening the search-for-From_
   loop.

*** old/from.jv.pl	Fri Dec 29 12:05:15 1989
--- from.jv.pl	Fri Dec 29 11:42:13 1989
***************
*** 30,40
    
  
  # read through input file(s)
! while ( $line = <> ) {
!   chop ($line);
! 
!   # scan until "From_" header found
!   next unless $line =~ /^From\s+(\S+)\s+.*(\w{3}\s+\d+\s+\d+:\d+)/;
    $from = $1;  
    $date = $2;
    if ( $date eq "" || $from eq "" ) {

--- 30,39 -----
    
  
  # read through input file(s)
! while (<>) {
!   next unless /^From /;
!   chop;
!   next unless /^From\s+(\S+)\s+.*(\w{3}\s+\d+\s+\d+:\d+)/;
    $from = $1;  
    $date = $2;
    if ( $date eq "" || $from eq "" ) {
***************
*** 38,44
    $from = $1;  
    $date = $2;
    if ( $date eq "" || $from eq "" ) {
!     print STDERR "Possible garbage: $line\n";
      next;
    }
  

--- 37,43 -----
    $from = $1;  
    $date = $2;
    if ( $date eq "" || $from eq "" ) {
!     print STDERR "Possible garbage: $_\n";
      next;
    }
  
I'll keep both scripts around for now.  I actually prefer the notion
of writing such things in perl, but when your mail machine is a heavily-
used VAX-11/750 you can't afford luxuries....

John
-- 
john j. chew, iii   		  phone: +1 416 425 3818     AppleLink: CDA0329
trigraph, inc., toronto, canada   {uunet!utai!utcsri,utgpu,utzoo}!trigraph!john
dept. of math., u. of toronto     poslfit@{utorgpu.bitnet,gpu.utcs.utoronto.ca}