[alt.sources] perl version of from

jv@mh.nl (Johan Vromans) (12/22/89)

In article <1989Dec20.222732.5633@trigraph.uucp> john@trigraph.uucp (John Chew) writes:
   Here's a new version of from.sed, my sed script that does the job
   of from(1) better and faster.  It now truncates long subjects,
   correctly handles messages without subjects and From lines with %
   or @foo: routing.

   Yes, I tried writing this in Perl.  I'm not an expert Perl programmer,
   but I couldn't get it to run faster than about 70% slower than sed.

I've been using a perl version of 'from' for a long time, so I trow it
in. Features:
  - shortens the date, so there's more room for subject
  - shortens long subjects
  - uses "From: " headers if possible
  - provide "<none>" subject
  - automatic determination of system mailbox
  - maybe more
  - output sample:

  Nov 29 00:14 "jv           " Re: your mail through the list got here
  Nov 28 21:21 "David Dyck   " your mail through the list got here
  Nov 29 08:28 "Mark H. Colbu" Re: output compatibility

It runs about as fast as the sed version. Typical times for a large
mailbox (46585 lines) real/user/sys 50/16/8 for sed, 50/22/7 for perl.

------ begin of from -- ascii -- complete ------
#!/usr/bin/perl

# This program requires perl version 3.0, patchlevel 4 or higher

# Usage "from MAILBOX..."

# Don't forget: perl is a Practical Extract and Report Language!

format =
@<<<<<<<<<<< "@<<<<<<<<<<<<" ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<..
$date,        $from,         $subj
.

if ( $#ARGV < 0 ) {
  if ( ! ($user = getlogin)) {
    @a = getpwuid($<);
    $user = $a[0];
  }
  if ( -r "/usr/mail/$user" ) {
    @ARGV = ("/usr/mail/$user");
  }
  elsif ( -r "/usr/spool/mail" ) {
    @ARGV = ("/usr/spool/mail/$user");
  }
  else {
    printf STDERR "No mail for $user.\n";
    exit 1;
  }
}
  

# read through input file(s)
while ( $line = <> ) {
  chop ($line);

  # scan until "From_" header found
  next unless $line =~ /^From\s+(\S+)\s+.*(\w{3}\s+\d+\s+\d+:\d+)/;
  $from = $1;  
  $date = $2;
  if ( $date eq "" || $from eq "" ) {
    print STDERR "Possible garbage: $line\n";
    next;
  }

  # get user name from uucp path
  $from = $1 if $from =~ /.*!(.+)/;

  # now, scan for Subject or empty line
  $subj = "";
  while ( $line = <> ) {
    chop ($line);

    if ( $line =~ /^$/ || $line =~ /^From / ) {
      # force fall-though
      $subj = "<none>" unless $subj;
    }
    else {
      $subj = $1 if $line =~ /^Subject\s*:\s*(.*)/i;
      if ( $line =~ /^From\s*:\s*/ ) {
        $line = $';
        if ( $line =~ /\((.+)\)/i ) { $from = $1; } 
        elsif ( $line =~ /^\s*(.+)\s*<.+>/i ) { $from = $1; } 
        elsif ( $line =~ /^<.+>\s*(.+)/i ) { $from = $1; } 
      }
    }

    # do we have enough info?
    if ( $from && $subj ) {
      write;
      last;
    }
  }
}
------ end of from -- ascii -- complete ------

Have fun,
Johan
--
Johan Vromans				       jv@mh.nl via internet backbones
Multihouse Automatisering bv		       uucp: ..!{uunet,hp4nl}!mh.nl!jv
Doesburgweg 7, 2803 PL Gouda, The Netherlands  phone/fax: +31 1820 62944/62500
------------------------ "Arms are made for hugging" -------------------------

jv@mh.nl (Johan Vromans) (01/04/90)

Note: I have redirected follow-ups to comp.lang.perl.

In article <JV.89Dec21221143@mhres.mh.nl> jv@mh.nl (Johan Vromans) writes:

| In article <1989Dec20.222732.5633@trigraph.uucp> john@trigraph.uucp (John Chew) writes:
|    Here's a new version of from.sed, my sed script that does the job
|    of from(1) better and faster.  It now truncates long subjects,
|    correctly handles messages without subjects and From lines with %
|    or @foo: routing.
| 
|    Yes, I tried writing this in Perl.  I'm not an expert Perl programmer,
|    but I couldn't get it to run faster than about 70% slower than sed.

To which I replied:

| I've been using a perl version of 'from' for a long time, so I trow it
| in. [...]

| It runs about as fast as the sed version. Typical times for a large
| mailbox (46585 lines) real/user/sys 50/16/8 for sed, 50/22/7 for perl.

Script fragment:

  while ( $line = <> ) {
    chop ($line);
    # scan until "From_" header found
    next unless $line =~ /^From\s+(\S+)\s+.*(\w{3}\s+\d+\s+\d+:\d+)/;

I was pointed out by John J. Chew <poslfit@gpu.UTCS.UToronto.CA> that
tightening the search for "From " would speed up the program by 30%.
He suggested:

  while ( <> ) {
    next unless /^From /;
    chop ($line);
    next unless /^From\s+(\S+)\s+.*(\w{3}\s+\d+\s+\d+:\d+)/;

Well, I tried it, and -NOT to my surprise- I found out that the major
speedup is caused by leaving out the assignment to the variable $line
and postponing the chop. I couldn't imagine (knowing how Larry likes
optimisation) that

  next unless /^From\s+(\S+)\s+.*(\w{3}\s+\d+\s+\d+:\d+)/;

would take more time to fail than

  next unless /^From /;

With the speedups, the perl script beats the sed script on both large
and small mailboxes:

~ > wc -lc INBOX
    163   6927 INBOX
~ > dotime 5 perl src/perl.pl INBOX
      Avg  Pass 1     2     3     4     5
     ----- ------- ----- ----- ----- -----
real   0.2     0.4   0.2   0.2   0.2   0.2
user   0.0     0.0   0.0   0.0   0.0   0.0
sys    0.1     0.1   0.1   0.1   0.1   0.1
~ > dotime 5 sed -f from.sed INBOX
      Avg  Pass 1     2     3     4     5
     ----- ------- ----- ----- ----- -----
real   0.5     0.7   0.4   0.5   0.4   0.4
user   0.1     0.1   0.1   0.1   0.1   0.1
sys    0.2     0.2   0.2   0.3   0.2   0.2
~ > wc -lc maildir/pax
  46585 1240000 maildir/pax
~ > dotime 5 perl src/from.pl maildir/pax
      Avg  Pass 1     2     3     4     5
     ----- ------- ----- ----- ----- -----
real  21.9    21.9  20.3  21.1  25.7  20.7
user  14.0    14.4  14.3  14.1  13.7  13.6
sys    5.9     5.8   4.9   5.7   7.4   5.9
~ > dotime 5 sed -f from.sed maildir/pax
      Avg  Pass 1     2     3     4     5
     ----- ------- ----- ----- ----- -----
real  23.1    23.4  22.7  22.9  23.1  23.5
user  14.8    14.8  14.9  14.8  14.3  15.2
sys    7.4     7.4   7.1   7.3   7.8   7.2

I have posted the "dotime" program to alt.sources, for whoever thinks
she/he can use it.

Have fun!

Johan
--
Johan Vromans				       jv@mh.nl via internet backbones
Multihouse Automatisering bv		       uucp: ..!{uunet,hp4nl}!mh.nl!jv
Doesburgweg 7, 2803 PL Gouda, The Netherlands  phone/fax: +31 1820 62944/62500
------------------------ "Arms are made for hugging" -------------------------

subbarao@phoenix.Princeton.EDU (Kartik Subbarao) (05/20/91)

Anyone know of a cooler way to do this?
---------

#! /usr/princeton/bin/perl

open(mail, $ENV{'MAIL'}) || exit;

while (<mail>) {
    if (/^From /) {      # Note this is different from From:
        chop ($out = $_); print $out;
        while (<mail>) {
            if (/^$/) { 
                print "\n"; 
                last; 
            }
            if (/^Subject:/) {
                chop;
                @subj = split(/[: ]/,$_);
                print ' "'.$subj[2].'"'."\n";
                last;
            }
        }
    }
}
--
internet% ypwhich

subbarao@phoenix.Princeton.EDU -| Internet
kartik@silvertone.Princeton.EDU (NeXT mail)  
SUBBARAO@PUCC.BITNET			          - Bitnet

jv@mh.nl (Johan Vromans) (05/20/91)

> Anyone know of a cooler way to do this?

Depends on what you call 'cool'	:-) .

#!/bin/sh
# This is a shell archive (produced by shar 3.49)
# To extract the files from this archive, save it to a file, remove
# everything above the "!/bin/sh" line above, and type "sh file_name".
#
# made 05/20/1991 10:09 UTC by jv@pronto
# Source directory /u1/users/jv
#
# existing files will NOT be overwritten unless -c is specified
#
# This shar contains:
# length  mode       name
# ------ ---------- ------------------------------------------
#   2214 -rwxr-xr-x from
#
# ============= from ==============
if test -f 'from' -a X"$1" != X"-c"; then
	echo 'x - skipping from (File already exists)'
else
echo 'x - extracting from (Text)'
sed 's/^X//' << 'SHAR_EOF' > 'from' &&
X#!/usr/bin/perl
X
X# This program requires perl version 3.0, patchlevel 4 or higher
X
X# @($)@ from	1.5 - from.pl
X
X# Show messages from a Unix mailbox. With -n: shown message numbers also.
X#
X# Usage "from [-n] MAILBOX..."
X#
X# Don't forget: perl is a Practical Extract and Report Language!
X#
X# Copyright 1989,1990 Johan Vromans <jv@mh.nl>, no rights reserved.
X# Usage and redistribution is free and encouraged.
X
X# Default output format
Xformat =
X@<<<<<<<<<<< "@<<<<<<<<<<<<" ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<..
X$date,        $from,         $subj
X.
X
X# Output format when sequence numbers are requested
Xformat format_n =
X@>: @<<<<<<<<<<< "@<<<<<<<<<<<<" ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<..
X$seq, $date,      $from,         $subj
X.
X
X# Parse and stash away -n switch, if provided
Xif ($#ARGV >= 0 && $ARGV[0] eq '-n') {
X  shift (@ARGV);
X  $~ = "format_n";
X}
X
X# Use system mailbox if none was specified on the command line
Xif ( $#ARGV < 0 ) {
X  if ( ! ($user = getlogin)) {
X    @a = getpwuid($<);
X    $user = $a[0];
X  }
X  if ( -r "/usr/mail/$user" ) {		# System V
X    @ARGV = ("/usr/mail/$user");
X  }
X  elsif ( -r "/usr/spool/mail" ) {	# BSD
X    @ARGV = ("/usr/spool/mail/$user");
X  }
X  else {
X    printf STDERR "No mail for $user.\n";
X    exit 1;
X  }
X}
X  
X$seq = 0;
X# Read through input file(s)
Xwhile (<>) {
X
X  # Look for a "From_" header (See RFC822 and associated documents).
X  next unless /^From\s+(\S+)\s+.*(\w{3}\s+\d+\s+\d+:\d+)/;
X
X  chop;
X  $from = $1;  
X  $date = $2;
X  if ( $date eq "" || $from eq "" ) {
X    print STDERR "Possible garbage: $_\n";
X    next;
X  }
X
X  $seq++;
X  # Get user name from uucp path
X  $from = $1 if $from =~ /.*!(.+)/;
X
X  # Now, scan for Subject or empty line
X  $subj = "";
X  while ( <> ) {
X    chop ($_);
X
X    if ( /^$/ || /^From / ) {
X      # force fall-though
X      $subj = "<none>" unless $subj;
X    }
X    else {
X      $subj = $1 if /^Subject\s*:\s*(.*)/i;
X      if ( /^From\s*:\s*/ ) {
X        $_ = $';
X        if ( /\((.+)\)/i ) { $from = $1; } 
X        elsif ( /^\s*(.+)\s*<.+>/i ) { $from = $1; } 
X        elsif ( /^<.+>\s*(.+)/i ) { $from = $1; } 
X      }
X    }
X
X    # do we have enough info?
X    if ( $from && $subj ) {
X      write;
X      last;
X    }
X  }
X}
SHAR_EOF
chmod 0755 from ||
echo 'restore of from failed'
Wc_c="`wc -c < 'from'`"
test 2214 -eq "$Wc_c" ||
	echo 'from: original size 2214, current size' "$Wc_c"
fi
exit 0
-- 
Johan Vromans				       jv@mh.nl via internet backbones
Multihouse Automatisering bv		       uucp: ..!{uunet,hp4nl}!mh.nl!jv
Doesburgweg 7, 2803 PL Gouda, The Netherlands  phone/fax: +31 1820 62911/62500
------------------------ "Arms are made for hugging" -------------------------

rearl@watnxt3.ucr.edu (Robert Earl) (05/20/91)

In article <azioL/5ej/g.c@idunno.Princeton.EDU> subbarao@phoenix.Princeton.EDU (Kartik Subbarao) writes:

|   Anyone know of a cooler way to do this?

It's Perl, there's ALWAYS a cooler way.  :-)

|		   @subj = split(/[: ]/,$_);
|		   print ' "'.$subj[2].'"'."\n";

This was sort of confusing, did you really want it to print just the
first word of a subject?  Mine won't do that, but it's easy enough to
change.

#! /usr/bin/perl

open(mail, $ENV{'MAIL'}) || exit;

while (<mail>) {
    # skip this block unless it's "From ", else print it
    /^From / || next; chop; print;
    while (<mail>) {
	chop;

	# for the other way, substitute this line:
	# ((length) ? s/^Subject: *([^: ]+)/print qq| "$1"\n|/e : print "\n")

	((length) ? s/^Subject: */print qq| "$'"\n|/e : print "\n")
	    && last;
    }
}
print "\n";


print "Just another Perl hacker, ";
--
______________________________________________________________________
			\					
 robert earl		/	"Love is a many splintered thing"
 rearl@watnxt3.ucr.edu	\		--Sisters of Mercy
 rearl@gnu.ai.mit.edu	/

gamin@ireq-robot.hydro.qc.ca (Martin Boyer) (05/22/91)

jv@mh.nl (Johan Vromans) writes:

   > From: subbarao@phoenix.Princeton.EDU (Kartik Subbarao)
   > Subject: perl version of from(1)
   > Date: 19 May 91 23:54:24 GMT
   > Organization: American Chemical Society

   > Anyone know of a cooler way to do this?

   Depends on what you call 'cool'	:-) .

   [shar of from.pl 1.5 deleted]

Here's my version modified to:

    handle multiple mailboxes (default is $HOME/INBOX, created by vm)
    reset message counter for each mailbox
    test for presence of mailbox instead of spool directory in BSD systems


#!/usr/bin/perl
#----Le laboratoire de robotique de l'Institut de recherche d'Hydro-Quebec-----
# 
# Nom     : from [-n] [MAILBOX] [MAILBOX...]
# Fonction: Imprime un sommaire de la boite postale,
#	    avec -n, imprime les numeros de message aussi.
# Fichiers: from, /usr/spool/mail, ~/INBOX
# Notes   : Requiert perl version 3.0, patchlevel 4 ou mieux.
# 
# Cree    :    1989  ------------ Johan Vromans <jv@mh.nl>
# Modifie : 21 mai 91 ---------1- Martin Boyer <mboyer@ireq-robot.hydro.qc.ca>
#	    Copyright 1989, 1990 Johan Vromans <jv@mh.nl>, no rights reserved.
# 
# Historique: 
# 
# 21 mai 91 ---------1- Martin Boyer <mboyer@ireq-robot.hydro.qc.ca>
# 	Modifie la version 1.5 de Johan Vromans: ajoute INBOX
# 	et remis le compteur de message a zero pour chaque fichier.
#------------------------------------------------------------------------------

$inbox = "$ENV{'HOME'}/INBOX";

# Default output format
format =
@<<<<<<<<<<< "@<<<<<<<<<<<<" ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<..
$date,        $from,         $subj
.

# Output format when sequence numbers are requested
format format_n =
@>: @<<<<<<<<<<< "@<<<<<<<<<<<<" ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<..
$seq, $date,      $from,         $subj
.

# Parse and stash away -n switch, if provided
if ($#ARGV >= 0 && $ARGV[0] eq '-n') {
  shift (@ARGV);
  $~ = "format_n";
}

# Use system mailbox and user INBOX 
# if none was specified on the command line
if ( $#ARGV < 0 ) {
    if ( ! ($user = getlogin)) {
	@a = getpwuid($<);
	$user = $a[0];
    }
    if ( -r "/usr/mail/$user" ) {	    # System V
	@ARGV = ("/usr/mail/$user");
    }
    elsif ( -r "/usr/spool/mail/$user" ) {  # BSD
	@ARGV = ("/usr/spool/mail/$user");
    }
    if ( $inbox && -r $inbox ) {
	@ARGV = ($inbox, @ARGV);
    }
    if ($#ARGV < 0)  {
	printf STDERR "No mail for $user.\n";
	exit 1;
    }
}
if ( $#ARGV < 0 ) {
  if ( ! ($user = getlogin)) {
    @a = getpwuid($<);
    $user = $a[0];
  }
  if ( -r "/usr/mail/$user" ) {		
    @ARGV = ("/usr/mail/$user");
  }
  elsif ( -r "/usr/spool/mail/$user" ) {	# BSD
    @ARGV = ("/usr/spool/mail/$user");
  }
  else {
    printf STDERR "No mail for $user.\n";
    exit 1;
  }
}
  
# Read through input file(s)
while (<>) {

  if (eof) {$seq = 0;}

  # Look for a "From_" header (See RFC822 and associated documents).
  next unless /^From\s+(\S+)\s+.*(\w{3}\s+\d+\s+\d+:\d+)/;

  chop;
  $from = $1;  
  $date = $2;
  if ( $date eq "" || $from eq "" ) {
    print STDERR "Possible garbage: $_\n";
    next;
  }

  $seq++;
  # Get user name from uucp path
  $from = $1 if $from =~ /.*!(.+)/;

  # Now, scan for Subject or empty line
  $subj = "";
  while ( <> ) {
    chop ($_);

    if ( /^$/ || /^From / ) {
      # force fall-though
      $subj = "<none>" unless $subj;
    }
    else {
      $subj = $1 if /^Subject\s*:\s*(.*)/i;
      if ( /^From\s*:\s*/ ) {
        $_ = $';
        if ( /\((.+)\)/i ) { $from = $1; } 
        elsif ( /^\s*(.+)\s*<.+>/i ) { $from = $1; } 
        elsif ( /^<.+>\s*(.+)/i ) { $from = $1; } 
      }
    }

    # do we have enough info?
    if ( $from && $subj ) {
      write;
      last;
    }
  }
}

--
Martin Boyer                            mboyer@ireq-robot.hydro.qc.ca
Institut de recherche d'Hydro-Quebec    mboyer@ireq-robot.uucp
Varennes, QC, Canada   J3X 1S1
+1 514 652-8412