jv@mh.nl (Johan Vromans) (01/04/90)
Note: I have redirected follow-ups to comp.lang.perl. In article <JV.89Dec21221143@mhres.mh.nl> jv@mh.nl (Johan Vromans) writes: | In article <1989Dec20.222732.5633@trigraph.uucp> john@trigraph.uucp (John Chew) writes: | Here's a new version of from.sed, my sed script that does the job | of from(1) better and faster. It now truncates long subjects, | correctly handles messages without subjects and From lines with % | or @foo: routing. | | Yes, I tried writing this in Perl. I'm not an expert Perl programmer, | but I couldn't get it to run faster than about 70% slower than sed. To which I replied: | I've been using a perl version of 'from' for a long time, so I trow it | in. [...] | It runs about as fast as the sed version. Typical times for a large | mailbox (46585 lines) real/user/sys 50/16/8 for sed, 50/22/7 for perl. Script fragment: while ( $line = <> ) { chop ($line); # scan until "From_" header found next unless $line =~ /^From\s+(\S+)\s+.*(\w{3}\s+\d+\s+\d+:\d+)/; I was pointed out by John J. Chew <poslfit@gpu.UTCS.UToronto.CA> that tightening the search for "From " would speed up the program by 30%. He suggested: while ( <> ) { next unless /^From /; chop ($line); next unless /^From\s+(\S+)\s+.*(\w{3}\s+\d+\s+\d+:\d+)/; Well, I tried it, and -NOT to my surprise- I found out that the major speedup is caused by leaving out the assignment to the variable $line and postponing the chop. I couldn't imagine (knowing how Larry likes optimisation) that next unless /^From\s+(\S+)\s+.*(\w{3}\s+\d+\s+\d+:\d+)/; would take more time to fail than next unless /^From /; With the speedups, the perl script beats the sed script on both large and small mailboxes: ~ > wc -lc INBOX 163 6927 INBOX ~ > dotime 5 perl src/perl.pl INBOX Avg Pass 1 2 3 4 5 ----- ------- ----- ----- ----- ----- real 0.2 0.4 0.2 0.2 0.2 0.2 user 0.0 0.0 0.0 0.0 0.0 0.0 sys 0.1 0.1 0.1 0.1 0.1 0.1 ~ > dotime 5 sed -f from.sed INBOX Avg Pass 1 2 3 4 5 ----- ------- ----- ----- ----- ----- real 0.5 0.7 0.4 0.5 0.4 0.4 user 0.1 0.1 0.1 0.1 0.1 0.1 sys 0.2 0.2 0.2 0.3 0.2 0.2 ~ > wc -lc maildir/pax 46585 1240000 maildir/pax ~ > dotime 5 perl src/from.pl maildir/pax Avg Pass 1 2 3 4 5 ----- ------- ----- ----- ----- ----- real 21.9 21.9 20.3 21.1 25.7 20.7 user 14.0 14.4 14.3 14.1 13.7 13.6 sys 5.9 5.8 4.9 5.7 7.4 5.9 ~ > dotime 5 sed -f from.sed maildir/pax Avg Pass 1 2 3 4 5 ----- ------- ----- ----- ----- ----- real 23.1 23.4 22.7 22.9 23.1 23.5 user 14.8 14.8 14.9 14.8 14.3 15.2 sys 7.4 7.4 7.1 7.3 7.8 7.2 I have posted the "dotime" program to alt.sources, for whoever thinks she/he can use it. Have fun! Johan -- Johan Vromans jv@mh.nl via internet backbones Multihouse Automatisering bv uucp: ..!{uunet,hp4nl}!mh.nl!jv Doesburgweg 7, 2803 PL Gouda, The Netherlands phone/fax: +31 1820 62944/62500 ------------------------ "Arms are made for hugging" -------------------------
subbarao@phoenix.Princeton.EDU (Kartik Subbarao) (05/20/91)
Anyone know of a cooler way to do this? --------- #! /usr/princeton/bin/perl open(mail, $ENV{'MAIL'}) || exit; while (<mail>) { if (/^From /) { # Note this is different from From: chop ($out = $_); print $out; while (<mail>) { if (/^$/) { print "\n"; last; } if (/^Subject:/) { chop; @subj = split(/[: ]/,$_); print ' "'.$subj[2].'"'."\n"; last; } } } } -- internet% ypwhich subbarao@phoenix.Princeton.EDU -| Internet kartik@silvertone.Princeton.EDU (NeXT mail) SUBBARAO@PUCC.BITNET - Bitnet
jv@mh.nl (Johan Vromans) (05/20/91)
> Anyone know of a cooler way to do this?
Depends on what you call 'cool' :-) .
#!/bin/sh
# This is a shell archive (produced by shar 3.49)
# To extract the files from this archive, save it to a file, remove
# everything above the "!/bin/sh" line above, and type "sh file_name".
#
# made 05/20/1991 10:09 UTC by jv@pronto
# Source directory /u1/users/jv
#
# existing files will NOT be overwritten unless -c is specified
#
# This shar contains:
# length mode name
# ------ ---------- ------------------------------------------
# 2214 -rwxr-xr-x from
#
# ============= from ==============
if test -f 'from' -a X"$1" != X"-c"; then
echo 'x - skipping from (File already exists)'
else
echo 'x - extracting from (Text)'
sed 's/^X//' << 'SHAR_EOF' > 'from' &&
X#!/usr/bin/perl
X
X# This program requires perl version 3.0, patchlevel 4 or higher
X
X# @($)@ from 1.5 - from.pl
X
X# Show messages from a Unix mailbox. With -n: shown message numbers also.
X#
X# Usage "from [-n] MAILBOX..."
X#
X# Don't forget: perl is a Practical Extract and Report Language!
X#
X# Copyright 1989,1990 Johan Vromans <jv@mh.nl>, no rights reserved.
X# Usage and redistribution is free and encouraged.
X
X# Default output format
Xformat =
X@<<<<<<<<<<< "@<<<<<<<<<<<<" ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<..
X$date, $from, $subj
X.
X
X# Output format when sequence numbers are requested
Xformat format_n =
X@>: @<<<<<<<<<<< "@<<<<<<<<<<<<" ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<..
X$seq, $date, $from, $subj
X.
X
X# Parse and stash away -n switch, if provided
Xif ($#ARGV >= 0 && $ARGV[0] eq '-n') {
X shift (@ARGV);
X $~ = "format_n";
X}
X
X# Use system mailbox if none was specified on the command line
Xif ( $#ARGV < 0 ) {
X if ( ! ($user = getlogin)) {
X @a = getpwuid($<);
X $user = $a[0];
X }
X if ( -r "/usr/mail/$user" ) { # System V
X @ARGV = ("/usr/mail/$user");
X }
X elsif ( -r "/usr/spool/mail" ) { # BSD
X @ARGV = ("/usr/spool/mail/$user");
X }
X else {
X printf STDERR "No mail for $user.\n";
X exit 1;
X }
X}
X
X$seq = 0;
X# Read through input file(s)
Xwhile (<>) {
X
X # Look for a "From_" header (See RFC822 and associated documents).
X next unless /^From\s+(\S+)\s+.*(\w{3}\s+\d+\s+\d+:\d+)/;
X
X chop;
X $from = $1;
X $date = $2;
X if ( $date eq "" || $from eq "" ) {
X print STDERR "Possible garbage: $_\n";
X next;
X }
X
X $seq++;
X # Get user name from uucp path
X $from = $1 if $from =~ /.*!(.+)/;
X
X # Now, scan for Subject or empty line
X $subj = "";
X while ( <> ) {
X chop ($_);
X
X if ( /^$/ || /^From / ) {
X # force fall-though
X $subj = "<none>" unless $subj;
X }
X else {
X $subj = $1 if /^Subject\s*:\s*(.*)/i;
X if ( /^From\s*:\s*/ ) {
X $_ = $';
X if ( /\((.+)\)/i ) { $from = $1; }
X elsif ( /^\s*(.+)\s*<.+>/i ) { $from = $1; }
X elsif ( /^<.+>\s*(.+)/i ) { $from = $1; }
X }
X }
X
X # do we have enough info?
X if ( $from && $subj ) {
X write;
X last;
X }
X }
X}
SHAR_EOF
chmod 0755 from ||
echo 'restore of from failed'
Wc_c="`wc -c < 'from'`"
test 2214 -eq "$Wc_c" ||
echo 'from: original size 2214, current size' "$Wc_c"
fi
exit 0
--
Johan Vromans jv@mh.nl via internet backbones
Multihouse Automatisering bv uucp: ..!{uunet,hp4nl}!mh.nl!jv
Doesburgweg 7, 2803 PL Gouda, The Netherlands phone/fax: +31 1820 62911/62500
------------------------ "Arms are made for hugging" -------------------------
rearl@watnxt3.ucr.edu (Robert Earl) (05/20/91)
In article <azioL/5ej/g.c@idunno.Princeton.EDU> subbarao@phoenix.Princeton.EDU (Kartik Subbarao) writes: | Anyone know of a cooler way to do this? It's Perl, there's ALWAYS a cooler way. :-) | @subj = split(/[: ]/,$_); | print ' "'.$subj[2].'"'."\n"; This was sort of confusing, did you really want it to print just the first word of a subject? Mine won't do that, but it's easy enough to change. #! /usr/bin/perl open(mail, $ENV{'MAIL'}) || exit; while (<mail>) { # skip this block unless it's "From ", else print it /^From / || next; chop; print; while (<mail>) { chop; # for the other way, substitute this line: # ((length) ? s/^Subject: *([^: ]+)/print qq| "$1"\n|/e : print "\n") ((length) ? s/^Subject: */print qq| "$'"\n|/e : print "\n") && last; } } print "\n"; print "Just another Perl hacker, "; -- ______________________________________________________________________ \ robert earl / "Love is a many splintered thing" rearl@watnxt3.ucr.edu \ --Sisters of Mercy rearl@gnu.ai.mit.edu /
tchrist@convex.COM (Tom Christiansen) (05/21/91)
From the keyboard of subbarao@phoenix.Princeton.EDU (Kartik Subbarao): :Anyone know of a cooler way to do this? : :#! /usr/princeton/bin/perl : :open(mail, $ENV{'MAIL'}) || exit; : :while (<mail>) { : if (/^From /) { # Note this is different from From: : chop ($out = $_); print $out; : while (<mail>) { : if (/^$/) { : print "\n"; : last; : } : if (/^Subject:/) { : chop; : @subj = split(/[: ]/,$_); : print ' "'.$subj[2].'"'."\n"; : last; : } : } : } :} Cooler? Hmm... Well, here's you I added: default open to /usr/spool/mail/$USER unless $MAIL set accept command line arguments for from filters make everything line up prettier be case insensitive strip local domain from hostname don't stop on first subject line lest there be more removed unneeded split and chops reduced the line count by 4 for job security through obscurity :-) Here's the code: #!/usr/bin/perl open (MAIL, $ENV{'MAIL'} || '/usr/spool/mail/'.$ENV{'USER'}) || exit; $only = join('|', @ARGV); $domain = '\.convex\.com'; while (<MAIL>) { if (s/^from\s+//i) { # start of message @ARGV && (/$only/io || next); s/$domain//io if $domain; printf "%-18s %-25s", /(\S+)\s+(.*)/; while (<MAIL>) { /^$/ && (print("\n"), last); s/^subject:\s*(.*)\n$/"$1"/i && print; } } } --tom -- Tom Christiansen tchrist@convex.com convex!tchrist "So much mail, so little time."
gamin@ireq-robot.hydro.qc.ca (Martin Boyer) (05/22/91)
jv@mh.nl (Johan Vromans) writes: > From: subbarao@phoenix.Princeton.EDU (Kartik Subbarao) > Subject: perl version of from(1) > Date: 19 May 91 23:54:24 GMT > Organization: American Chemical Society > Anyone know of a cooler way to do this? Depends on what you call 'cool' :-) . [shar of from.pl 1.5 deleted] Here's my version modified to: handle multiple mailboxes (default is $HOME/INBOX, created by vm) reset message counter for each mailbox test for presence of mailbox instead of spool directory in BSD systems #!/usr/bin/perl #----Le laboratoire de robotique de l'Institut de recherche d'Hydro-Quebec----- # # Nom : from [-n] [MAILBOX] [MAILBOX...] # Fonction: Imprime un sommaire de la boite postale, # avec -n, imprime les numeros de message aussi. # Fichiers: from, /usr/spool/mail, ~/INBOX # Notes : Requiert perl version 3.0, patchlevel 4 ou mieux. # # Cree : 1989 ------------ Johan Vromans <jv@mh.nl> # Modifie : 21 mai 91 ---------1- Martin Boyer <mboyer@ireq-robot.hydro.qc.ca> # Copyright 1989, 1990 Johan Vromans <jv@mh.nl>, no rights reserved. # # Historique: # # 21 mai 91 ---------1- Martin Boyer <mboyer@ireq-robot.hydro.qc.ca> # Modifie la version 1.5 de Johan Vromans: ajoute INBOX # et remis le compteur de message a zero pour chaque fichier. #------------------------------------------------------------------------------ $inbox = "$ENV{'HOME'}/INBOX"; # Default output format format = @<<<<<<<<<<< "@<<<<<<<<<<<<" ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<.. $date, $from, $subj . # Output format when sequence numbers are requested format format_n = @>: @<<<<<<<<<<< "@<<<<<<<<<<<<" ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<.. $seq, $date, $from, $subj . # Parse and stash away -n switch, if provided if ($#ARGV >= 0 && $ARGV[0] eq '-n') { shift (@ARGV); $~ = "format_n"; } # Use system mailbox and user INBOX # if none was specified on the command line if ( $#ARGV < 0 ) { if ( ! ($user = getlogin)) { @a = getpwuid($<); $user = $a[0]; } if ( -r "/usr/mail/$user" ) { # System V @ARGV = ("/usr/mail/$user"); } elsif ( -r "/usr/spool/mail/$user" ) { # BSD @ARGV = ("/usr/spool/mail/$user"); } if ( $inbox && -r $inbox ) { @ARGV = ($inbox, @ARGV); } if ($#ARGV < 0) { printf STDERR "No mail for $user.\n"; exit 1; } } if ( $#ARGV < 0 ) { if ( ! ($user = getlogin)) { @a = getpwuid($<); $user = $a[0]; } if ( -r "/usr/mail/$user" ) { @ARGV = ("/usr/mail/$user"); } elsif ( -r "/usr/spool/mail/$user" ) { # BSD @ARGV = ("/usr/spool/mail/$user"); } else { printf STDERR "No mail for $user.\n"; exit 1; } } # Read through input file(s) while (<>) { if (eof) {$seq = 0;} # Look for a "From_" header (See RFC822 and associated documents). next unless /^From\s+(\S+)\s+.*(\w{3}\s+\d+\s+\d+:\d+)/; chop; $from = $1; $date = $2; if ( $date eq "" || $from eq "" ) { print STDERR "Possible garbage: $_\n"; next; } $seq++; # Get user name from uucp path $from = $1 if $from =~ /.*!(.+)/; # Now, scan for Subject or empty line $subj = ""; while ( <> ) { chop ($_); if ( /^$/ || /^From / ) { # force fall-though $subj = "<none>" unless $subj; } else { $subj = $1 if /^Subject\s*:\s*(.*)/i; if ( /^From\s*:\s*/ ) { $_ = $'; if ( /\((.+)\)/i ) { $from = $1; } elsif ( /^\s*(.+)\s*<.+>/i ) { $from = $1; } elsif ( /^<.+>\s*(.+)/i ) { $from = $1; } } } # do we have enough info? if ( $from && $subj ) { write; last; } } } -- Martin Boyer mboyer@ireq-robot.hydro.qc.ca Institut de recherche d'Hydro-Quebec mboyer@ireq-robot.uucp Varennes, QC, Canada J3X 1S1 +1 514 652-8412