jv@mh.nl (Johan Vromans) (12/22/89)
In article <1989Dec20.222732.5633@trigraph.uucp> john@trigraph.uucp (John Chew) writes: Here's a new version of from.sed, my sed script that does the job of from(1) better and faster. It now truncates long subjects, correctly handles messages without subjects and From lines with % or @foo: routing. Yes, I tried writing this in Perl. I'm not an expert Perl programmer, but I couldn't get it to run faster than about 70% slower than sed. I've been using a perl version of 'from' for a long time, so I trow it in. Features: - shortens the date, so there's more room for subject - shortens long subjects - uses "From: " headers if possible - provide "<none>" subject - automatic determination of system mailbox - maybe more - output sample: Nov 29 00:14 "jv " Re: your mail through the list got here Nov 28 21:21 "David Dyck " your mail through the list got here Nov 29 08:28 "Mark H. Colbu" Re: output compatibility It runs about as fast as the sed version. Typical times for a large mailbox (46585 lines) real/user/sys 50/16/8 for sed, 50/22/7 for perl. ------ begin of from -- ascii -- complete ------ #!/usr/bin/perl # This program requires perl version 3.0, patchlevel 4 or higher # Usage "from MAILBOX..." # Don't forget: perl is a Practical Extract and Report Language! format = @<<<<<<<<<<< "@<<<<<<<<<<<<" ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<.. $date, $from, $subj . if ( $#ARGV < 0 ) { if ( ! ($user = getlogin)) { @a = getpwuid($<); $user = $a[0]; } if ( -r "/usr/mail/$user" ) { @ARGV = ("/usr/mail/$user"); } elsif ( -r "/usr/spool/mail" ) { @ARGV = ("/usr/spool/mail/$user"); } else { printf STDERR "No mail for $user.\n"; exit 1; } } # read through input file(s) while ( $line = <> ) { chop ($line); # scan until "From_" header found next unless $line =~ /^From\s+(\S+)\s+.*(\w{3}\s+\d+\s+\d+:\d+)/; $from = $1; $date = $2; if ( $date eq "" || $from eq "" ) { print STDERR "Possible garbage: $line\n"; next; } # get user name from uucp path $from = $1 if $from =~ /.*!(.+)/; # now, scan for Subject or empty line $subj = ""; while ( $line = <> ) { chop ($line); if ( $line =~ /^$/ || $line =~ /^From / ) { # force fall-though $subj = "<none>" unless $subj; } else { $subj = $1 if $line =~ /^Subject\s*:\s*(.*)/i; if ( $line =~ /^From\s*:\s*/ ) { $line = $'; if ( $line =~ /\((.+)\)/i ) { $from = $1; } elsif ( $line =~ /^\s*(.+)\s*<.+>/i ) { $from = $1; } elsif ( $line =~ /^<.+>\s*(.+)/i ) { $from = $1; } } } # do we have enough info? if ( $from && $subj ) { write; last; } } } ------ end of from -- ascii -- complete ------ Have fun, Johan -- Johan Vromans jv@mh.nl via internet backbones Multihouse Automatisering bv uucp: ..!{uunet,hp4nl}!mh.nl!jv Doesburgweg 7, 2803 PL Gouda, The Netherlands phone/fax: +31 1820 62944/62500 ------------------------ "Arms are made for hugging" -------------------------
jv@mh.nl (Johan Vromans) (01/04/90)
Note: I have redirected follow-ups to comp.lang.perl. In article <JV.89Dec21221143@mhres.mh.nl> jv@mh.nl (Johan Vromans) writes: | In article <1989Dec20.222732.5633@trigraph.uucp> john@trigraph.uucp (John Chew) writes: | Here's a new version of from.sed, my sed script that does the job | of from(1) better and faster. It now truncates long subjects, | correctly handles messages without subjects and From lines with % | or @foo: routing. | | Yes, I tried writing this in Perl. I'm not an expert Perl programmer, | but I couldn't get it to run faster than about 70% slower than sed. To which I replied: | I've been using a perl version of 'from' for a long time, so I trow it | in. [...] | It runs about as fast as the sed version. Typical times for a large | mailbox (46585 lines) real/user/sys 50/16/8 for sed, 50/22/7 for perl. Script fragment: while ( $line = <> ) { chop ($line); # scan until "From_" header found next unless $line =~ /^From\s+(\S+)\s+.*(\w{3}\s+\d+\s+\d+:\d+)/; I was pointed out by John J. Chew <poslfit@gpu.UTCS.UToronto.CA> that tightening the search for "From " would speed up the program by 30%. He suggested: while ( <> ) { next unless /^From /; chop ($line); next unless /^From\s+(\S+)\s+.*(\w{3}\s+\d+\s+\d+:\d+)/; Well, I tried it, and -NOT to my surprise- I found out that the major speedup is caused by leaving out the assignment to the variable $line and postponing the chop. I couldn't imagine (knowing how Larry likes optimisation) that next unless /^From\s+(\S+)\s+.*(\w{3}\s+\d+\s+\d+:\d+)/; would take more time to fail than next unless /^From /; With the speedups, the perl script beats the sed script on both large and small mailboxes: ~ > wc -lc INBOX 163 6927 INBOX ~ > dotime 5 perl src/perl.pl INBOX Avg Pass 1 2 3 4 5 ----- ------- ----- ----- ----- ----- real 0.2 0.4 0.2 0.2 0.2 0.2 user 0.0 0.0 0.0 0.0 0.0 0.0 sys 0.1 0.1 0.1 0.1 0.1 0.1 ~ > dotime 5 sed -f from.sed INBOX Avg Pass 1 2 3 4 5 ----- ------- ----- ----- ----- ----- real 0.5 0.7 0.4 0.5 0.4 0.4 user 0.1 0.1 0.1 0.1 0.1 0.1 sys 0.2 0.2 0.2 0.3 0.2 0.2 ~ > wc -lc maildir/pax 46585 1240000 maildir/pax ~ > dotime 5 perl src/from.pl maildir/pax Avg Pass 1 2 3 4 5 ----- ------- ----- ----- ----- ----- real 21.9 21.9 20.3 21.1 25.7 20.7 user 14.0 14.4 14.3 14.1 13.7 13.6 sys 5.9 5.8 4.9 5.7 7.4 5.9 ~ > dotime 5 sed -f from.sed maildir/pax Avg Pass 1 2 3 4 5 ----- ------- ----- ----- ----- ----- real 23.1 23.4 22.7 22.9 23.1 23.5 user 14.8 14.8 14.9 14.8 14.3 15.2 sys 7.4 7.4 7.1 7.3 7.8 7.2 I have posted the "dotime" program to alt.sources, for whoever thinks she/he can use it. Have fun! Johan -- Johan Vromans jv@mh.nl via internet backbones Multihouse Automatisering bv uucp: ..!{uunet,hp4nl}!mh.nl!jv Doesburgweg 7, 2803 PL Gouda, The Netherlands phone/fax: +31 1820 62944/62500 ------------------------ "Arms are made for hugging" -------------------------
subbarao@phoenix.Princeton.EDU (Kartik Subbarao) (05/20/91)
Anyone know of a cooler way to do this? --------- #! /usr/princeton/bin/perl open(mail, $ENV{'MAIL'}) || exit; while (<mail>) { if (/^From /) { # Note this is different from From: chop ($out = $_); print $out; while (<mail>) { if (/^$/) { print "\n"; last; } if (/^Subject:/) { chop; @subj = split(/[: ]/,$_); print ' "'.$subj[2].'"'."\n"; last; } } } } -- internet% ypwhich subbarao@phoenix.Princeton.EDU -| Internet kartik@silvertone.Princeton.EDU (NeXT mail) SUBBARAO@PUCC.BITNET - Bitnet
jv@mh.nl (Johan Vromans) (05/20/91)
> Anyone know of a cooler way to do this?
Depends on what you call 'cool' :-) .
#!/bin/sh
# This is a shell archive (produced by shar 3.49)
# To extract the files from this archive, save it to a file, remove
# everything above the "!/bin/sh" line above, and type "sh file_name".
#
# made 05/20/1991 10:09 UTC by jv@pronto
# Source directory /u1/users/jv
#
# existing files will NOT be overwritten unless -c is specified
#
# This shar contains:
# length mode name
# ------ ---------- ------------------------------------------
# 2214 -rwxr-xr-x from
#
# ============= from ==============
if test -f 'from' -a X"$1" != X"-c"; then
echo 'x - skipping from (File already exists)'
else
echo 'x - extracting from (Text)'
sed 's/^X//' << 'SHAR_EOF' > 'from' &&
X#!/usr/bin/perl
X
X# This program requires perl version 3.0, patchlevel 4 or higher
X
X# @($)@ from 1.5 - from.pl
X
X# Show messages from a Unix mailbox. With -n: shown message numbers also.
X#
X# Usage "from [-n] MAILBOX..."
X#
X# Don't forget: perl is a Practical Extract and Report Language!
X#
X# Copyright 1989,1990 Johan Vromans <jv@mh.nl>, no rights reserved.
X# Usage and redistribution is free and encouraged.
X
X# Default output format
Xformat =
X@<<<<<<<<<<< "@<<<<<<<<<<<<" ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<..
X$date, $from, $subj
X.
X
X# Output format when sequence numbers are requested
Xformat format_n =
X@>: @<<<<<<<<<<< "@<<<<<<<<<<<<" ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<..
X$seq, $date, $from, $subj
X.
X
X# Parse and stash away -n switch, if provided
Xif ($#ARGV >= 0 && $ARGV[0] eq '-n') {
X shift (@ARGV);
X $~ = "format_n";
X}
X
X# Use system mailbox if none was specified on the command line
Xif ( $#ARGV < 0 ) {
X if ( ! ($user = getlogin)) {
X @a = getpwuid($<);
X $user = $a[0];
X }
X if ( -r "/usr/mail/$user" ) { # System V
X @ARGV = ("/usr/mail/$user");
X }
X elsif ( -r "/usr/spool/mail" ) { # BSD
X @ARGV = ("/usr/spool/mail/$user");
X }
X else {
X printf STDERR "No mail for $user.\n";
X exit 1;
X }
X}
X
X$seq = 0;
X# Read through input file(s)
Xwhile (<>) {
X
X # Look for a "From_" header (See RFC822 and associated documents).
X next unless /^From\s+(\S+)\s+.*(\w{3}\s+\d+\s+\d+:\d+)/;
X
X chop;
X $from = $1;
X $date = $2;
X if ( $date eq "" || $from eq "" ) {
X print STDERR "Possible garbage: $_\n";
X next;
X }
X
X $seq++;
X # Get user name from uucp path
X $from = $1 if $from =~ /.*!(.+)/;
X
X # Now, scan for Subject or empty line
X $subj = "";
X while ( <> ) {
X chop ($_);
X
X if ( /^$/ || /^From / ) {
X # force fall-though
X $subj = "<none>" unless $subj;
X }
X else {
X $subj = $1 if /^Subject\s*:\s*(.*)/i;
X if ( /^From\s*:\s*/ ) {
X $_ = $';
X if ( /\((.+)\)/i ) { $from = $1; }
X elsif ( /^\s*(.+)\s*<.+>/i ) { $from = $1; }
X elsif ( /^<.+>\s*(.+)/i ) { $from = $1; }
X }
X }
X
X # do we have enough info?
X if ( $from && $subj ) {
X write;
X last;
X }
X }
X}
SHAR_EOF
chmod 0755 from ||
echo 'restore of from failed'
Wc_c="`wc -c < 'from'`"
test 2214 -eq "$Wc_c" ||
echo 'from: original size 2214, current size' "$Wc_c"
fi
exit 0
--
Johan Vromans jv@mh.nl via internet backbones
Multihouse Automatisering bv uucp: ..!{uunet,hp4nl}!mh.nl!jv
Doesburgweg 7, 2803 PL Gouda, The Netherlands phone/fax: +31 1820 62911/62500
------------------------ "Arms are made for hugging" -------------------------
rearl@watnxt3.ucr.edu (Robert Earl) (05/20/91)
In article <azioL/5ej/g.c@idunno.Princeton.EDU> subbarao@phoenix.Princeton.EDU (Kartik Subbarao) writes: | Anyone know of a cooler way to do this? It's Perl, there's ALWAYS a cooler way. :-) | @subj = split(/[: ]/,$_); | print ' "'.$subj[2].'"'."\n"; This was sort of confusing, did you really want it to print just the first word of a subject? Mine won't do that, but it's easy enough to change. #! /usr/bin/perl open(mail, $ENV{'MAIL'}) || exit; while (<mail>) { # skip this block unless it's "From ", else print it /^From / || next; chop; print; while (<mail>) { chop; # for the other way, substitute this line: # ((length) ? s/^Subject: *([^: ]+)/print qq| "$1"\n|/e : print "\n") ((length) ? s/^Subject: */print qq| "$'"\n|/e : print "\n") && last; } } print "\n"; print "Just another Perl hacker, "; -- ______________________________________________________________________ \ robert earl / "Love is a many splintered thing" rearl@watnxt3.ucr.edu \ --Sisters of Mercy rearl@gnu.ai.mit.edu /
gamin@ireq-robot.hydro.qc.ca (Martin Boyer) (05/22/91)
jv@mh.nl (Johan Vromans) writes: > From: subbarao@phoenix.Princeton.EDU (Kartik Subbarao) > Subject: perl version of from(1) > Date: 19 May 91 23:54:24 GMT > Organization: American Chemical Society > Anyone know of a cooler way to do this? Depends on what you call 'cool' :-) . [shar of from.pl 1.5 deleted] Here's my version modified to: handle multiple mailboxes (default is $HOME/INBOX, created by vm) reset message counter for each mailbox test for presence of mailbox instead of spool directory in BSD systems #!/usr/bin/perl #----Le laboratoire de robotique de l'Institut de recherche d'Hydro-Quebec----- # # Nom : from [-n] [MAILBOX] [MAILBOX...] # Fonction: Imprime un sommaire de la boite postale, # avec -n, imprime les numeros de message aussi. # Fichiers: from, /usr/spool/mail, ~/INBOX # Notes : Requiert perl version 3.0, patchlevel 4 ou mieux. # # Cree : 1989 ------------ Johan Vromans <jv@mh.nl> # Modifie : 21 mai 91 ---------1- Martin Boyer <mboyer@ireq-robot.hydro.qc.ca> # Copyright 1989, 1990 Johan Vromans <jv@mh.nl>, no rights reserved. # # Historique: # # 21 mai 91 ---------1- Martin Boyer <mboyer@ireq-robot.hydro.qc.ca> # Modifie la version 1.5 de Johan Vromans: ajoute INBOX # et remis le compteur de message a zero pour chaque fichier. #------------------------------------------------------------------------------ $inbox = "$ENV{'HOME'}/INBOX"; # Default output format format = @<<<<<<<<<<< "@<<<<<<<<<<<<" ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<.. $date, $from, $subj . # Output format when sequence numbers are requested format format_n = @>: @<<<<<<<<<<< "@<<<<<<<<<<<<" ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<.. $seq, $date, $from, $subj . # Parse and stash away -n switch, if provided if ($#ARGV >= 0 && $ARGV[0] eq '-n') { shift (@ARGV); $~ = "format_n"; } # Use system mailbox and user INBOX # if none was specified on the command line if ( $#ARGV < 0 ) { if ( ! ($user = getlogin)) { @a = getpwuid($<); $user = $a[0]; } if ( -r "/usr/mail/$user" ) { # System V @ARGV = ("/usr/mail/$user"); } elsif ( -r "/usr/spool/mail/$user" ) { # BSD @ARGV = ("/usr/spool/mail/$user"); } if ( $inbox && -r $inbox ) { @ARGV = ($inbox, @ARGV); } if ($#ARGV < 0) { printf STDERR "No mail for $user.\n"; exit 1; } } if ( $#ARGV < 0 ) { if ( ! ($user = getlogin)) { @a = getpwuid($<); $user = $a[0]; } if ( -r "/usr/mail/$user" ) { @ARGV = ("/usr/mail/$user"); } elsif ( -r "/usr/spool/mail/$user" ) { # BSD @ARGV = ("/usr/spool/mail/$user"); } else { printf STDERR "No mail for $user.\n"; exit 1; } } # Read through input file(s) while (<>) { if (eof) {$seq = 0;} # Look for a "From_" header (See RFC822 and associated documents). next unless /^From\s+(\S+)\s+.*(\w{3}\s+\d+\s+\d+:\d+)/; chop; $from = $1; $date = $2; if ( $date eq "" || $from eq "" ) { print STDERR "Possible garbage: $_\n"; next; } $seq++; # Get user name from uucp path $from = $1 if $from =~ /.*!(.+)/; # Now, scan for Subject or empty line $subj = ""; while ( <> ) { chop ($_); if ( /^$/ || /^From / ) { # force fall-though $subj = "<none>" unless $subj; } else { $subj = $1 if /^Subject\s*:\s*(.*)/i; if ( /^From\s*:\s*/ ) { $_ = $'; if ( /\((.+)\)/i ) { $from = $1; } elsif ( /^\s*(.+)\s*<.+>/i ) { $from = $1; } elsif ( /^<.+>\s*(.+)/i ) { $from = $1; } } } # do we have enough info? if ( $from && $subj ) { write; last; } } } -- Martin Boyer mboyer@ireq-robot.hydro.qc.ca Institut de recherche d'Hydro-Quebec mboyer@ireq-robot.uucp Varennes, QC, Canada J3X 1S1 +1 514 652-8412