jv@mh.nl (Johan Vromans) (12/22/89)
In article <1989Dec20.222732.5633@trigraph.uucp> john@trigraph.uucp (John Chew) writes: Here's a new version of from.sed, my sed script that does the job of from(1) better and faster. It now truncates long subjects, correctly handles messages without subjects and From lines with % or @foo: routing. Yes, I tried writing this in Perl. I'm not an expert Perl programmer, but I couldn't get it to run faster than about 70% slower than sed. I've been using a perl version of 'from' for a long time, so I trow it in. Features: - shortens the date, so there's more room for subject - shortens long subjects - uses "From: " headers if possible - provide "<none>" subject - automatic determination of system mailbox - maybe more - output sample: Nov 29 00:14 "jv " Re: your mail through the list got here Nov 28 21:21 "David Dyck " your mail through the list got here Nov 29 08:28 "Mark H. Colbu" Re: output compatibility It runs about as fast as the sed version. Typical times for a large mailbox (46585 lines) real/user/sys 50/16/8 for sed, 50/22/7 for perl. ------ begin of from -- ascii -- complete ------ #!/usr/bin/perl # This program requires perl version 3.0, patchlevel 4 or higher # Usage "from MAILBOX..." # Don't forget: perl is a Practical Extract and Report Language! format = @<<<<<<<<<<< "@<<<<<<<<<<<<" ^<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<.. $date, $from, $subj . if ( $#ARGV < 0 ) { if ( ! ($user = getlogin)) { @a = getpwuid($<); $user = $a[0]; } if ( -r "/usr/mail/$user" ) { @ARGV = ("/usr/mail/$user"); } elsif ( -r "/usr/spool/mail" ) { @ARGV = ("/usr/spool/mail/$user"); } else { printf STDERR "No mail for $user.\n"; exit 1; } } # read through input file(s) while ( $line = <> ) { chop ($line); # scan until "From_" header found next unless $line =~ /^From\s+(\S+)\s+.*(\w{3}\s+\d+\s+\d+:\d+)/; $from = $1; $date = $2; if ( $date eq "" || $from eq "" ) { print STDERR "Possible garbage: $line\n"; next; } # get user name from uucp path $from = $1 if $from =~ /.*!(.+)/; # now, scan for Subject or empty line $subj = ""; while ( $line = <> ) { chop ($line); if ( $line =~ /^$/ || $line =~ /^From / ) { # force fall-though $subj = "<none>" unless $subj; } else { $subj = $1 if $line =~ /^Subject\s*:\s*(.*)/i; if ( $line =~ /^From\s*:\s*/ ) { $line = $'; if ( $line =~ /\((.+)\)/i ) { $from = $1; } elsif ( $line =~ /^\s*(.+)\s*<.+>/i ) { $from = $1; } elsif ( $line =~ /^<.+>\s*(.+)/i ) { $from = $1; } } } # do we have enough info? if ( $from && $subj ) { write; last; } } } ------ end of from -- ascii -- complete ------ Have fun, Johan -- Johan Vromans jv@mh.nl via internet backbones Multihouse Automatisering bv uucp: ..!{uunet,hp4nl}!mh.nl!jv Doesburgweg 7, 2803 PL Gouda, The Netherlands phone/fax: +31 1820 62944/62500 ------------------------ "Arms are made for hugging" -------------------------
john@trigraph.uucp (John Chew) (12/30/89)
In article <1989Dec20.222732.5633@trigraph.uucp> I posted a sed script that does the job of from(1). In <JV.89Dec21221143@mhres.mh.nl> Johan Vromans <jv@mh.nl> posted a perl script that does the same thing. I've tried both out on various mailboxes and have come to the following conclusions: 1. On small mailboxes, the compilation-time overhead of perl makes it a pig. 2. On large mailboxes, especially those containing long messages, perl can catch up to sed. 3. The following patch to Johan Vromans' perl script speeds it up by as much as 30% on large files, by tightening the search-for-From_ loop. *** old/from.jv.pl Fri Dec 29 12:05:15 1989 --- from.jv.pl Fri Dec 29 11:42:13 1989 *************** *** 30,40 # read through input file(s) ! while ( $line = <> ) { ! chop ($line); ! ! # scan until "From_" header found ! next unless $line =~ /^From\s+(\S+)\s+.*(\w{3}\s+\d+\s+\d+:\d+)/; $from = $1; $date = $2; if ( $date eq "" || $from eq "" ) { --- 30,39 ----- # read through input file(s) ! while (<>) { ! next unless /^From /; ! chop; ! next unless /^From\s+(\S+)\s+.*(\w{3}\s+\d+\s+\d+:\d+)/; $from = $1; $date = $2; if ( $date eq "" || $from eq "" ) { *************** *** 38,44 $from = $1; $date = $2; if ( $date eq "" || $from eq "" ) { ! print STDERR "Possible garbage: $line\n"; next; } --- 37,43 ----- $from = $1; $date = $2; if ( $date eq "" || $from eq "" ) { ! print STDERR "Possible garbage: $_\n"; next; } I'll keep both scripts around for now. I actually prefer the notion of writing such things in perl, but when your mail machine is a heavily- used VAX-11/750 you can't afford luxuries.... John -- john j. chew, iii phone: +1 416 425 3818 AppleLink: CDA0329 trigraph, inc., toronto, canada {uunet!utai!utcsri,utgpu,utzoo}!trigraph!john dept. of math., u. of toronto poslfit@{utorgpu.bitnet,gpu.utcs.utoronto.ca}