[comp.editors] perl script request

prastowo@qucis.queensu.CA (Bambang Nurcahyo Prastowo) (05/10/91)

Could any perl guru help me rewrite the following sh script in perl ?
The script is a mail sorter that cleans incoming mails (simplifies
headers and removes duplicated blank lines) and save them in mailboxes
named after the first words of email addresses appear in "From:...".

Bambang Prastowo 		prastowo@qucis.queensu.ca

#! /bin/sh
home=/grad/prastowo/MAIL
inmail=$home/inmail.$$
outmail=$home/outmail.$$
header=$home/header.$$
body=$home/body.$$
neat=$home/neat.$$
sender=$home/sender.$$

echo=/bin/echo
grep=/bin/grep
egrep=/bin/egrep
rm=/bin/rm
sed=/bin/sed
cat=/bin/cat
awk=/bin/awk
tr=/bin/tr

$cat > $inmail
$sed -n '1,/^$/p' $inmail > $header
$sed -n '/^$/,$p' $inmail > $body

trap "$rm -f $header $inmail $body $sender $neat" 0

(
$egrep '^From |^From:|^Date:|^Subject:' $header
$echo
$awk '/^Date:/||/^From:/||/^Subject:/ {printf(">")}{print}' $body
$echo _____________________________________
$echo
) | awk '/^[ \t]*$/{if(!b){print "";b=1}}/[^ \t]/{print $0;b=0}'> $neat

# keep mails in folders by the 1st words of senders' addresses.
sd=`$sed -n 's/^From://p' $header | $sed 's/.*<//;s/>.*//' |
       $awk '{print $1}' | sed 's/[@%\.\!].*$//' |
       $tr a-z A-Z `

$cat $neat >> $home/$sd
exit 0
-- 
Bambang Nurcahyo Prastowo              prastowo@qucis.queensu.ca
                                       prastowo@qucis.bitnet

lwall@jpl-devvax.jpl.nasa.gov (Larry Wall) (05/11/91)

In article <2083@qusunc.queensu.CA> prastowo@qucis.queensu.CA (Bambang Nurcahyo Prastowo) writes:
: Could any perl guru help me rewrite the following sh script in perl ?
: The script is a mail sorter that cleans incoming mails (simplifies
: headers and removes duplicated blank lines) and save them in mailboxes
: named after the first words of email addresses appear in "From:...".

Sure.  It'd go something like this:
------------------------------------------------------
#!/usr/bin/perl

$MAILDIR = "/grad/prastowo/MAIL";

$/ = '';			# paragraph mode
$* = 1;				# enable multiline matching for ^

# parse header into associative array

%hdr = ('FROM', split(/^([-\w]+):[ \t]*/, <>));

# open appropriate file

$sd = $hdr{'From'};
$sd =~ s/^.*<\s*(.*)>.*$/$1/;	# handle <> form
$sd =~ s/[\s@%.!].*//;		# delete trailing stuff
$sd =~ tr/a-z/A-Z/;		# canonicalize to upper case

open(STDOUT, ">>$MAILDIR/$sd") || die "Can't open $sd: $!\n";

# write the header (pick your order)

print		   $hdr{'FROM'}		if $hdr{'FROM'};
print 'From: ',    $hdr{'From'}		if $hdr{'From'};
print 'Date: ',    $hdr{'Date'}		if $hdr{'Date'};
print 'Subject: ', $hdr{'Subject'}	if $hdr{'Subject'};
print "\n";

# now do each remaining paragraph

while (<>) {
    $* = 0;
    s/^\n+//;			# delete extra blank lines
    $* = 1;
    s/^(Date|From|Subject):/>$1:/g;
    print;
}
print "_____________________________________\n\n";

------------------------------------------------------

I translated it fairly literally, including the fact that addresses like

	foo!bar@fiddle

save to file FOO rather than BAR.  I chose to process the message in
paragraph mode because it makes it easy to find extra newlines (they
show up at the front of the paragraph).  The only tricky think is that
you have to turn on and off whether ^ matches the beginning of each line
in the string or just the beginning of the string.

This could fairly easily be restructured to handle multiple messages
on the input stream by putting the header code into the final loop
and executing it whenever the current paragraph looks like a header.

Larry Wall
lwall@netlabs.com