[comp.lang.perl] Inconsistant fields -- distinguishing?

stealth@engin.umich.edu (Mike Pelletier) (06/04/91)

For my first trick...
I'm an utter perl novice, so tone your responses appropriately.  ;-)
I'm working on a mail log parser, and I'm a bit perplexed as to how
to deal with lines that have inconsistant field types.  For instance:

Jun  3 17:58:10 engin sendmail[12345]: AA01695: to=<spool.mu.edu!uunet!papaya.bbn.com!pineapple.bbn.com@engin.umich.edu>, delay=11:55:01, stat=Deferred

Jun  3 17:58:10 engin sendmail[12345]: SmtpIn/Out NULL for DATA

----
The first one has a queue id, AA01695, whereas the second one just has the
information right there, associated with the previous one only by the
bracketed process-id (which I *didn't* make up -- it actually is there in
the log file...)

Any ideas how to handle this in perl?  Does anyone have a mail log parser
that they could send to me for educational perusal?  Thanks for your help,
folks!

--
Mike Pelletier             | "Wind & waves are breakdowns in the commitment of
The University of Michigan |  getting from here to there, but they are the con-
  College of Engineering   |  ditions for sailing.  Not something to eliminate,
Student/Systems Admin      |  but something to dance with."

tchrist@convex.COM (Tom Christiansen) (06/04/91)

From the keyboard of stealth@engin.umich.edu (Mike Pelletier):
:For my first trick...
:I'm an utter perl novice, so tone your responses appropriately.  ;-)
:I'm working on a mail log parser, and I'm a bit perplexed as to how
:to deal with lines that have inconsistant field types.  For instance:
:
:Jun  3 17:58:10 engin sendmail[12345]: AA01695: to=<spool.mu.edu!uunet!papaya.bbn.com!pineapple.bbn.com@engin.umich.edu>, delay=11:55:01, stat=Deferred
:
:Jun  3 17:58:10 engin sendmail[12345]: SmtpIn/Out NULL for DATA
:
:----
:The first one has a queue id, AA01695, whereas the second one just has the
:information right there, associated with the previous one only by the
:bracketed process-id (which I *didn't* make up -- it actually is there in
:the log file...)
:
:Any ideas how to handle this in perl?  Does anyone have a mail log parser
:that they could send to me for educational perusal?  Thanks for your help,
:folks!

I've never seen sendmail log an entry without an AA\d+ string.  
To parse out something that's part variable fields and part not,
you could probably best use a regexp, like

    ($mon, $day, $time, $host, $program, $pid, $extra) = 
	/^(\w+)\s+(\d+)\s+(\d+:\d+:\d+)\s+(\w+)\s+(\w+)\s+(.*)/;

and then do with $extra as you like.   Maybe my \w's should be \S's.

I have a script for parsing sendmail syslog files.  It may well not do 
what you want.  It was one of my earlier perl script, so is a bit odd
looking (but aren't they all?).    I'll enclose it.

I keep meaning to write one that shows me all transactions to 
or from a particular user.  One of these days...

--tom

#!/usr/local/bin/perl
#
# ssl -- summarize sendmail syslog
#
#	original by Tom Christiansen <tchrist@convex.com>
#	probably in 88 or 89
#
#       further hacking by
#	Paul O'Neill
#	Coastal Imaging Lab
#	Oregon State University
#
#	18 jun 90
#	fix various bugs
#	add sorted output
#
# 	still more hacking by tchrist, sometime in 91
#
# summarize sendmail syslog
#
# flags mean:
#
#	-o	outbound mail only
#	-i	inbound  mail only
#	-t	suppress printing of totals
#	-e 	print strange lines to stderr
#	-m	reduce to local mbox is possible

($program = $0) =~ s%.*/%%;


while ($ARGV[0] =~ /^-/) {
    $ARGV[0] =~ s/^-//; 
    foreach $flag ( split (//,$ARGV[0]) ) {
	if ( 'oitem' !~ /$flag/ ) {
	    printf stderr "unknown flag: %s\n", $flag;
	    die "usage: $program [-oitem] [syslog_file ...]\n";
	} 
	die "$0: '$flag' flag already set\n" if ($flags{$flag}++);
    } 
    shift;
}

if ( !$flags{'o'} && !$flags{'i'} && !$flags{'t'}) {
    $flags{'o'}++;
    $flags{'i'}++;
} 

do hash_passwd() if $flags{'m'};

while (<>) {
    if (/: [A-Z][A-Z](\d+): from=(.*), size=(\d+)/) {
#	next unless $flags{'t'} || $flags{'o'};
	($id, $user, $size) = ($1, $2, $3);
#print "$user\n";
	$user =~ s/.*<(.*)>/$1/;		# get rid of <>
#print "$user\n";
	$user =~ tr/A-Z/a-z/;			# canonical lc
#print "$user\n";

	if ($flags{'m'}) {
	    $ouser = $user;
#print "	$user\n";
	    $user = do strip($user);
#print "	 $user\n";
	    $user = $ouser if ! $known{$user};
#print "	  $user\n";
#print "	  $known{$user}\n";
	}

	$from_user_size{$user} += $size;
	$id_size{$id} = $size;
	$from_user_count{$user}++;
	$total_from++;
    } elsif (/: [A-Z][A-Z](\d+): to=(.*), delay=/) {
#	next unless $flags{'t'} || $flags{'i'};
	$id = $1;
	for (split(/,/, $2)) {
	    s/.*<(.*)>/$1/;
#	    $to = $flags{'m'} ? do strip($_) : $_;
	    $to = $_;
	    if ($flags{'m'}) {
		$oto = $to;
#print "		$to\n";
		$to = do strip($to);
#print "		 $to\n";
		$to = $oto if ! $known{$to};
#print "		  $to\n";
#print "		  $known{$to}\n";
	    }
	    $to =~ tr/A-Z/a-z/;
#	    printf "adding %d bytes to %s from %s\n",
		$id_size{$id},$to,$user; 
	    if (!$to) {
		die "to no one: $_\n";

	    } 
	    $to_user_size{$to} += $id_size{$id};
	    $to_user_count{$to}++;
	    $total_to++;
	} 
    } else {
	study;
	next if /message-id/;
	next if /locked/;
	next if /alias database (auto|)rebuilt/;
	#next if /aliases/;
	next if /rebuilding alias database/;
	print stderr if $flags{'e'};
	$errors++;
    } 
} 

printf stderr "Error lines: %d\n", $errors if $errors && ($flags{'e'}) && !($flags{'t'});


if ($flags{'i'}) {
    printf "To: %d\n", $total_to unless $flags{'t'};;

    @loop = keys(to_user_size);
    foreach $user (sort tosort @loop) {
	printf "%4d message%s %7d bytes %s\n",
	    $to_user_count{$user}, 
	    $to_user_count{$user} != 1 ? "s" : " ",
	    $to_user_size{$user}, 
	    $user;
    } 
}


if ($flags{'o'}) {
    printf "From: %d\n", $total_from unless $flags{'t'};;

    @loop = keys(from_user_size);
    foreach $user (sort fromsort @loop) {
	printf "%4d message%s %7d bytes %s\n",
	    $from_user_count{$user}, 
	    $from_user_count{$user} != 1 ? "s" : " ",
	    $from_user_size{$user}, 
	    $user;
    } 
}

sub tosort {
    ($to_user_count{$b} - $to_user_count{$a})* 10000000 + $to_user_size{$b} - $to_user_size{$a};
}

sub fromsort {
    ($from_user_count{$b} - $from_user_count{$a}) * 10000000 + $from_user_size{$b} -$from_user_size{$a};
}

sub strip {
    local($foo) = shift(@_);
#print "$foo\n";
    
    $foo =~ s/@.*//;
    $foo =~ s/.*!//;
    $foo =~ s/\s*\(.*\)//;
    $foo =~ tr/A-Z/a-z/;

    return $foo;
} 

sub hash_passwd {
    chop($yp = `/bin/domainname`) if -x '/bin/domainname';
    $passwd = $yp ? 'ypcat passwd |' : '/etc/passwd';
    open passwd || die "$program: can't open $passwd: $!\n";
    while (<passwd>) {
	/^(\w+):[^:]+:(\d+):.*/;
	($who,$uid) = ($1, $2);
#	$uid = 'zero' if ! $uid;  # kludge for uid 0
	$uid = 'zero' if ($uid == 0) && $who;
#	$uid = 'zero' if defined($uid);
	$known{$who} = $uid;
#print "$who $uid		$known{$who}\n";
    } 
    close passwd;
#print "SPECIEALLLL -- $known{''}\n";
} 
--
Tom Christiansen		tchrist@convex.com	convex!tchrist
	    "Perl is to sed as C is to assembly language."  -me