[comp.lang.perl] Memory usage

sherk@nmc.cit.cornell.edu (Erik Sherk) (03/07/91)

Hello,
	I am using a program called statspy to gather statistics on
net to net traffic across our external ethernet. Statspy keeps a
frequency distribution of net to net traffic with a count of all
packets between those nets, a percentage, and a time delta from the
current time of when it last saw a packet with that source and
destination network. It keeps these stats in memory. To retrieve them,
you use another program ('collect') which opens a TCP connection to
statspy and dumps all the stats. The output of collect looks like
this:

------------------------------------------------------------
Log created on Mon Mar  4 14:42:59 1991, for host dmz.
    Object name = 'net.matrix'.


OBJECT: net.matrix  Class= matrix-sym  [Created: 14:36:05 03-04-91]
  ReadTime: 14:48:39 03-04-91, ClearTime: 14:36:05 03-04-91 (@-754sec)
  Total Count= 459437 (+0 orphans)
  #bins = 3011
[192.33.4.0 : 192.35.82.0]= 32241  ( 7.0%) @-0sec
[128.145.0.0 : 128.253.0.0]= 11621 ( 2.5%) @-8sec
[128.84.0.0 : 192.5.107.0]= 10332  ( 2.2%) @-0sec
[36.0.0.0 : 128.253.0.0]= 9465     ( 2.1%) @-0sec
[128.59.0.0 : 128.252.0.0]= 8726   ( 1.9%) @-147sec
	.
	.
	.	( *lots* of output deleted)
	.
	.
[128.228.0.0 : 134.129.0.0]= 6213  ( 1.4%) @-2sec
[128.118.0.0 : 128.228.0.0]= 5716  ( 1.2%) @-0sec
[128.42.0.0 : 136.161.0.0]= 5430   ( 1.2%) @-290sec
[36.0.0.0 : 128.205.0.0]= 4708     ( 1.0%) @-0sec
------------------------------------------------------------	
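
Each bin line above has the shape `[src : dst]= count ( pct%) @-Nsec`.
A minimal sketch of pulling the four fields out with one regular
expression, assuming every bin line really has that shape. The name
parse_bin_line is made up for illustration, and the syntax is current
Perl 5, not the 3.0 release mentioned below.

```perl
use strict;
use warnings;

# Hypothetical helper: split one bin line of collect output into fields.
# Returns (key, count, percent, age_in_seconds), or an empty list for
# header lines and anything else that does not look like a bin line.
sub parse_bin_line {
    my ($line) = @_;
    return unless $line =~ /^(\[[^\]]+\])=\s*(\d+)\s*\(\s*([\d.]+)%\)\s*\@-(\d+)sec/;
    return ($1, $2, $3, $4);
}

# Single quotes keep Perl from trying to interpolate the "@" in the sample.
my @fields = parse_bin_line('[128.59.0.0 : 128.252.0.0]= 8726   ( 1.9%) @-147sec');
print join("|", @fields), "\n";    # prints [128.59.0.0 : 128.252.0.0]|8726|1.9|147
```

Anchoring on the leading `[` means the indented header lines fall
through as "not a bin" without any line counting.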

	This output is rather long. If I use collect to checkpoint
this data, I will quickly run out of disk space. So I wrote a perl
script to take the output from collect and coalesce the checkpointed
data into a "monthly total so far" file. My problem (finally) is that
the script fails intermittently with an "Out of memory!" error. It
does this on test data files that are about 2% of the size I think
they will grow to. (I expect the matrix to have about 200000 entries
eventually.) I have set my stack limit under csh to 50MB and the
problem seems to have gone away, but I worry about what will happen
when I start using real data.

	This is on a Sun Sparc 1 running 4.1. The version of perl is
$Header: perly.c,v 3.0.1.5 90/03/27 16:20:57 lwall Locked $
Patch level: 18

Can anyone tell me 1) is perl the right tool for the job?
		2) is there a more memory-efficient way to use an
associative array?
		3) is there a better way?
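
On question 2, one direction that might be worth trying is keeping the
associative array in a dbm file on disk instead of in memory, trading
speed for memory. A minimal sketch under stated assumptions: it uses
the SDBM_File module bundled with current Perl 5 (the built-in dbmopen
is the older spelling of the same idea), the name open_matrix and the
file path are placeholders, and it has not been run against real
statspy data.

```perl
use strict;
use warnings;
use Fcntl;                      # O_RDWR, O_CREAT
use SDBM_File;
use File::Temp qw(tempdir);

# Hypothetical helper: tie a hash to an SDBM file so entries live on disk.
# Returns a reference to the tied hash.
sub open_matrix {
    my ($path) = @_;
    my %matrix;
    tie(%matrix, 'SDBM_File', $path, O_RDWR | O_CREAT, 0666)
        or die "couldn't tie $path: $!\n";
    return \%matrix;
}

# Demo in a scratch directory; a real run would use the monthly file name.
my $dir    = tempdir(CLEANUP => 1);
my $matrix = open_matrix("$dir/month");
my $bin    = '[36.0.0.0 : 128.253.0.0]';
$matrix->{$bin}  = 9465;        # stored on disk, not in the process
$matrix->{$bin} += 35;          # fetch, add, store back
print $matrix->{$bin}, "\n";    # prints 9500
untie %$matrix;
```

The rest of a script can treat the tied hash like an ordinary
associative array; only the open and the final untie change.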

Here is the script: (it isn't finished yet. :-)
----------------------------------------------------------------------
#!/usr/local/bin/perl
#
#	script to coalesce hourly snapshots of statspy data.
#
#	1) read the header of the new file and the header of the total 
#	for the day. if 

# print error if invoked w/o file names
if ($#ARGV != 1){
	print "Usage: matrix.p monthly-file new-hourly-file\n";
	print "Will coalesce the new-hourly-file into the monthly-file.\n";
	exit;
}

open(MONTH,$ARGV[0]) || die "$0: couldn't open $ARGV[0]; $!\n";
open(HOUR,$ARGV[1]) || die "$0: couldn't open $ARGV[1]; $!\n";
open(OUT,">tmp.file") || die "$0: couldn't open tmp.file; $!\n";

# find the clear time of these variables
for ($i=0; $i < 9; $i++){
	$header = <MONTH>;
	if ($header =~ /ClearTime/) {
		($foo,$str) = split(/\@/,$header);
		($month_cleartime,$foo) = split(/s/,$str);		
		print $month_cleartime;
	}
}
for ($i=0; $i < 9; $i++){
	$header = <HOUR>;
	if ($header =~ /ClearTime/) {
		($foo,$str) = split(/\@/,$header);
		($hour_cleartime,$foo) = split(/s/,$str);		
		print $hour_cleartime;
	}
}
# cleartimes will be negative.
	if ($hour_cleartime > $month_cleartime ){ # i.e. a smaller neg. number
		print "Reset \n";
		$reset = 1;
	} else {
		print "Not Reset\n";
		$reset = 0;
	}

# create associative array of all net to nets we have seen so far this month
while (<MONTH>){
	($entry, $value) = split(/=/);
	($count, @foo) = split(' ', $value);
	$matrix{ $entry } = $count;
print $matrix{ $entry };
print "\n";
}
while (<HOUR>){
	($entry, $value) = split(/=/);
	($count, @foo) = split(' ', $value);
#
#	Three cases 1) new net-to-net entry 2) additional traffic for old
#	entry, replace old value 3) reset has happened, add old val to new.
#
	if ( $matrix{ $entry }) {
		if ( $reset ) { # add old to new
			$matrix{ $entry } += $count;
		} else {
			if ( $matrix{ $entry } > $count ){ # this can't happen
				print "Error: cur_val > new_val w/o reset\n";
				exit;
			} else { # replace old val
				$matrix{ $entry } = $count;
			}
		}
	} else {
		print "New bin\n";
		$matrix{ $entry } = $count;
	}
print $matrix{ $entry };
print "\n";
}

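# print one "key= count" line per bin; each() walks the array one pair
# at a time, where for (%matrix) first builds a list of every key and value
while (($entry, $count) = each(%matrix)){
	print "$entry= $count\n";
}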
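
The three cases in the second loop can also be pulled out into one
small function, which makes them easier to check on their own. A
sketch only: merge_count is a hypothetical name, the syntax is current
Perl 5, and it tests defined() where the script tests the value, so a
stored count of 0 counts as an old entry, not a new bin.

```perl
use strict;
use warnings;

# Hypothetical helper mirroring the three cases in the script:
#   1) no old value        -> new bin, take the new count
#   2) old value, reset    -> counters restarted, add old to new
#   3) old value, no reset -> counter only grew, new count replaces old
# Dies if the count shrank without a reset, which the script treats as an error.
sub merge_count {
    my ($old, $new, $reset) = @_;
    return $new        unless defined $old;
    return $old + $new if $reset;
    die "cur_val > new_val w/o reset\n" if $old > $new;
    return $new;
}

my $key    = '[36.0.0.0 : 128.253.0.0]';
my %matrix = ($key => 9465);
$matrix{$key} = merge_count($matrix{$key}, 4708, 1);   # reset seen: 9465 + 4708
print "$key= $matrix{$key}\n";                          # prints [36.0.0.0 : 128.253.0.0]= 14173
```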




Erik Sherk
sherk@nmc.cit.cornell.edu



--
Erik Sherk                                      sherk@nmc.cit.cornell.edu (607) 255-1679 
Network Specialist                              "You can't swing a dead cat at Cornell
Network Management Center                        without hitting a manager!"
Network Support Services
Cornell Information Technologies
Cornell University