sherk@nmc.cit.cornell.edu (Erik Sherk) (03/07/91)
Hello,
I am using a program called statspy to gather statistics on
net to net traffic across our external ethernet. Statspy keeps a
frequency distribution of net to net traffic with a count of all
packets between those nets, a percentage, and a time delta from the
current time of when it last saw a packet with that source and
destination network. It keeps these stats in memory. To retrieve them,
you use another program ('collect'), which opens a TCP connection to
statspy and dumps all the stats. The output of collect looks like
this:
------------------------------------------------------------
Log created on Mon Mar 4 14:42:59 1991, for host dmz.
Object name = 'net.matrix'.
OBJECT: net.matrix Class= matrix-sym [Created: 14:36:05 03-04-91]
ReadTime: 14:48:39 03-04-91, ClearTime: 14:36:05 03-04-91 (@-754sec)
Total Count= 459437 (+0 orphans)
#bins = 3011
[192.33.4.0 : 192.35.82.0]= 32241 ( 7.0%) @-0sec
[128.145.0.0 : 128.253.0.0]= 11621 ( 2.5%) @-8sec
[128.84.0.0 : 192.5.107.0]= 10332 ( 2.2%) @-0sec
[36.0.0.0 : 128.253.0.0]= 9465 ( 2.1%) @-0sec
[128.59.0.0 : 128.252.0.0]= 8726 ( 1.9%) @-147sec
.
.
. ( *lots* of output deleted)
.
.
[128.228.0.0 : 134.129.0.0]= 6213 ( 1.4%) @-2sec
[128.118.0.0 : 128.228.0.0]= 5716 ( 1.2%) @-0sec
[128.42.0.0 : 136.161.0.0]= 5430 ( 1.2%) @-290sec
[36.0.0.0 : 128.205.0.0]= 4708 ( 1.0%) @-0sec
------------------------------------------------------------
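For readers unfamiliar with the format, here is a minimal sketch of how one bin line from the sample above could be picked apart. It assumes every bin line has exactly the shape shown ("[src : dst]= count ( pct%) @-Nsec"). The helper name parse_bin and its field names are hypothetical, not part of statspy or collect, and the code uses Perl 5 syntax (my, hash references), which is newer than the perl 3 patch level mentioned in this post.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of statspy or collect): parse one bin line
# of the form "[src : dst]= count ( pct%) @-Nsec".
# Returns a hash reference, or nothing if the line is not a bin line.
sub parse_bin {
    my ($line) = @_;
    return unless $line =~
        /^\[(\S+)\s*:\s*(\S+)\]=\s*(\d+)\s*\(\s*([\d.]+)%\)\s*\@-(\d+)sec/;
    return { src => $1, dst => $2, count => $3, pct => $4, age => $5 };
}

# One line taken from the sample output above.
my $bin = parse_bin('[128.59.0.0 : 128.252.0.0]= 8726 ( 1.9%) @-147sec');
print "$bin->{src} -> $bin->{dst}: $bin->{count} packets, last seen $bin->{age}s ago\n";
```

Header lines such as "Total Count= 459437" do not match the pattern, so parse_bin returns nothing for them and a caller can skip them.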
This output is rather long. If I try to use collect to checkpoint
this data, I will quickly run out of disk space. So I wrote a
perl script to take the output from collect and coalesce the
checkpointed data into a "monthly total so far" file. My problem (finally)
is that the script fails intermittently with an "Out of memory!" error.
It does this on test data files that are about 2% of the size I think
they will grow to. (I expect the matrix to have about 200000 entries
eventually.) I have set my stack limit under csh to 50MB and the
problem seems to have gone away, but I worry about what will happen
when I start using real data.
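The coalescing rule itself is small. A minimal sketch, assuming the three cases named in the script's comments (new bin, larger cumulative count, counters cleared since the last checkpoint) are the only ones. The function name coalesce and its argument order are hypothetical, and the syntax is Perl 5 rather than the perl 3 I am running.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: fold one count from a new snapshot into the total.
#   $old   - count already in the monthly file (undef if the bin is new)
#   $new   - count from the latest snapshot
#   $reset - true if statspy was cleared since the monthly file was written
sub coalesce {
    my ($old, $new, $reset) = @_;
    return $new unless defined $old;    # case 1: new net-to-net bin
    return $old + $new if $reset;       # case 3: counters restarted, so add
    die "count went down without a reset\n" if $old > $new;
    return $new;                        # case 2: cumulative counter, replace
}

print coalesce(undef, 10, 0), "\n";    # prints 10  (new bin)
print coalesce(100, 120, 0), "\n";     # prints 120 (same run, replace)
print coalesce(100, 20, 1), "\n";      # prints 120 (after a reset, add)
```

Testing with defined rather than truth means a bin whose stored count is 0 is still treated as an existing bin.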
This is on a Sun Sparc 1 running 4.1. The version of perl is
$Header: perly.c,v 3.0.1.5 90/03/27 16:20:57 lwall Locked $
Patch level: 18
Can anyone tell me 1) is perl the right tool for the job?
2) is there a more memory-efficient way to use an
associative array?
3) is there a better way?
Here is the script: (it isn't finished yet. :-)
----------------------------------------------------------------------
#!/usr/local/bin/perl
#
# script to coalesce hourly snapshots of statspy data.
#
# 1) read the header of the new file and the header of the total
# for the day. if
# print error if invoked w/o file names
if ($#ARGV != 1){
    print "Usage: matrix.p monthly-file new-hourly-file\n";
    print "Will coalesce the new-hourly-file into the monthly-file.\n";
    exit;
}
open(MONTH,$ARGV[0]) || die "$0: couldn't open $ARGV[0]; $!\n";
open(HOUR,$ARGV[1]) || die "$0: couldn't open $ARGV[1]; $!\n";
open(OUT,">tmp.file") || die "$0: couldn't open tmp.file; $!\n";
# find the clear time of these variables
for ($i=0; $i < 9; $i++){
    $header = <MONTH>;
    if ($header =~ /ClearTime/) {
        ($foo,$str) = split(/\@/,$header);
        ($month_cleartime,$foo) = split(/s/,$str);
        print $month_cleartime;
    }
}
for ($i=0; $i < 9; $i++){
    $header = <HOUR>;
    if ($header =~ /ClearTime/) {
        ($foo,$str) = split(/\@/,$header);
        ($hour_cleartime,$foo) = split(/s/,$str);
        print $hour_cleartime;
    }
}
# cleartimes will be negative.
if ($hour_cleartime > $month_cleartime ){ # i.e. a smaller neg. number
    print "Reset \n";
    $reset = 1;
} else {
    print "Not Reset\n";
    $reset = 0;
}
# create associative array of all net-to-nets we have seen so far this month
while (<MONTH>){
    ($entry, $value) = split(/=/);
    ($count, @foo) = split(' ', $value);
    $matrix{ $entry } = $count;
    print $matrix{ $entry };
    print "\n";
}
while (<HOUR>){
    ($entry, $value) = split(/=/);
    ($count, @foo) = split(' ', $value);
    #
    # Three cases 1) new net-to-net entry 2) additional traffic for old
    # entry, replace old value 3) reset has happened, add old val to new.
    #
    if ( $matrix{ $entry }) {
        if ( $reset ) { # add old to new
            $matrix{ $entry } += $count;
        } else {
            if ( $matrix{ $entry } > $count ){ # this can't happen
                print "Error: cur_val > new_val w/o reset\n";
                exit;
            } else { # replace old val
                $matrix{ $entry } = $count;
            }
        }
    } else {
        print "New bin\n";
        $matrix{ $entry } = $count;
    }
    print $matrix{ $entry };
    print "\n";
}
for (%matrix){
    print $_;
}
----------------------------------------------------------------------
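The final loop above is the unfinished part. A minimal sketch of one way it could write the totals, assuming a "key= count" line per bin is an acceptable format for the monthly file (the percentage and time delta are dropped here). The helper name write_matrix is hypothetical, and the syntax is Perl 5. It walks the array with each(), which hands back one key/value pair at a time, whereas for (%matrix) first flattens every key and value into a temporary list.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: write every bin as "key= count", one per line,
# to the given filehandle. Returns the number of bins written.
sub write_matrix {
    my ($fh, $matrix) = @_;
    my $written = 0;
    # each() iterates pair by pair without building a list of all entries.
    while (my ($entry, $count) = each %$matrix) {
        print $fh "$entry= $count\n";
        $written++;
    }
    return $written;
}

# Two bins taken from the sample output, keyed the way the script keys them.
my %matrix = (
    '[36.0.0.0 : 128.253.0.0]'   => 9465,
    '[128.59.0.0 : 128.252.0.0]' => 8726,
);
my $n = write_matrix(\*STDOUT, \%matrix);
print STDERR "wrote $n bins\n";
```

In the script itself the filehandle would be OUT rather than STDOUT. The order of the lines is whatever order the hash yields, so a sorted file would need a separate pass.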
Erik Sherk
sherk@nmc.cit.cornell.edu
--
Erik Sherk sherk@nmc.cit.cornell.edu (607) 255-1679
Network Specialist "You can't swing a dead cat at Cornell
Network Management Center without hitting a manager!"
Network Support Services
Cornell Information Technologies
Cornell University