sherk@nmc.cit.cornell.edu (Erik Sherk) (03/07/91)
Hello,

I am using a program called statspy to gather statistics on net-to-net traffic across our external ethernet. Statspy keeps a frequency distribution of net-to-net traffic: for each pair of nets it records a count of all packets between them, a percentage, and a time delta from the current time to when it last saw a packet with that source and destination network. It keeps these stats in memory. To retrieve them, you use another program ('collect'), which opens a TCP connection to statspy and dumps all the stats. The output of collect looks like this:

------------------------------------------------------------
Log created on Mon Mar 4 14:42:59 1991, for host dmz.
Object name = 'net.matrix'.

OBJECT: net.matrix Class= matrix-sym [Created: 14:36:05 03-04-91]
  ReadTime: 14:48:39 03-04-91, ClearTime: 14:36:05 03-04-91 (@-754sec)
  Total Count= 459437 (+0 orphans)
  #bins = 3011
  [192.33.4.0 : 192.35.82.0]= 32241 ( 7.0%) @-0sec
  [128.145.0.0 : 128.253.0.0]= 11621 ( 2.5%) @-8sec
  [128.84.0.0 : 192.5.107.0]= 10332 ( 2.2%) @-0sec
  [36.0.0.0 : 128.253.0.0]= 9465 ( 2.1%) @-0sec
  [128.59.0.0 : 128.252.0.0]= 8726 ( 1.9%) @-147sec
  .
  .
  .
  ( *lots* of output deleted)
  .
  .
  [128.228.0.0 : 134.129.0.0]= 6213 ( 1.4%) @-2sec
  [128.118.0.0 : 128.228.0.0]= 5716 ( 1.2%) @-0sec
  [128.42.0.0 : 136.161.0.0]= 5430 ( 1.2%) @-290sec
  [36.0.0.0 : 128.205.0.0]= 4708 ( 1.0%) @-0sec
------------------------------------------------------------

This output is rather long. If I use collect to checkpoint this data, I will quickly run out of disk space. So I wrote a perl script to take the output from collect and coalesce the checkpointed data into a "monthly total so far" file.

My problem (finally) is that the script fails intermittently with an "Out of memory!" error. It does this on test data files that are about 2% of the size I think they will grow to. (I expect the matrix to have about 200000 entries eventually.)
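To give a sense of what each of those entries is: every bin line in the collect output above boils down to one key string (the bracketed net pair) and one packet count. Here is a minimal sketch of that split, assuming bin lines have exactly the format of the sample. parse_bin is a made-up name for illustration; it is not part of statspy or collect, and it is not the code from my script.

```perl
# Sketch only: parse_bin is a hypothetical helper, not part of statspy/collect.
# Assumes bin lines look like: [net : net]= count ( pct%) @-Nsec

sub parse_bin {
    local($text) = @_;
    # Capture the bracketed net pair and the integer count after '='.
    if ($text =~ /^\s*(\[[^\]]+\])\s*=\s*(\d+)/) {
        return ($1, $2);
    }
    return ();    # header, separator, or blank line: not a bin
}

# Sample lines are single-quoted so that '@-' is not interpolated.
@sample = (
    '[192.33.4.0 : 192.35.82.0]= 32241 ( 7.0%) @-0sec',
    '[128.145.0.0 : 128.253.0.0]= 11621 ( 2.5%) @-8sec',
    'OBJECT: net.matrix Class= matrix-sym',
);

%matrix = ();
foreach $line (@sample) {
    ($pair, $count) = &parse_bin($line);
    next unless defined($pair);     # skip anything that is not a bin line
    $matrix{$pair} = $count;        # one key string + one count per bin
}

foreach $pair (sort(keys(%matrix))) {
    print "$pair $matrix{$pair}\n";
}
```

Each key here is the whole bracketed string, so the per-entry cost in the associative array is the length of that string plus the count plus whatever overhead perl adds per element.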
I have set my stack limit under csh to 50MB and the problem seems to have gone away, but I worry about what will happen when I start using real data.

This is on a Sun Sparc 1 running 4.1. The version of perl is

$Header: perly.c,v 3.0.1.5 90/03/27 16:20:57 lwall Locked $
Patch level: 18

Can anyone tell me

1) is perl the right tool for the job?
2) is there a more memory-efficient way to use an associative array?
3) a better way?

Here is the script: (it isn't finished yet. :-)

----------------------------------------------------------------------
#!/usr/local/bin/perl
#
# script to coalesce hourly snapshots of statspy data.
#
# 1) read the header of the new file and the header of the total
#    for the day. if

# print error if invoked w/o file names
if ($#ARGV != 1){
    print "Usage: matrix.p monthly-file new-hourly-file\n";
    print "Will coalesce the new-hourly-file into the monthly-file.\n";
    exit;
}

open(MONTH,$ARGV[0]) || die "$0: couldn't open $ARGV[0]; $!\n";
open(HOUR,$ARGV[1]) || die "$0: couldn't open $ARGV[1]; $!\n";
open(OUT,">tmp.file") || die "$0: couldn't open tmp.file; $!\n";

# find the clear time of these variables
for ($i=0; $i < 9; $i++){
    $header = <MONTH>;
    if ($header =~ /ClearTime/) {
        ($foo,$str) = split(/\@/,$header);
        ($month_cleartime,$foo) = split(/s/,$str);
        print $month_cleartime;
    }
}

for ($i=0; $i < 9; $i++){
    $header = <HOUR>;
    if ($header =~ /ClearTime/) {
        ($foo,$str) = split(/\@/,$header);
        ($hour_cleartime,$foo) = split(/s/,$str);
        print $hour_cleartime;
    }
}

# cleartimes will be negative.
if ($hour_cleartime > $month_cleartime ){   # i.e. a smaller neg. number
    print "Reset \n";
    $reset = 1;
} else {
    print "Not Reset\n";
    $reset = 0;
}

# create associative array of all net to nets we have seen so far this month
while (<MONTH>){
    ($entry, $value) = split(/=/);
    ($count, @foo) = split(' ', $value);
    $matrix{ $entry } = $count;
    print $matrix{ $entry };
    print "\n";
}

while (<HOUR>){
    ($entry, $value) = split(/=/);
    ($count, @foo) = split(' ', $value);
    #
    # Three cases 1) new net-to-net entry 2) additional traffic for old
    # entry, replace old value 3) reset has happened, add old val to new.
    #
    if ( $matrix{ $entry }) {
        if ( $reset ) {
            # add old to new
            $matrix{ $entry } += $count;
        } else {
            if ( $matrix{ $entry } > $count ){
                # this can't happen
                print "Error: cur_val > new_val w/o reset\n";
                exit;
            } else {
                # replace old val
                $matrix{ $entry } = $count;
            }
        }
    } else {
        print "New bin\n";
        $matrix{ $entry } = $count;
    }
    print $matrix{ $entry };
    print "\n";
}

for (%matrix){
    print $_;
}

Erik Sherk
sherk@nmc.cit.cornell.edu
--
Erik Sherk sherk@nmc.cit.cornell.edu (607) 255-1679
Network Specialist
"You can't swing a dead cat at Cornell Network Management Center without hitting a manager!"
Network Support Services
Cornell Information Technologies
Cornell University